Intermediate scoring and rejection loopback for improved key phrase detection

US 9,972,313 B2
Filed: 03/01/2016
Issued: 05/15/2018
Est. Priority Date: 03/01/2016
Status: Expired due to Fees

Abstract

Techniques related to key phrase detection for applications such as wake on voice are discussed. Such techniques may include intermediate scoring of a state or states of a key phrase model and/or a backward transition or rejection loopback from a state of the key phrase model to a rejection model to reduce false accepts based on received utterances.

56 Citations

24 Claims

1. A computer-implemented method for key phrase detection comprising:
- updating, at a current time instance, a start state based rejection model and a key phrase model associated with a predetermined key phrase based on scores of sub-phonetic units representative of received audio input, wherein the start state based rejection model includes a single rejection state having a plurality of rejection model self loops each associated with a particular score of the scores of sub-phonetic units, wherein the key phrase model includes a plurality of key phrase states interconnected by transitions therebetween, wherein the start state based rejection model and the key phrase model are connected by a first transition from the single rejection state to a first key phrase state of the plurality of key phrase states, and wherein said updating comprises:
  
  transitioning a score from a particular key phrase state of the plurality of key phrase states of the key phrase model to a next key phrase state of the plurality of key phrase states of the key phrase model;
  
  transitioning the score from the particular key phrase state to the single rejection state of the start state based rejection model; and
  
  generating a rejection likelihood score corresponding to the single rejection state of the start state based rejection model and a key phrase likelihood score corresponding to the key phrase model; and
  
  detecting the predetermined key phrase in the received audio input based on the rejection likelihood score and the key phrase likelihood score; and
  
  providing a wake indicator or a command in response to the detected predetermined key phrase.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1, wherein said updating comprises determining a highest probability score from the particular scores of sub-phonetic units associated with the rejection model self loops of the start state based rejection model and adding the highest probability score to a maximum of the score transitioned from the particular key phrase state and a previous score of the single rejection state to provide a score of the single rejection state at the current time instance.
  - 3. The method of claim 1, wherein said updating comprises:
    - transitioning a second score from a second key phrase state of the plurality of key phrase states of the key phrase model to the single rejection state of the start state based rejection model; and
      
      determining a highest probability score from the particular scores of sub-phonetic units associated with the rejection model self loops of the start state based rejection model and adding the highest probability score to a maximum of the score transitioned from the particular key phrase state, the second score transitioned from the second key phrase state, and a previous score of the single rejection state to provide a score of the single rejection state at the current time instance.
  - 4. The method of claim 1, wherein the plurality of key phrase states of the key phrase model are associated with second scores of the scores of sub-phonetic units, and wherein none of the second scores are included in the particular scores of sub-phonetic units associated with the rejection model self loops of the start state based rejection model.
  - 5. The method of claim 1, wherein the key phrase likelihood score comprises a minimum of a first likelihood score associated with a second key phrase state of the key phrase model and a second likelihood score associated with a third key phrase state of the key phrase model.
  - 6. The method of claim 1, wherein the particular key phrase state of the key phrase model is associated with a word end within the predetermined key phrase.
  - 7. The method of claim 1, wherein said updating comprises determining a second score from the scores of sub-phonetic units corresponding to the next key phrase state and adding the second score to a maximum of the score transitioned from the particular key phrase state and a previous score of the next key phrase state to provide a current score of the next key phrase state at the current time instance.
  - 8. The method of claim 1, wherein the key phrase likelihood score is associated with a final key phrase state of the key phrase model.
  - 9. The method of claim 1, wherein determining whether the received audio input is associated with the predetermined key phrase comprises determining a log likelihood score based on the rejection likelihood score and the key phrase likelihood score and comparing the log likelihood score to a threshold.
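
The update recited in claims 1, 2, 3 and 7 can be sketched as a single time step over log-domain scores. This is a minimal illustration under stated assumptions, not the patented implementation: the names `update_models`, `detect` and `loopback_states` are hypothetical, feeding the first key phrase state from the previous rejection score is an assumed reading of the "first transition", and computing the log likelihood score of claim 9 as the final-state score minus the rejection score is also an assumption.

```python
from typing import List, Sequence, Tuple


def update_models(
    prev_rejection: float,
    prev_states: Sequence[float],
    rejection_loop_scores: Sequence[float],
    state_scores: Sequence[float],
    loopback_states: Sequence[int],
) -> Tuple[float, List[float]]:
    """One time-step update of the rejection state and the key phrase states.

    All scores are assumed to be log-domain, so adding a score accumulates
    likelihood. loopback_states holds the (hypothetical) indices of key
    phrase states that have a backward transition to the rejection state.
    """
    # Rejection state (claims 2 and 3): highest self-loop score plus the
    # maximum of the previous rejection score and each looped-back score.
    incoming = [prev_rejection] + [prev_states[i] for i in loopback_states]
    rejection = max(rejection_loop_scores) + max(incoming)

    # Key phrase states (claim 7): the state's own sub-phonetic score plus
    # the maximum of the score transitioned from its predecessor and the
    # state's own previous score.
    states = []
    for i, acoustic in enumerate(state_scores):
        predecessor = prev_rejection if i == 0 else prev_states[i - 1]
        states.append(acoustic + max(predecessor, prev_states[i]))
    return rejection, states


def detect(rejection: float, states: Sequence[float], threshold: float) -> bool:
    # Claims 8 and 9 (assumed form): final-state score minus rejection
    # score, compared to a threshold.
    return states[-1] - rejection > threshold
```

With `loopback_states=[1]`, a high score at key phrase state 1 can raise the rejection score on the next frame. If the rest of the phrase does not follow, the margin computed in `detect` shrinks, which is the false-accept reduction the abstract attributes to the rejection loopback.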

10. A computer-implemented method for key phrase detection comprising:
- updating a start state based rejection model and a key phrase model associated with a predetermined key phrase based on scores of sub-phonetic units representative of received audio input, wherein the start state based rejection model includes a single rejection state having a plurality of rejection model self loops each associated with a particular score of the scores of sub-phonetic units and wherein the key phrase model includes a plurality of key phrase states interconnected by transitions therebetween, the start state based rejection model and the key phrase model being connected by a first transition from the single rejection state to a first key phrase state of the plurality of key phrase states;
  
  determining a rejection likelihood score based on the single rejection state of the updated start state based rejection model;
  
  determining an overall key phrase likelihood score comprising a minimum of only a subset of likelihood scores associated with a corresponding subset of key phrase states of the key phrase model including at least a first likelihood score associated with a first key phrase state corresponding to an end of a first portion of the key phrase and a second likelihood score associated with a final key phrase state of the key phrase model corresponding to an end of a second portion of the key phrase; and
  
  detecting the predetermined key phrase in the received audio input based on the rejection likelihood score and the key phrase likelihood score; and
  
  providing a wake indicator or a command in response to the detected predetermined key phrase.
- Dependent claims (11, 12, 13, 14, 15):
  - 11. The method of claim 10, wherein the first likelihood score is a maximum first likelihood score attained at the first key phrase state over a particular time interval and the second likelihood score is a maximum second likelihood score attained at the final key phrase state over the particular time interval.
  - 12. The method of claim 10, wherein the first likelihood score corresponds to a first time instance and the second likelihood score corresponds to a second time instance.
  - 13. The method of claim 12, wherein determining whether the received audio input is associated with the predetermined key phrase comprises verifying the second time instance is subsequent to the first time instance.
  - 14. The method of claim 10, wherein the first key phrase state corresponds to an endpoint of a first word of the key phrase model and the final key phrase state corresponds to an endpoint of a second word of the key phrase model.
  - 15. The method of claim 10, wherein determining whether the received audio input is associated with the predetermined key phrase comprises determining a log likelihood score based on the rejection likelihood score and the overall key phrase likelihood score and comparing the log likelihood score to a threshold.
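
The intermediate scoring of claims 10 through 13 and 15 can be sketched as follows. This is an illustrative sketch only: the names `best_over_interval`, `overall_key_phrase_score` and `accept` are hypothetical, the scores are assumed to be log-domain and already comparable across frames (the claims do not say how that is achieved), and the log likelihood score of claim 15 is assumed to be the overall score minus the rejection score.

```python
from typing import Dict, Sequence, Tuple


def best_over_interval(scores: Sequence[float]) -> Tuple[float, int]:
    """Return the maximum score in the interval and the frame it occurs at."""
    frame = max(range(len(scores)), key=lambda t: scores[t])
    return scores[frame], frame


def overall_key_phrase_score(
    history: Dict[int, Sequence[float]],
    scored_states: Sequence[int],
) -> Tuple[float, bool]:
    """Combine a subset of key phrase states into one overall score.

    history maps a state index to its per-frame scores over the interval.
    scored_states lists the intermediate state(s) followed by the final
    state, in phrase order (for example, the end of each word).
    """
    # Claim 11: take each scored state's maximum over the time interval.
    peaks = [best_over_interval(history[s]) for s in scored_states]
    # Claim 10: the overall score is the minimum over the chosen subset only.
    overall = min(score for score, _ in peaks)
    # Claims 12 and 13: each later peak must occur after the earlier one.
    frames = [frame for _, frame in peaks]
    in_order = all(a < b for a, b in zip(frames, frames[1:]))
    return overall, in_order


def accept(overall: float, in_order: bool, rejection: float, threshold: float) -> bool:
    # Claim 15 (assumed form): overall score minus rejection score,
    # compared to a threshold, gated by the time-ordering check.
    return in_order and (overall - rejection) > threshold
```

Taking the minimum means a strong final-state score cannot compensate for a weak intermediate one, so an utterance that matches only the end of the key phrase is less likely to be accepted.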

16. A system for performing key phrase detection comprising:
- a memory configured to store a start state based rejection model and a key phrase model associated with a predetermined key phrase;
  
  a digital signal processor coupled to the memory, the digital signal processor to update, at a current time instance, the start state based rejection model and the key phrase model based on scores of sub-phonetic units representative of received audio input, wherein the start state based rejection model includes a single rejection state having a plurality of rejection model self loops each associated with a particular score of the scores of sub-phonetic units, wherein the key phrase model includes a plurality of key phrase states interconnected by transitions therebetween, wherein the start state based rejection model and the key phrase model are connected by a first transition from the single rejection state to a first key phrase state of the plurality of key phrase states, and wherein to update the start state based rejection model and the key phrase model, the digital signal processor is to transition a score from a particular key phrase state of the plurality of key phrase states of the key phrase model to a next key phrase state of the plurality of key phrase states of the key phrase model, to transition the score from the particular key phrase state to the single rejection state of the start state based rejection model, and to generate a rejection likelihood score corresponding to the single rejection state of the start state based rejection model and a key phrase likelihood score corresponding to the key phrase model; and
  
  detect the predetermined key phrase in the received audio input based on the rejection likelihood score and the key phrase likelihood score; and
  
  provide a wake indicator or a command in response to the detected predetermined key phrase.
- Dependent claims (17, 18, 19, 20):
  - 17. The system of claim 16, wherein to update the start state based rejection model and the key phrase model, the digital signal processor is to determine a highest probability score from the particular scores of sub-phonetic units associated with the rejection model self loops of the start state based rejection model and to add the highest probability score to a maximum of the score transitioned from the particular key phrase state and a previous score of the single rejection state to provide a score of the single rejection state at the current time instance.
  - 18. The system of claim 16, wherein to update the start state based rejection model and the key phrase model, the digital signal processor is to transition a second score from a second key phrase state of the plurality of key phrase states of the key phrase model to the single rejection state of the start state based rejection model and to determine a highest probability score from the particular scores of sub-phonetic units associated with the rejection model self loops of the start state based rejection model and to add the highest probability score to a maximum of the score transitioned from the particular key phrase state, the second score transitioned from the second key phrase state, and a previous score of the single rejection state to provide a score of the single rejection state at the current time instance.
  - 19. The system of claim 16, wherein the plurality of key phrase states of the key phrase model are associated with second scores of the scores of sub-phonetic units, and wherein none of the second scores are included in the particular scores of sub-phonetic units associated with the rejection model self loops of the start state based rejection model.
  - 20. The system of claim 16, wherein the key phrase likelihood score comprises a minimum of a first likelihood score associated with a second key phrase state of the key phrase model and a second likelihood score associated with a third key phrase state of the key phrase model.

21. A system for performing key phrase detection comprising:
- a memory configured to store a start state based rejection model and a key phrase model associated with a predetermined key phrase; and
  
  a digital signal processor coupled to the memory, the digital signal processor to update the start state based rejection model and the key phrase model based on scores of sub-phonetic units representative of received audio input, wherein the start state based rejection model includes a single rejection state having a plurality of rejection model self loops each associated with a particular score of the scores of sub-phonetic units and wherein the key phrase model includes a plurality of key phrase states interconnected by transitions therebetween, the start state based rejection model and the key phrase model being connected by a first transition from the single rejection state to a first key phrase state of the plurality of key phrase states, to determine a rejection likelihood score based on the single rejection state of the updated start state based rejection model, to determine an overall key phrase likelihood score comprising a minimum of only a subset of likelihood scores associated with a corresponding subset of key phrase states of the key phrase model including at least a first likelihood score associated with a first key phrase state corresponding to an end of a first portion of the key phrase and a second likelihood score associated with a final key phrase state of the key phrase model corresponding to an end of a second portion of the key phrase;
  
  detect the predetermined key phrase in the received audio input based on the rejection likelihood score and the key phrase likelihood score; and
  
  provide a wake indicator or a command in response to the detected predetermined key phrase.
- Dependent claims (22, 23, 24):
  - 22. The system of claim 21, wherein the first likelihood score is a maximum first likelihood score attained at the first key phrase state over a particular time interval and the second likelihood score is a maximum second likelihood score attained at the final key phrase state over the particular time interval.
  - 23. The system of claim 21, wherein the first likelihood score corresponds to a first time instance, the second likelihood score corresponds to a second time instance, and the digital signal processor to determine whether the received audio input is associated with the predetermined key phrase comprises the digital signal processor to verify the second time instance is subsequent to the first time instance.
  - 24. The system of claim 21, wherein the first key phrase state corresponds to an endpoint of a first word of the key phrase model and the final key phrase state corresponds to an endpoint of a second word of the key phrase model.

Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: Bocklet, Tobias; Marek, Adam; Dorau, Tomasz; Sobon, Przemyslaw
Primary Examiner(s): Shah, Paras D
Assistant Examiner(s): Le, Thuykhanh
Application Number: US15/057,695
Publication Number: US 20170256255A1
Time in Patent Office: 805 Days
Field of Search: None
US Class Current
CPC Class Codes:
- G06F 16/24578: using ranking
- G06F 16/685: using automatically derived...
- G10L 15/08: Speech classification or se...
- G10L 15/10: using distance or distortio...
- G10L 15/193: Formal grammars, e.g. finit...
- G10L 15/22: Procedures used during a sp...
- G10L 17/22: Interactive procedures; Man...
- G10L 2015/025: Phonemes, fenemes or fenone...
- G10L 2015/088: Word spotting
- G10L 2015/223: Execution procedure of a sp...
- G10L 25/78: Detection of presence or ab...
