INTERMEDIATE SCORING AND REJECTION LOOPBACK FOR IMPROVED KEY PHRASE DETECTION

US 20170256255A1
Filed: 03/01/2016
Published: 09/07/2017
Est. Priority Date: 03/01/2016
Status: Active Grant

Abstract

Techniques related to key phrase detection for applications such as wake on voice are discussed. Such techniques may include intermediate scoring of a state or states of a key phrase model and/or a backward transition or rejection loopback from a state of the key phrase model to a rejection model to reduce false accepts based on received utterances.

25 Claims

1. A computer-implemented method for key phrase detection comprising:
- updating, at a current time instance, a start state based rejection model having a single state and a key phrase model having a plurality of states and associated with a predetermined key phrase based on scores of sub-phonetic units representative of received audio input, wherein said updating comprises:
  
  providing a transition of a score from a particular state of the plurality of states of the key phrase model to a next state of the plurality of states of the key phrase model and to the single state of the rejection model; and
  
  generating a rejection likelihood score corresponding to the single state of the start state based rejection model and a key phrase likelihood score corresponding to the key phrase model; and
  
  determining whether the received audio input is associated with the predetermined key phrase based on the rejection likelihood score and the key phrase likelihood score.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
- - 2. The method of claim 1, wherein said updating comprises determining a highest probability score from a plurality of the scores of sub-phonetic units associated with the start state based rejection model and adding the highest probability score to a maximum of the score transitioned from the particular state and a previous score of the single state to provide a score of the single state at the current time instance.
  - 3. The method of claim 1, wherein said updating comprises:
    - providing a second transition of a second score from a second state of the plurality of states of the key phrase model to the single state of the rejection model; and
      
      determining a highest probability score from a plurality of the scores of sub-phonetic units associated with the start state based rejection model and adding the highest probability score to a maximum of the score transitioned from the particular state, the second score transitioned from the second state, and a previous score of the single state to provide a score of the single state at the current time instance.
  - 4. The method of claim 1, wherein the single state of the start state based rejection model comprises self loops associated with first scores of the scores of sub-phonetic units and the plurality of states of the key phrase model are associated with second scores of the scores of sub-phonetic units, and wherein none of the second scores are included in the first scores.
  - 5. The method of claim 1, wherein the key phrase likelihood score comprises a minimum of a first likelihood score associated with a first state of the key phrase model and a second likelihood score associated with a second state of the key phrase model.
  - 6. The method of claim 1, wherein the particular state of the key phrase model is associated with a word end within the predetermined key phrase.
  - 7. The method of claim 1, wherein said updating comprises determining a score from the scores of sub-phonetic units corresponding to the next state and adding the score to a maximum of the score transitioned from the particular state and a previous score of the next state to provide a score of the next state at the current time instance.
  - 8. The method of claim 1, wherein the key phrase likelihood score is associated with a final state of the key phrase model.
  - 9. The method of claim 1, wherein determining whether the received audio input is associated with the predetermined key phrase comprises determining a log likelihood score based on the rejection likelihood score and the key phrase likelihood score and comparing the log likelihood score to a threshold.
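
The score propagation recited in claims 1, 2, 3, and 7, together with the threshold test of claim 9, can be sketched for a single time step as shown here. This is an illustrative sketch under stated assumptions, not the patented implementation: all scores are assumed to be log-domain values, the first key phrase state is assumed to be fed from the rejection state, and the log likelihood score is assumed to be the difference between the key phrase and rejection likelihood scores. The names `update_models`, `is_key_phrase`, and `loopback_states` are hypothetical.

```python
from typing import List, Sequence, Tuple


def update_models(
    prev_rej: float,
    prev_kp: Sequence[float],
    rej_unit_scores: Sequence[float],
    kp_unit_scores: Sequence[float],
    loopback_states: Sequence[int],
) -> Tuple[float, List[float]]:
    """Advance the rejection state and the key phrase states by one time step.

    prev_rej        -- previous score of the single rejection state
    prev_kp         -- previous scores of the key phrase states, in order
    rej_unit_scores -- current sub-phonetic unit scores on the rejection self loops
    kp_unit_scores  -- current sub-phonetic unit score for each key phrase state
    loopback_states -- indices of key phrase states that also transition
                       back to the rejection state (hypothetical parameter)
    """
    # Claims 2 and 3: highest self-loop score plus the maximum of the previous
    # rejection score and every score looped back from the key phrase model.
    incoming = [prev_rej] + [prev_kp[i] for i in loopback_states]
    new_rej = max(rej_unit_scores) + max(incoming)

    new_kp: List[float] = []
    for i, unit_score in enumerate(kp_unit_scores):
        # Assumption: state 0 is entered from the rejection state.
        transitioned = prev_rej if i == 0 else prev_kp[i - 1]
        # Claim 7: unit score plus the maximum of the transitioned score
        # and the state's own previous score.
        new_kp.append(unit_score + max(transitioned, prev_kp[i]))
    return new_rej, new_kp


def is_key_phrase(kp_score: float, rej_score: float, threshold: float) -> bool:
    """Claim 9, assuming the log likelihood score is a simple difference."""
    return (kp_score - rej_score) > threshold
```

In this sketch, listing a word-end state in `loopback_states` lets a high partial-match score flow back into the rejection state, which raises the rejection likelihood when the utterance continues with something other than the rest of the key phrase.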

10. A computer-implemented method for key phrase detection comprising:
- updating a start state based rejection model and a key phrase model associated with a predetermined key phrase based on scores of sub-phonetic units representative of received audio input;
  
  determining a rejection likelihood score based on the updated start state based rejection model;
  
  determining an overall key phrase likelihood score comprising a minimum of a first likelihood score associated with a first state of the key phrase model and a second likelihood score associated with a second state of the key phrase model; and
  
  determining whether the received audio input is associated with the predetermined key phrase based on the rejection likelihood score and the overall key phrase likelihood score.
- Dependent claims (11, 12, 13, 14, 15, 16):
- - 11. The method of claim 10, wherein the first likelihood score is a maximum first likelihood score attained at the first state over a particular time interval and the second likelihood score is a maximum second likelihood score attained at the second state over the particular time interval.
  - 12. The method of claim 10, wherein the first likelihood score corresponds to a first time instance and the second likelihood score corresponds to a second time instance.
  - 13. The method of claim 12, wherein determining whether the received audio input is associated with the predetermined key phrase comprises verifying the second time instance is subsequent to the first time instance.
  - 14. The method of claim 10, wherein the first state corresponds to an endpoint of a first word of the key phrase model and the second state corresponds to an endpoint of a second word of the key phrase model.
  - 15. The method of claim 10, wherein determining whether the received audio input is associated with the predetermined key phrase comprises determining a log likelihood score based on the rejection likelihood score and the overall key phrase likelihood score and comparing the log likelihood score to a threshold.
  - 16. The method of claim 10, wherein the start state based rejection model consists of a single state comprising self loops associated with at least some of the scores of sub-phonetic units of the acoustic model.
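
The intermediate scoring of claims 10 through 13 can be illustrated with a small sketch: each scored state (for example, a word endpoint) keeps the maximum score it attained and the time at which it did so, the overall key phrase score is the minimum of those maxima, and the peak times are checked for ordering. This is a hedged sketch only. The class `PeakTracker` and the functions `overall_key_phrase_score` and `peaks_in_order` are hypothetical names, scores are assumed to be log-domain values, and the handling of the "particular time interval" (for instance, resetting the trackers) is left out.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PeakTracker:
    """Best score seen at one scored key phrase state, and when it occurred."""
    best_score: float = -math.inf
    best_time: int = -1

    def observe(self, score: float, time_index: int) -> None:
        # Claim 11: keep the maximum score attained at this state.
        if score > self.best_score:
            self.best_score = score
            self.best_time = time_index


def overall_key_phrase_score(trackers: Sequence[PeakTracker]) -> float:
    # Claim 10: the overall score is the minimum of the per-state scores,
    # so every scored state must have matched well.
    return min(t.best_score for t in trackers)


def peaks_in_order(trackers: Sequence[PeakTracker]) -> bool:
    # Claim 13: each later state must peak after the earlier one.
    times = [t.best_time for t in trackers]
    return all(a < b for a, b in zip(times, times[1:]))
```

A caller would feed each tracker the score of its state at every time step and then combine `overall_key_phrase_score` with the rejection likelihood score before thresholding.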

17. A system for performing key phrase detection comprising:
- a memory configured to store an acoustic model, a start state based rejection model, and a key phrase model associated with a predetermined key phrase; and
  
  a digital signal processor coupled to the memory, the digital signal processor to update, at a current time instance, the start state based rejection model having a single state and the key phrase model having a plurality of states based on scores of sub-phonetic units representative of received audio input, wherein to update the start state based rejection model and the key phrase model, the digital signal processor is to provide a transition of a score from a particular state of the plurality of states of the key phrase model to a next state of the plurality of states of the key phrase model and to the single state of the rejection model and to generate a rejection likelihood score corresponding to the single state of the start state based rejection model and a key phrase likelihood score corresponding to the key phrase model; and
  
  to determine whether the received audio input is associated with the predetermined key phrase based on the rejection likelihood score and the key phrase likelihood score.
- Dependent claims (18, 19, 20, 21):
- - 18. The system of claim 17, wherein to update the start state based rejection model and the key phrase model, the digital signal processor is to determine a highest probability score from a plurality of the scores of sub-phonetic units associated with the start state based rejection model and add the highest probability score to a maximum of the score transitioned from the particular state and a previous score of the single state to provide a score of the single state at the current time instance.
  - 19. The system of claim 17, wherein to update the start state based rejection model and the key phrase model, the digital signal processor is to provide a second transition of a second score from a second state of the plurality of states of the key phrase model to the single state of the rejection model and to determine a highest probability score from a plurality of the scores of sub-phonetic units associated with the start state based rejection model and add the highest probability score to a maximum of the score transitioned from the particular state, the second score transitioned from the second state, and a previous score of the single state to provide a score of the single state at the current time instance.
  - 20. The system of claim 17, wherein the single state of the start state based rejection model comprises self loops associated with first scores of the scores of sub-phonetic units and the plurality of states of the key phrase model are associated with second scores of the scores of sub-phonetic units, and wherein none of the second scores are included in the first scores.
  - 21. The system of claim 17, wherein the key phrase likelihood score comprises a minimum of a first likelihood score associated with a first state of the key phrase model and a second likelihood score associated with a second state of the key phrase model.

22. A system for performing key phrase detection comprising:
- a memory configured to store an acoustic model, a start state based rejection model, and a key phrase model associated with a predetermined key phrase; and
  
  a digital signal processor coupled to the memory, the digital signal processor to update a start state based rejection model and a key phrase model associated with a predetermined key phrase based on scores of sub-phonetic units representative of received audio input, to determine a rejection likelihood score based on the updated start state based rejection model, to determine an overall key phrase likelihood score comprising a minimum of a first likelihood score associated with a first state of the key phrase model and a second likelihood score associated with a second state of the key phrase model, and to determine whether the received audio input is associated with the predetermined key phrase based on the rejection likelihood score and the overall key phrase likelihood score.
- Dependent claims (23, 24, 25):
- - 23. The system of claim 22, wherein the first likelihood score is a maximum first likelihood score attained at the first state over a particular time interval and the second likelihood score is a maximum second likelihood score attained at the second state over the particular time interval.
  - 24. The system of claim 22, wherein the first likelihood score corresponds to a first time instance, the second likelihood score corresponds to a second time instance, and the digital signal processor to determine whether the received audio input is associated with the predetermined key phrase comprises the digital signal processor to verify the second time instance is subsequent to the first time instance.
  - 25. The system of claim 22, wherein the first state corresponds to an endpoint of a first word of the key phrase model and the second state corresponds to an endpoint of a second word of the key phrase model.

Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: BOCKLET, Tobias; MAREK, Adam; DORAU, Tomasz; SOBON, Przemyslaw

Granted Patent: US 9,972,313 B2
CPC Class Codes

G06F 16/24578   using ranking

G06F 16/685   using automatically derived...

G10L 15/08   Speech classification or se...

G10L 15/10   using distance or distortio...

G10L 15/193   Formal grammars, e.g. finit...

G10L 15/22   Procedures used during a sp...

G10L 17/22   Interactive procedures; Man...

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...

G10L 25/78   Detection of presence or ab...
