Low resource key phrase detection for wake on voice

US 10,325,594 B2
Filed: 10/17/2017
Issued: 06/18/2019
Est. Priority Date: 11/24/2015
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Techniques related to key phrase detection for applications such as wake on voice are discussed. Such techniques may include updating a start state based rejection model and a key phrase model based on scores of sub-phonetic units from an acoustic model to generate a rejection likelihood score and a key phrase likelihood score and determining whether received audio input is associated with a predetermined key phrase based on the rejection likelihood score and the key phrase likelihood score.

62 Citations

20 Claims

1. A computer-implemented method for key phrase detection comprising:
- receiving a time series of scores of sub-phonetic units based on received audio input;
  
  updating a start state based rejection model and a key phrase model associated with a predetermined key phrase based on at least some of the time series of scores of sub-phonetic units to generate a key phrase score, wherein the start state based rejection model has a single rejection state comprising one or more rejection model self loops each associated with a particular score of the scores of sub-phonetic units and the key phrase model comprises a plurality of key phrase states interconnected by transitions therebetween with each of the key phrase states comprising a self loop associated with a particular score of the scores of sub-phonetic units; and
  
  determining whether the received audio input is associated with the predetermined key phrase based on the key phrase score.
  - 2. The method of claim 1, wherein the start state based rejection model and the key phrase model are connected by a first transition from the single rejection state to a first key phrase state of the plurality of key phrase states.
  - 3. The method of claim 1, wherein updating the start state based rejection model comprises providing a continual summing at the single rejection state based on a previous score of the single rejection state and the one or more particular scores corresponding to the one or more rejection model self loops.
  - 4. The method of claim 1, wherein updating the key phrase model comprises providing a continual summing at a first key phrase state of the plurality of key phrase states based on a previous score of the first key phrase state, the particular score corresponding to the self loop of the first key phrase state, and a second score transitioned to the first key phrase state from another state.
  - 5. The method of claim 4, wherein updating the key phrase model comprises:
    - comparing a sum of the previous score and the particular score corresponding to the self loop of the first key phrase state to the second score; and
      
      updating the score for the second key phrase state to the second score when the second score is greater than the sum.
  - 6. The method of claim 1, further comprising:
    - updating a second key phrase model associated with a second predetermined key phrase based on at least some of the time series of scores of sub-phonetic units to generate a second key phrase likelihood score; and
      
      determining whether the received audio input is associated with the second predetermined key phrase based on the rejection likelihood score and the second key phrase likelihood score.
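
The per-frame update recited in claims 1 through 5 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function name `update_models` and its parameters are hypothetical, scores are assumed to be log-domain values indexed by sub-phonetic unit, the rejection state is assumed to take the best of its self-loop scores each frame, and the "second score" of claims 4 and 5 is assumed to be the predecessor state's previous score plus the receiving state's current self-loop score.

```python
from typing import List, Sequence, Tuple


def update_models(
    rejection: float,
    key_states: List[float],
    frame: Sequence[float],
    rejection_ids: Sequence[int],
    key_ids: Sequence[int],
) -> Tuple[float, List[float]]:
    """Advance the rejection state and the key phrase states by one frame.

    frame[u] is the (assumed log-domain) score of sub-phonetic unit u.
    rejection_ids lists the units on the rejection state's self loops.
    key_ids[i] is the unit on the self loop of key phrase state i.
    """
    # Continual summing at the single rejection state (claim 3).
    # Assumption: with several self loops, the best one is used.
    new_rejection = rejection + max(frame[j] for j in rejection_ids)

    new_states = []
    for i, unit in enumerate(key_ids):
        # Sum of the previous score and the self-loop score (claims 4 and 5).
        stay = key_states[i] + frame[unit]
        # Score transitioned in from another state; the first key phrase
        # state is fed by the rejection state (claim 2).
        predecessor = rejection if i == 0 else key_states[i - 1]
        enter = predecessor + frame[unit]
        # Take the transitioned score only when it beats the sum (claim 5).
        new_states.append(enter if enter > stay else stay)
    return new_rejection, new_states


# One frame over three units; both key phrase states start unreachable.
NEG_INF = float("-inf")
rejection, states = update_models(
    0.0, [NEG_INF, NEG_INF], [-1.0, -2.0, -0.5], [0, 1], [2, 1]
)
```

All reads use the previous frame's scores, so the order in which states are visited does not affect the result.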

7. At least one non-transitory machine readable medium comprising a plurality of instructions that, in response to being executed on a device, cause the device to perform key phrase detection by:
- receiving a time series of scores of sub-phonetic units based on received audio input;
  
  updating a start state based rejection model and a key phrase model associated with a predetermined key phrase based on at least some of the time series of scores of sub-phonetic units to generate a key phrase score, wherein the start state based rejection model has a single rejection state comprising one or more rejection model self loops each associated with a particular score of the scores of sub-phonetic units and the key phrase model comprises a plurality of key phrase states interconnected by transitions therebetween with each of the key phrase states comprising a self loop associated with a particular score of the scores of sub-phonetic units; and
  
  determining whether the received audio input is associated with the predetermined key phrase based on the key phrase score.
  - 8. The non-transitory machine readable medium of claim 7, wherein the start state based rejection model and the key phrase model are connected by a first transition from the single rejection state to a first key phrase state of the plurality of key phrase states.
  - 9. The non-transitory machine readable medium of claim 7, wherein updating the start state based rejection model comprises providing a continual summing at the single rejection state based on a previous score of the single rejection state and the one or more particular scores corresponding to the one or more rejection model self loops.
  - 10. The non-transitory machine readable medium of claim 7, wherein updating the key phrase model comprises providing a continual summing at a first key phrase state of the plurality of key phrase states based on a previous score of the first key phrase state, the particular score corresponding to the self loop of the first key phrase state, and a second score transitioned to the first key phrase state from another state.
  - 11. The non-transitory machine readable medium of claim 10, wherein updating the key phrase model comprises:
    - comparing a sum of the previous score and the particular score corresponding to the self loop of the first key phrase state to the second score; and
      
      updating the score for the second key phrase state to the second score when the second score is greater than the sum.
  - 12. The non-transitory machine readable medium of claim 7, wherein the non-transitory machine readable medium comprises further instructions that, in response to being executed on the device, cause the device to perform key phrase detection by:
    - updating a second key phrase model associated with a second predetermined key phrase based on at least some of the time series of scores of sub-phonetic units to generate a second key phrase likelihood score; and
      
      determining whether the received audio input is associated with the second predetermined key phrase based on the rejection likelihood score and the second key phrase likelihood score.

13. A system for performing key phrase detection comprising:
- a memory configured to store a start state based rejection model and a key phrase model associated with a predetermined key phrase; and
  
  a processor coupled to the memory, the processor to receive a time series of scores of sub-phonetic units based on received audio input, to update the start state based rejection model and the key phrase model based on at least some of the time series of scores of sub-phonetic units to generate a key phrase score, wherein the start state based rejection model has a single rejection state comprising one or more rejection model self loops each associated with a particular score of the scores of sub-phonetic units and the key phrase model comprises a plurality of key phrase states interconnected by transitions therebetween with each of the key phrase states comprising a self loop associated with a particular score of the scores of sub-phonetic units, and to determine whether the received audio input is associated with the predetermined key phrase based on the key phrase score.
  - 14. The system of claim 13, wherein the start state based rejection model and the key phrase model are connected by a first transition from the single rejection state to a first key phrase state of the plurality of key phrase states.
  - 15. The system of claim 13, wherein the processor to update the start state based rejection model comprises the processor to provide a continual summing at the single rejection state based on a previous score of the single rejection state and the one or more particular scores corresponding to the one or more rejection model self loops.
  - 16. The system of claim 13, wherein the processor to update the key phrase model comprises the processor to provide a continual summing at a first key phrase state of the plurality of key phrase states based on a previous score of the first key phrase state, the particular score corresponding to the self loop of the first key phrase state, and a second score transitioned to the first key phrase state from another state.
  - 17. The system of claim 16, wherein the processor to update the key phrase model comprises the processor to compare a sum of the previous score and the particular score corresponding to the self loop of the first key phrase state to the second score and to update the score for the second key phrase state to the second score when the second score is greater than the sum.
  - 18. The system of claim 13, wherein the key phrase model comprises a multi-state lexicon look up key phrase model and the transitions of the key phrase model are associated with the lexicon look up for the predetermined key phrase.
  - 19. The system of claim 13, wherein the processor to determine whether the received audio input is associated with the predetermined key phrase comprises the processor to determine the key phrase score as a log likelihood score based on a rejection likelihood score corresponding to the start state based rejection model and a key phrase likelihood score corresponding to the key phrase model and to compare the key phrase score to a threshold.
  - 20. The system of claim 13, wherein the processor is to update a second key phrase model associated with a second predetermined key phrase based on at least some of the time series of scores of sub-phonetic units to generate a second key phrase likelihood score and to determine whether the received audio input is associated with the second predetermined key phrase based on the rejection likelihood score and the second key phrase likelihood score.
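
Claims 19 and 20 describe scoring against a threshold and running more than one key phrase model over the same scores. The sketch below is one possible reading, not the patented implementation. The function name `detect`, the phrase names, the toy scores, and the threshold are all hypothetical, and the log likelihood score is assumed to be the final key phrase state's score minus the rejection state's score.

```python
import math
from typing import Dict, Optional, Sequence


def detect(
    frames: Sequence[Sequence[float]],
    rejection_ids: Sequence[int],
    phrases: Dict[str, Sequence[int]],
    threshold: float,
) -> Optional[str]:
    """Return the first key phrase whose score exceeds threshold, else None.

    frames[t][u] is the (assumed log-domain) score of sub-phonetic unit u
    at time t. phrases maps a phrase name to the unit of each of its states.
    """
    rejection = 0.0  # single rejection state shared by all key phrase models
    states = {name: [-math.inf] * len(ids) for name, ids in phrases.items()}

    for frame in frames:
        for name, ids in phrases.items():
            prev = states[name]
            cur = []
            for i, unit in enumerate(ids):
                # The first state is entered from the rejection state.
                pred = rejection if i == 0 else prev[i - 1]
                cur.append(max(prev[i], pred) + frame[unit])
            states[name] = cur
        # Rejection state: running sum of its best self-loop score.
        rejection += max(frame[j] for j in rejection_ids)

        for name, cur in states.items():
            # Assumed log likelihood score: key phrase minus rejection.
            if cur[-1] - rejection > threshold:
                return name
    return None


# Toy input: unit 0 is noise, units 1 and 2 are the sounds "a" and "b".
frames = [[-0.1, -3.0, -3.0], [-3.0, -0.1, -3.0], [-3.0, -3.0, -0.1]]
result = detect(frames, [0], {"ab": [1, 2], "ba": [2, 1]}, 2.0)
```

Sharing one rejection state across phrases means each additional key phrase adds only its own states to the per-frame work, which fits the low resource aim stated in the title.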

Current Assignee: Intel Corporation
Original Assignee: Intel IP Corporation (Intel Corporation)
Inventors: Bocklet, Tobias; Hofer, Joachim
Primary Examiner(s): Roberts, Shaun
Application Number: US15/786,089
Publication Number: US 20180261218A1
Time in Patent Office: 609 Days
Field of Search: 704/251, 704/254, 704/275
US Class Current
CPC Class Codes

G06F 40/284   Lexical analysis, e.g. toke...

G10L 15/01   Assessment or evaluation of...

G10L 15/14   using statistical models, e...

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/22   Procedures used during a sp...

G10L 17/22   Interactive procedures; Man...

G10L 2015/025   Phonemes, fenemes or fenone...
