Methods and systems for identifying keywords in speech signal

US 9,799,325 B1
Filed: 04/14/2016
Issued: 10/24/2017
Est. Priority Date: 04/14/2016
Status: Expired due to Fees

Abstract

The disclosed embodiments relate to a method of keyword recognition in a speech signal. The method includes determining a first likelihood score and a second likelihood score of one or more features of a frame of the speech signal being associated with one or more states in a first model and one or more states in a second model, respectively. The one or more states in the first model correspond to one or more tied triphone states, and the one or more states in the second model correspond to one or more monophone states, of a keyword to be recognized in the speech signal. The method further includes determining a third likelihood score based on the first likelihood score and the second likelihood score. The first likelihood score and the third likelihood score can be used to determine the presence of the keyword in the speech signal.
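
As an illustration of the per-frame scoring summarized above, the following is a minimal sketch of how a frame's features might be scored against the states of two models. It assumes each state is a diagonal-covariance Gaussian mixture (claim 2 mentions a GMM per tied triphone state; using the same form for the monophone model is an assumption made here). The state names, the toy parameters, and the helper functions are all hypothetical and do not come from the patent.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, components):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in components]
    peak = max(terms)  # log-sum-exp keeps the sum numerically stable
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def state_scores(x, states):
    """Log-likelihood of x for every state of a model (dict: name -> GMM)."""
    return {name: gmm_log_likelihood(x, gmm) for name, gmm in states.items()}

# Hypothetical toy models with one-dimensional features (assumed values).
triphone_states = {"k-ae+t.1": [(1.0, [1.0], [0.5])]}
monophone_states = {"ae.1": [(0.6, [0.0], [1.0]), (0.4, [2.0], [1.0])]}

frame_features = [1.1]
first_scores = state_scores(frame_features, triphone_states)    # first model
second_scores = state_scores(frame_features, monophone_states)  # second model
```

Working in the log domain is a common design choice for this kind of scoring, since products of many small per-frame likelihoods would otherwise underflow.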

15 Claims

1. A method of keyword recognition in a speech signal, the method comprising:
- sampling, by one or more processors, the speech signal in one or more frames;
  
  determining, by the one or more processors, a first likelihood score of one or more features of a frame, of the one or more frames, of the speech signal being associated with one or more states in a first model, wherein the one or more states in the first model correspond to one or more tied triphone states of a keyword to be recognized in the speech signal, and wherein the one or more features comprise a frequency of an audio in the frame;
  
  determining, by the one or more processors, a second likelihood score of the one or more features of the frame of the speech signal being associated with one or more states in a second model, wherein the one or more states in the second model correspond to one or more monophone states of the keyword to be recognized in the speech signal;
  
  determining, by the one or more processors, a third likelihood score based on the first likelihood score and the second likelihood score, wherein the third likelihood score is deterministic of a likelihood of the frame corresponding to keywords other than the keyword; and
  
  determining, by the one or more processors, a presence of the keyword in the speech signal based on the first likelihood score and the third likelihood score.

2. The method of claim 1, further comprising training, by the one or more processors, the first model based on a Gaussian mixture model (GMM) for each of the one or more tied triphone states, wherein the one or more tied triphone states are based on one or more triphone states of the keyword.

3. The method of claim 1, further comprising determining, by the one or more processors, a maxima between the first likelihood score and the second likelihood score.

4. The method of claim 3, further comprising determining, by the one or more processors, a minima between the first likelihood score and the second likelihood score, wherein the determination of the third likelihood score is based on the maxima, the minima, and a value.

5. The method of claim 1, further comprising determining, by the one or more processors, a first score for each of the one or more states in the first model based on the first score of the one or more states in the first model for a previous frame, of the one or more frames, of the speech signal and the first likelihood score, wherein the keyword is recognized in the speech signal based on the first score.

6. The method of claim 1, wherein the determination of the third likelihood score is based on a third model, wherein the third model comprises a garbage state.

7. The method of claim 6, further comprising determining, by the one or more processors, a second score based on the third likelihood score.
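
The dependent method claims can be read together as a scoring loop, sketched below under several assumptions. Claim 4 only says the third likelihood score is based on the maxima, the minima, and a value; the linear interpolation used here is a hypothetical choice, not the patent's formula. The recursion for claim 5 is written as a Viterbi-style update over left-to-right keyword states with transition probabilities omitted, and the final comparison of the keyword score against an accumulated garbage score (the "second score" of claim 7) is likewise an assumed decision rule. All function names are illustrative.

```python
NEG_INF = float("-inf")

def garbage_log_likelihood(first, second, value=0.5):
    """Assumed combination: interpolate between the min and max of two scores."""
    hi, lo = max(first, second), min(first, second)
    return lo + value * (hi - lo)

def update_keyword_scores(prev_scores, frame_logliks):
    """One Viterbi-style step over left-to-right keyword states.

    Each state is either kept (self-loop) or entered from the preceding
    state; transition probabilities are left out of this sketch.
    """
    new_scores = []
    for s, loglik in enumerate(frame_logliks):
        stay = prev_scores[s]
        enter = prev_scores[s - 1] if s > 0 else NEG_INF
        new_scores.append(max(stay, enter) + loglik)
    return new_scores

def spot_keyword(first_per_frame, second_per_frame, value=0.5):
    """True if the keyword path outscores the accumulated garbage score.

    first_per_frame: per-frame list of log-likelihoods, one per keyword state.
    second_per_frame: per-frame best monophone log-likelihood (assumed scalar).
    """
    n_states = len(first_per_frame[0])
    keyword, garbage = [], 0.0
    for t, (first, second) in enumerate(zip(first_per_frame, second_per_frame)):
        if t == 0:
            # The path must start in the first keyword state.
            keyword = [first[0]] + [NEG_INF] * (n_states - 1)
        else:
            keyword = update_keyword_scores(keyword, first)
        garbage += garbage_log_likelihood(max(first), second, value)
    return keyword[-1] > garbage

# Toy scores (assumed): two keyword states, three frames.
present = spot_keyword([[-1, -5], [-1, -1], [-5, -1]], [-3, -3, -3])
absent = spot_keyword([[-9, -9], [-9, -9], [-9, -9]], [-1, -1, -1])
```

In this toy run `present` is true and `absent` is false; a real system would tune `value` and the decision threshold on held-out data.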

8. A system of keyword recognition in a speech signal, the system comprising:
- one or more processors configured to:
  
  sample the speech signal in one or more frames;
  
  determine a first likelihood score of one or more features of a frame, of the one or more frames, of the speech signal being associated with one or more states in a first model, wherein the one or more states in the first model correspond to one or more tied triphone states of a keyword to be recognized in the speech signal, and wherein the one or more features comprise a frequency of an audio in the frame;
  
  determine a second likelihood score of the one or more features of the frame of the speech signal being associated with one or more states in a second model, wherein the one or more states in the second model correspond to one or more monophone states of the keyword to be recognized in the speech signal;
  
  determine a third likelihood score based on the first likelihood score and the second likelihood score, wherein the third likelihood score is deterministic of a likelihood of the frame corresponding to keywords other than the keyword; and
  
  determine a presence of the keyword in the speech signal based on the first likelihood score and the third likelihood score.

9. The system of claim 8, wherein the one or more processors are further configured to train the first model based on a Gaussian mixture model (GMM) for each of the one or more tied triphone states, wherein the one or more tied triphone states are based on one or more triphone states of the keyword.

10. The system of claim 8, wherein the one or more processors are further configured to determine a maxima between the first likelihood score and the second likelihood score.

11. The system of claim 10, wherein the one or more processors are further configured to determine a minima between the first likelihood score and the second likelihood score, wherein the determination of the third likelihood score is based on the maxima, the minima, and a value.

12. The system of claim 8, wherein the one or more processors are further configured to determine a first score for each of the one or more states in the first model based on the first score of the one or more states in the first model for a previous frame, of the one or more frames, of the speech signal and the first likelihood score, wherein the keyword is recognized in the speech signal based on the first score.

13. The system of claim 8, wherein the determination of the third likelihood score is based on a third model, wherein the third model comprises a garbage state.

14. The system of claim 13, wherein the one or more processors are further configured to determine a second score based on the third likelihood score.

15. A computer program product for use with a computer, the computer program product comprising a non-transitory computer readable medium, wherein the non-transitory computer readable medium stores a computer program code for keyword recognition in a speech signal, wherein the computer program code is executable by one or more processors to:
- sample the speech signal in one or more frames;
  
  determine a first likelihood score of one or more features of a frame, of the one or more frames, of the speech signal being associated with one or more states in a first model, wherein the one or more states in the first model correspond to one or more tied triphone states of a keyword to be recognized in the speech signal, and wherein the one or more features comprise a frequency of an audio in the frame;
  
  determine a second likelihood score of the one or more features of the frame of the speech signal being associated with one or more states in a second model, wherein the one or more states in the second model correspond to one or more monophone states of the keyword to be recognized in the speech signal;
  
  determine a third likelihood score based on the first likelihood score and the second likelihood score, wherein the third likelihood score is deterministic of a likelihood of the frame corresponding to keywords other than the keyword; and
  
  determine a presence of the keyword in the speech signal based on the first likelihood score and the third likelihood score.
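
The first step recited in each independent claim, sampling the speech signal in frames and deriving a frequency feature per frame, might look roughly like the sketch below. The frame length, hop size, sample rate, and the choice of "dominant DFT bin" as the frequency feature are all assumptions made for illustration; the claims only require that the features comprise a frequency of the audio in the frame, and practical recognizers commonly use richer spectral features such as MFCCs.

```python
import cmath
import math

def split_into_frames(samples, frame_len, hop):
    """Cut a sample sequence into overlapping fixed-length frames."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]

def dominant_frequency(frame, sample_rate):
    """Frequency (Hz) of the strongest non-DC bin of a naive DFT of the frame."""
    n = len(frame)
    best_bin, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n

# Assumed test signal: a 1 kHz tone sampled at 8 kHz, 25 ms frames, 10 ms hop.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(400)]
frames = split_into_frames(tone, frame_len=200, hop=80)
features = [dominant_frequency(f, sample_rate) for f in frames]
```

The naive DFT is quadratic in the frame length and is used here only to keep the sketch dependency-free; an FFT routine would be the usual choice in practice.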

Current Assignee: Xerox Corporation (Xerox Holdings Corp.)
Original Assignee: Xerox Corporation (Xerox Holdings Corp.)
Inventors: Tyagi, Vivek; Prasad, Prathosh Aragulla
Primary Examiner(s): Baker, Charlotte M
Application Number: US15/098,343
Publication Number: US 20170301341A1
Time in Patent Office: 558 Days
Field of Search: 704231, 704255, 704235, 704257, 704256, 7042561, 7042566, 704254, 704E15005, 704E15028

CPC Class Codes:
G10L 15/14   using statistical models, e...
G10L 15/142   Hidden Markov Models [HMMs]
G10L 2015/022   Demisyllables, biphones or ...
G10L 2015/088   Word spotting
