Method and system for real-time keyword spotting for speech analytics

US 9,672,815 B2
Filed: 07/20/2012
Issued: 06/06/2017
Est. Priority Date: 07/20/2012
Status: Active Grant

Abstract

A system and method are presented for real-time speech analytics. Real-time audio is fed, along with a keyword model, into a recognition engine. The recognition engine computes the probability that the audio stream data matches keywords in the keyword model. The probability is compared to a threshold to determine whether the keyword has been spotted. Empirical metrics are computed, and any false alarms are identified and rejected. The keyword may be reported as found when it passes the detection threshold and is deemed not to be a false alarm.
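
The decision flow described in the abstract (threshold test, then false-alarm rejection, then reporting) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `Candidate` record, the `spot_keywords` function, and the duration-based false-alarm rule in the usage lines are assumptions chosen only to make the flow concrete.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Candidate:
    """Hypothetical record for one keyword candidate found in the stream."""
    keyword: str
    start_s: float      # start time in seconds
    end_s: float        # end time in seconds
    probability: float  # probability computed by the recognition engine


def spot_keywords(
    candidates: Iterable[Candidate],
    threshold: float,
    is_false_alarm: Callable[[Candidate], bool],
) -> List[Candidate]:
    """Keep candidates that meet the threshold and are not flagged as false alarms."""
    reported = []
    for cand in candidates:
        if cand.probability < threshold:
            continue  # does not meet the detection threshold
        if is_false_alarm(cand):
            continue  # rejected by the (assumed) empirical check
        reported.append(cand)
    return reported


# Usage: an assumed rule that treats implausibly short matches as false alarms.
too_short = lambda c: (c.end_s - c.start_s) < 0.1
hits = spot_keywords(
    [
        Candidate("refund", 1.0, 1.4, 0.91),
        Candidate("refund", 5.0, 5.02, 0.95),
        Candidate("cancel", 7.0, 7.5, 0.40),
    ],
    threshold=0.6,
    is_false_alarm=too_short,
)
for h in hits:
    print(h.keyword, h.start_s, h.end_s, h.probability)
```

Each reported item carries a start time, an end time, and the computed probability, which mirrors the report contents the claims describe. The false-alarm predicate is passed in as a function because the source lists several possible empirical metrics without fixing one.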

26 Claims

1. A computerized method for real-time spotting of predetermined keywords in an audio stream, in an automatic speech recognition system, wherein said system comprises at least a speech recognition engine, the method comprising the steps of:
- a) developing a keyword model for the predetermined keywords;
  
  b) comparing, in real-time, the keyword model and the audio stream to recognize candidates of the predetermined keywords in the audio stream;
  
  c) computing, by the speech recognition engine, a probability that a portion of the audio stream matches one of the predetermined keywords from the keyword model, wherein the probability is determined utilizing a posterior based probability approach which comprises analysis of monophone models over an audio feature space;
  
  d) comparing the computed probability of a keyword match to a predetermined threshold and declaring a match if the computed probability meets the predetermined threshold;
  
  e) computing further data to aid in determination of mismatches, wherein said further data comprises empirical metrics, and determining if the candidates are mismatches; and
  
  f) reporting spotted keywords if a mismatch is not identified at step (e), wherein the reporting comprises generating a report, via a microprocessor and software program, that is presented as a start and end time of the spotted keywords in the audio stream with the computed probability that the keywords were found.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13):
  - 2. The method of claim 1, wherein step (b) comprises:
    - b.1) converting the audio stream into a sequence of spectral features; and
      
      b.2) comparing the keyword models to the sequence of spectral features.
  - 3. The method of claim 2, wherein step (b.1) comprises:
    - b.1.1) converting the audio stream into a sequence of windows; and
      
      b.1.2) calculating a set of 13 Mel Frequency Cepstral Coefficients and their first and second order derivatives for each window.
  - 4. The method of claim 1, wherein step (c) comprises executing a Viterbi algorithm.
  - 5. The method of claim 1, wherein the posterior based probability approach comprises applying the mathematical equation:
  - 6. The method of claim 1, wherein step (c) comprises:
    - c.1) assigning a constant predetermined probability to the portions of the audio stream that do not match the keyword.
  - 7. The method of claim 1, wherein the audio stream comprises a continuous spoken speech stream.
  - 8. The method of claim 1, wherein step (a) comprises concatenating phoneme hidden Markov models of predetermined keywords.
  - 9. The method of claim 1, wherein step (a) comprises:
    - a.1) creating a pronunciation dictionary that defines a sequence of phonemes for each of the predetermined keywords;
      
      a.2) creating an acoustic model that statistically models a relation between textual properties of the phonemes for each of the predetermined keywords and spoken properties of the phonemes for each of the predetermined keywords; and
      
      a.3) concatenating acoustic models for the sequence of phonemes for each of the predetermined keywords.
  - 10. The method of claim 9, wherein step (a.2) comprises creating a set of Gaussian mixture models.
  - 11. The method of claim 9, wherein step (a.2) comprises creating the acoustic model selected from the group consisting of:
    - context-independent model, context-dependent model, and triphone model.
  - 12. The method of claim 1, wherein step (e) comprises computing further data selected from the group consisting of:
    - anti-word match scores, mismatch phoneme percentage, match phoneme percentage, duration penalized probability, and a predetermined Confidence value.
  - 13. The method of claim 12, wherein the predetermined Confidence value is chosen for each of the predetermined keywords so as to achieve a desired false alarm rate and accuracy.
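
Claims 2 and 3 describe splitting the audio into windows and computing 13 cepstral coefficients plus their first and second order derivatives for each window, which gives 39 values per window. The sketch below illustrates only the windowing and the derivative stacking; it takes the 13 static coefficients as given. The function names, the window and hop sizes, and the simple finite-difference derivative are assumptions for illustration, since the claims do not specify them.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], win: int, hop: int) -> List[List[float]]:
    """Split samples into overlapping windows of length `win`, stepping by `hop`."""
    frames = []
    start = 0
    while start + win <= len(samples):
        frames.append(list(samples[start:start + win]))
        start += hop
    return frames


def deltas(frames: List[List[float]]) -> List[List[float]]:
    """Finite-difference time derivative of each coefficient (assumed formula)."""
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        lo, hi = max(t - 1, 0), min(t + 1, last)
        span = hi - lo
        out.append([
            (n - p) / span if span else 0.0
            for p, n in zip(frames[lo], frames[hi])
        ])
    return out


def add_deltas(static: List[List[float]]) -> List[List[float]]:
    """Append first and second derivatives: 13 static values become 39."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Usage: one second of 16 kHz audio, 25 ms windows, 10 ms hop (assumed sizes).
windows = frame_signal([0.0] * 16000, win=400, hop=160)
# Placeholder static coefficients standing in for real cepstral analysis.
static = [[2.0 * t + k for k in range(13)] for t in range(5)]
features = add_deltas(static)
print(len(windows), len(features[0]))
```

The 39 values per window correspond to the 39-dimensional space mentioned in claim 16. A production front end would typically use a regression-based delta over several neighbouring windows rather than the two-point difference used here.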

14. A computerized method of speech recognition, in an automatic speech recognition system wherein said system comprises at least a speech recognition engine, for real-time spotting of predetermined keywords in an audio stream, comprising the steps of:
- a) developing a keyword model for the predetermined keywords;
  
  b) dividing, by the speech recognition engine, the audio stream into a series of points in an acoustic space that spans all possible sounds created in a particular language;
  
  c) determining, by the speech recognition engine, a posterior probability that a first trajectory of each keyword model for the predetermined keywords in the acoustic space matches a second trajectory of a portion of the series of points in the acoustic space, wherein the posterior probability is determined utilizing the mathematical equation;
- Dependent claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26):
  - 15. The method of claim 14, wherein the audio stream comprises a continuous spoken speech stream.
  - 16. The method of claim 14, wherein the space comprises a 39-dimensional space.
  - 17. The method of claim 14, wherein step (b) comprises:
    - b.1) converting the audio stream into a sequence of windows; and
      
      b.2) calculating a set of 13 Mel Frequency Cepstral Coefficients and their first and second order derivatives for each window.
  - 18. The method of claim 14, wherein step (c) comprises executing a Viterbi algorithm.
  - 19. The method of claim 14, wherein step (c) comprises:
    - c.1) assigning a constant predetermined probability to the portions of the audio stream that do not match the keyword.
  - 20. The method of claim 14, wherein step (a) comprises concatenating phoneme hidden Markov models of predetermined keywords.
  - 21. The method of claim 14, wherein step (e) comprises:
    - e.1) declaring a potential spotted word if the posterior probability is greater than the predetermined threshold;
      
      e.2) computing further data to aid in determination of mismatches;
      
      e.3) using the further data to determine if the potential spotted word is a false alarm; and
      
      e.4) reporting spotted keyword if a false alarm is not identified at step (e.3).
  - 22. The method of claim 21, wherein step (e.2) comprises computing further data selected from the group consisting of:
    - anti-word match scores, mismatch phoneme percentage, match phoneme percentage, duration penalized probability, and a predetermined Confidence value.
  - 23. The method of claim 22, wherein the predetermined Confidence value is chosen for each of the predetermined keywords so as to achieve a desired false alarm rate and accuracy.
  - 24. The method of claim 14, wherein step (a) comprises:
    - a.1) creating a pronunciation dictionary that defines a sequence of phonemes for each of the predetermined keywords;
      
      a.2) creating an acoustic model that statistically models a relation between textual properties of the phonemes for each of the predetermined keywords and spoken properties of the phonemes for each of the predetermined keywords; and
      
      a.3) concatenating acoustic models for the sequence of phonemes for each of the predetermined keywords.
  - 25. The method of claim 24, wherein step (a.2) comprises creating a set of Gaussian mixture models.
  - 26. The method of claim 25, wherein step (a.2) comprises creating the acoustic model selected from the group consisting of:
    - context-independent model, context-dependent model, and triphone model.
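
Claims 18 and 20 refer to executing a Viterbi algorithm over a keyword model built by concatenating phoneme hidden Markov models. As a rough illustration, the sketch below runs Viterbi decoding in the log domain over a strictly left-to-right state chain, where each state may repeat or advance to the next one. The function name, the single shared self-loop and advance probabilities, and the toy observation scores are assumptions; the patent's posterior probability equation is not reproduced here.

```python
import math
from typing import List, Tuple


def viterbi_left_to_right(
    log_obs: List[List[float]], log_self: float, log_next: float
) -> Tuple[float, List[int]]:
    """Best path through a left-to-right chain of states.

    log_obs[t][s] is the log-likelihood of frame t under state s.
    The path must start in state 0 and end in the last state.
    Returns (log score, state index per frame). If there are fewer
    frames than states the score is -inf and the path is meaningless.
    """
    n_states = len(log_obs[0])
    neg_inf = -math.inf
    score = [neg_inf] * n_states
    score[0] = log_obs[0][0]
    backptrs = []
    for t in range(1, len(log_obs)):
        new_score = [neg_inf] * n_states
        ptr = list(range(n_states))
        for s in range(n_states):
            stay = score[s] + log_self
            move = score[s - 1] + log_next if s > 0 else neg_inf
            if move > stay:
                new_score[s], ptr[s] = move + log_obs[t][s], s - 1
            else:
                new_score[s] = stay + log_obs[t][s]
        backptrs.append(ptr)
        score = new_score
    # Trace back from the final state to recover the state sequence.
    path = [n_states - 1]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[-1], path


# Usage: three concatenated states, five frames favouring states 0, 0, 1, 2, 2.
hi, lo = math.log(0.9), math.log(0.05)
obs = [[hi, lo, lo], [hi, lo, lo], [lo, hi, lo], [lo, lo, hi], [lo, lo, hi]]
best_score, best_path = viterbi_left_to_right(obs, math.log(0.5), math.log(0.5))
print(best_path)
```

The recovered state sequence also gives the frame indices at which the keyword model is entered and left, which is one way start and end times could be derived. Handling of non-keyword audio, such as the constant probability of claim 19, is outside this sketch.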

Current Assignee: Genesys Cloud Services Incorporated
Original Assignee: Interactive Intelligence Group Incorporated (Genesys Cloud Services Incorporated)
Inventors: Iyer, Ananth Nagaraja; Ganapathiraju, Aravind
Primary Examiner(s): Opsasnick, Michael N
Application Number: US13/554,937
Publication Number: US 20140025379A1
Time in Patent Office: 1,782 Days

CPC Class Codes

G10L 15/08   Speech classification or se...

G10L 15/142   Hidden Markov Models [HMMs]

G10L 2015/088   Word spotting
