False alarm reduction in speech recognition systems using contextual information

US 9,646,605 B2
Filed: 01/22/2013
Issued: 05/09/2017
Est. Priority Date: 01/22/2013
Status: Active Grant


5 Assignments

0 Petitions

Abstract

A system and method are presented that use spoken word verification to reduce false alarms by exploiting global and local contexts at the lexical, phoneme, and acoustic levels. False alarms may be reduced through a process that determines whether a word has actually been detected or is a false alarm. Training examples are used to generate models of internal and external contexts, which are compared against test word examples, and the word may be accepted or rejected based on the comparison results. The comparison may be performed either once at the end of the process or at multiple steps of the process to determine whether the word is rejected.
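
The abstract allows the accept/reject comparison to happen either once at the end or at several steps of the process. The Python sketch below contrasts the two, assuming each context level has already been reduced to one score where lower means closer to the trained model. The stage names, scores, thresholds, and function names are hypothetical illustrations, not taken from the patent.

```python
from typing import Callable, Optional, Sequence, Tuple

# A stage is (name, scoring function, threshold). In this sketch a lower score
# means the test word is closer to that stage's trained context model.
Stage = Tuple[str, Callable[[dict], float], float]


def verify_at_end(word: dict, stages: Sequence[Stage]) -> bool:
    """Compute every stage score first, then make a single accept/reject decision."""
    scores = [score(word) for _, score, _ in stages]
    return all(s <= thr for s, (_, _, thr) in zip(scores, stages))


def verify_cascade(word: dict, stages: Sequence[Stage]) -> Tuple[bool, Optional[str]]:
    """Decide after each stage; stop at the first stage whose threshold is exceeded."""
    for name, score, thr in stages:
        if score(word) > thr:
            return False, name  # rejected as a false alarm at this stage
    return True, None  # passed every stage: keep the detection


# Hypothetical stages and precomputed per-word scores, for illustration only.
stages = [
    ("acoustic", lambda w: w["acoustic_dist"], 2.0),
    ("phonetic", lambda w: w["phonetic_dist"], 1.5),
    ("linguistic", lambda w: w["perplexity"], 300.0),
]
hit = {"acoustic_dist": 1.2, "phonetic_dist": 0.9, "perplexity": 120.0}
false_alarm = {"acoustic_dist": 1.1, "phonetic_dist": 2.4, "perplexity": 800.0}

print(verify_cascade(hit, stages))          # (True, None)
print(verify_cascade(false_alarm, stages))  # (False, 'phonetic')
print(verify_at_end(false_alarm, stages))   # False
```

The cascaded form can discard a false alarm before later, possibly more expensive, stages run; the end-of-process form always computes every score, which is convenient if the scores are later combined or logged.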

32 Citations

14 Claims

1. A computerized method for reducing false alarms in a speech recognition system, the method comprising:
- receiving a plurality of training examples;
- generating a model of a left internal context based at least in part on the plurality of training examples, wherein the generation of the model includes compact representation of the left internal context in the form of spectral, cepstral or sinusoidal descriptions;
- generating a model of a right internal context based at least in part on the plurality of training examples, wherein the generation of the model includes compact representation of the right internal context in the form of spectral, cepstral or sinusoidal descriptions;
- generating a model of a left external context based at least in part on the plurality of training examples, wherein the generation of the model includes compact representation of the left external context in the form of spectral, cepstral or sinusoidal descriptions;
- generating a model of a right external context based at least in part on the plurality of training examples, wherein the generation of the model includes compact representation of the right external context in the form of spectral, cepstral or sinusoidal descriptions;
- receiving at least one test word, the at least one test word comprising an external context;
- comparing the external context of the at least one test word against a threshold associated with each of the model of the left internal context, the model of the right internal context, the model of the left external context, and the model of the right external context; and
- rejecting the at least one test word if it is not within the thresholds.

2. The method of claim 1, wherein the test word is an analog context.

3. The method of claim 2, further comprising converting the test word from an analog context to a digital format.

4. The method of claim 1, further comprising:
- learning an acceptable threshold for each of the model of the left internal context, the model of the right internal context, the model of the left external context, and the model of the right external context based at least in part on cross-validating sets; and
- wherein the comparing step is performed using each acceptable threshold.

5. The method of claim 1, wherein each training example in the plurality of training examples comprises a representation of a test word and a local context; and wherein each local context is based on average phoneme and syllable duration from similar word types.

6. The method of claim 1, wherein the comparing step comprises the additional step of evaluating the at least one word with a perplexity test.

7. The method of claim 1, wherein each of the model of the left internal context, the model of the right internal context, the model of the left external context, and the model of the right external context include compact representations.
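
Claim 1 models four contexts around a detected word, each as a compact spectral, cepstral, or sinusoidal description, and rejects the test word when it is not within the models' thresholds. The Python/NumPy sketch below is one possible reading, not the patented implementation. It assumes the word's start and end sample indices come from a word spotter, takes fixed-width windows just inside (internal) and just outside (external) each word boundary, describes each window by its first 13 real-cepstrum coefficients, models each context by a per-coefficient mean and standard deviation, and scores a test word by its mean squared z-score against each model. Note that the claim text compares "the external context of the at least one test word" against all four thresholds; scoring each window against its own model is an interpretive choice. The window width, coefficient count, distance measure, thresholds, and all function names are assumptions.

```python
import numpy as np

CONTEXTS = ("left_internal", "right_internal", "left_external", "right_external")


def cepstrum(frame: np.ndarray, n_coeffs: int = 13) -> np.ndarray:
    """Compact cepstral description: the first n_coeffs real-cepstrum coefficients."""
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)  # avoid log(0)
    return np.fft.irfft(log_mag)[:n_coeffs]


def context_windows(audio: np.ndarray, start: int, end: int, width: int) -> dict:
    """Four windows of `width` samples around a detected word spanning [start, end)."""
    if start < width or end + width > len(audio) or end - start < width:
        raise ValueError("word too close to the signal edge or shorter than one window")
    return {
        "left_internal": audio[start:start + width],   # first samples of the word
        "right_internal": audio[end - width:end],      # last samples of the word
        "left_external": audio[start - width:start],   # just before the word
        "right_external": audio[end:end + width],      # just after the word
    }


def train_models(examples, width: int, n_coeffs: int = 13) -> dict:
    """examples: iterable of (audio, start, end). Returns {context: (mean, std)}."""
    feats = {c: [] for c in CONTEXTS}
    for audio, start, end in examples:
        for name, win in context_windows(audio, start, end, width).items():
            feats[name].append(cepstrum(win, n_coeffs))
    models = {}
    for name, vecs in feats.items():
        arr = np.stack(vecs)
        models[name] = (arr.mean(axis=0), arr.std(axis=0) + 1e-6)  # floor the std
    return models


def context_distances(models, audio, start, end, width, n_coeffs=13) -> dict:
    """Mean squared z-score of each test-word context against its trained model."""
    out = {}
    for name, win in context_windows(audio, start, end, width).items():
        mean, std = models[name]
        z = (cepstrum(win, n_coeffs) - mean) / std
        out[name] = float(np.mean(z ** 2))
    return out


def accept(distances: dict, thresholds: dict) -> bool:
    """Keep the word only if every context is within its threshold; else reject it."""
    return all(distances[c] <= thresholds[c] for c in CONTEXTS)


# Usage on synthetic data: a 500 Hz tone "word" embedded in low-level noise.
rng = np.random.default_rng(0)
SR, WIDTH = 8000, 400  # assumed: 8 kHz audio, 50 ms context windows


def fake_utterance(freq: float, n: int = 4000, start: int = 1600, end: int = 2400):
    audio = 0.01 * rng.standard_normal(n)
    t = np.arange(start, end) / SR
    audio[start:end] += np.sin(2 * np.pi * freq * t)
    return audio, start, end


models = train_models([fake_utterance(500.0) for _ in range(20)], WIDTH)
thresholds = {c: 25.0 for c in CONTEXTS}  # placeholder values, not learned
dist = context_distances(models, *fake_utterance(500.0), WIDTH)
print(dist, accept(dist, thresholds))
```

The fixed placeholder thresholds here are where a learned, per-model threshold (as recited in claim 4) would be substituted.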

8. A computerized method for reducing false alarms in a speech recognition system, the method comprising:
- receiving a plurality of training examples, each training example comprising a representation of a spoken word and a local context;
- generating at least one model of an acoustic context based on the plurality of training examples, wherein the generation of the model includes compact representation of the acoustic context in the form of spectral, cepstral or sinusoidal descriptions;
- generating at least one model of a phonetic context based on the plurality of training examples, wherein the generation of the model includes compact representation of the phonetic context in the form of spectral, cepstral or sinusoidal descriptions;
- generating at least one model of a linguistic context based on the plurality of training examples, wherein the generation of the model includes compact representation of the linguistic context in the form of spectral, cepstral or sinusoidal descriptions;
- receiving at least one test word, the at least one test word comprising an external context;
- comparing the at least one test word against a threshold associated with each of the model of the acoustic context, the model of the phonetic context, and the model of the linguistic context; and
- rejecting the at least one test word if it is not within the thresholds.

9. The method of claim 8, wherein the spoken word is an analog context.

10. The method of claim 9, further comprising converting the spoken word from an analog context to a digital format.

11. The method of claim 8, further comprising:
- learning an acceptable threshold for each of the model of the acoustic context, the model of the phonetic context, and the model of the linguistic context based at least in part on cross-validating sets; and
- wherein the comparing step is performed using each acceptable threshold.

12. The method of claim 8, wherein each training example in the plurality of training examples comprises a representation of a spoken word and a local context; and wherein each local context is based on average phoneme and syllable duration from similar word types.

13. The method of claim 8, wherein the comparing step comprises the additional step of evaluating the at least one word with a perplexity test.

14. The method of claim 8, wherein the generating at least one model of an acoustic context step includes generating a left internal model, a right internal model, a left external model, and a right external model for each spoken word in the plurality of training examples.
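
Claims 4 and 11 learn an acceptable threshold for each context model "based at least in part on cross-validating sets" without fixing a procedure. The Python sketch below shows one k-fold reading, assuming we already have distances produced by one context model (lower meaning closer to the model) for examples labeled as genuine detections or false alarms. On each fold it picks the threshold that minimizes misses plus false alarms on the training folds, measures the error on the held-out fold, and averages the per-fold thresholds. The error criterion, the averaging step, the labeled data, and the function names are assumptions; a deployed system might instead cap the miss rate and minimize false alarms.

```python
import numpy as np


def best_threshold(genuine: np.ndarray, false_alarms: np.ndarray) -> float:
    """Threshold (accept if distance <= thr) minimizing misses + false alarms."""
    candidates = np.unique(np.concatenate([genuine, false_alarms]))
    errors = [np.sum(genuine > thr) + np.sum(false_alarms <= thr) for thr in candidates]
    return float(candidates[int(np.argmin(errors))])


def cross_validated_threshold(genuine, false_alarms, k: int = 5, seed: int = 0):
    """Return (mean of per-fold thresholds, mean held-out error rate)."""
    rng = np.random.default_rng(seed)
    g_folds = np.array_split(rng.permutation(np.asarray(genuine, dtype=float)), k)
    f_folds = np.array_split(rng.permutation(np.asarray(false_alarms, dtype=float)), k)
    thresholds, held_out_err = [], []
    for i in range(k):
        # Train on every fold except i, then evaluate on fold i.
        g_train = np.concatenate([g for j, g in enumerate(g_folds) if j != i])
        f_train = np.concatenate([f for j, f in enumerate(f_folds) if j != i])
        thr = best_threshold(g_train, f_train)
        n_err = np.sum(g_folds[i] > thr) + np.sum(f_folds[i] <= thr)
        thresholds.append(thr)
        held_out_err.append(n_err / (len(g_folds[i]) + len(f_folds[i])))
    return float(np.mean(thresholds)), float(np.mean(held_out_err))


# Usage: toy distances for one context model (values are illustrative only).
genuine = np.arange(10) * 0.1             # genuine detections: 0.0 .. 0.9
false_alarms = 2.0 + np.arange(10) * 0.1  # false alarms: 2.0 .. 2.9
thr, err = cross_validated_threshold(genuine, false_alarms)
print(f"threshold={thr:.2f}, held-out error={err:.2f}")
```

Because each fold's threshold is taken from observed distances, it sits at the edge of that fold's genuine scores; the held-out error shows how often that edge is too tight, and averaging across folds smooths the choice. One such threshold would be learned per context model.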

Current Assignee: Genesys Telecommunications Laboratories Incorporated (Genesys Cloud Services Incorporated)
Original Assignee: Interactive Intelligence Group Incorporated (Genesys Cloud Services Incorporated)
Inventors: Biatov, Konstantin; Ganapathiraju, Aravind; Wyss, Felix Immanuel
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): SHIN, SEONG-AH A
Application Number: US13/746,687
Publication Number: US 20140207457A1
Time in Patent Office: 1,568 Days
Field of Search: 704/243, 704/9, 704/233
CPC Class Codes:
- G10L 15/063: Training
- G10L 15/183: using context dependencies,...
- G10L 2015/088: Word spotting
