Reducing false positives in speech recognition systems

US 8,781,825 B2
Filed: 08/24/2011
Issued: 07/15/2014
Est. Priority Date: 08/24/2011
Status: Active Grant

Abstract

Embodiments of the present invention improve methods of performing speech recognition. In one embodiment, the present invention includes a method comprising receiving a spoken utterance, processing the spoken utterance in a speech recognizer to generate a recognition result, determining consistencies of one or more parameters of component sounds of the spoken utterance, wherein the parameters are selected from the group consisting of duration, energy, and pitch, and wherein each component sound of the spoken utterance has a corresponding value of said parameter, and validating the recognition result based on the consistency of at least one of said parameters.

25 Citations

18 Claims

1. A method comprising:
- receiving, by a computing device, a spoken utterance;
- performing, by the computing device, speech recognition processing on the spoken utterance and generating a recognition result;
- determining, by the computing device, consistency of duration of component sounds of the recognition result, the determining comprising:
  - calculating a speaker rate by dividing a total duration of the spoken utterance by a sum of expected durations for the component sounds of the recognition result;
  - for each component sound:
    - calculating a modified expected duration by multiplying the component sound's expected duration by the speaker rate; and
    - calculating a delta value corresponding to a difference between the component sound's duration in the spoken utterance and the component sound's modified expected duration; and
  - calculating a duration consistency score by taking a sum of squares of the delta values and dividing the sum by the total number of component sounds; and
- validating, by the computing device, the recognition result based on the duration consistency score.

2. The method of claim 1 further comprising:
- determining consistency of energy of the component sounds; and
- generating an energy consistency score, wherein the validating of the recognition result is further based on the energy consistency score.

3. The method of claim 2 further comprising:
- determining consistency of pitch of the component sounds; and
- generating a pitch consistency score, wherein the validating of the recognition result is further based on the pitch consistency score.

4. The method of claim 1 wherein the recognition result is associated with a recognition score, and wherein validating the recognition result comprises combining the recognition score with the duration consistency score to generate a combined score and comparing the combined score to a threshold.

5. The method of claim 1 wherein the expected duration for each component sound is an average duration value that is generated from a speaker-independent training set of utterances.

6. The method of claim 1 wherein validating the recognition result comprises:
- comparing the duration consistency score to a threshold;
- rejecting the recognition result if the duration consistency score crosses the threshold; and
- accepting the recognition result if the duration consistency score does not cross the threshold.

7. The method of claim 6 wherein if the duration consistency score crosses the threshold, then the durations of the component sounds are insufficiently consistent, and wherein if the duration consistency score does not cross the threshold, then the durations of the component sounds are sufficiently consistent.

8. The method of claim 1 wherein the component sounds are one of phonemes, sub-phones, syllables, and words.
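
The duration steps recited in claim 1, together with the threshold test of claim 6, can be sketched in a few lines of Python. This is only an illustrative sketch, not the patentee's implementation. The names `ComponentSound`, `duration_consistency_score`, and `validate` are hypothetical, as are the sample durations. Two assumptions are made where the claim text is silent: when no total utterance duration is supplied, the sum of the measured component durations is used, and a score "crosses" the threshold when it exceeds it (a larger score meaning less consistent durations).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ComponentSound:
    """One component sound (e.g. a phoneme) of the recognition result."""
    label: str
    observed_s: float  # duration measured in the spoken utterance, seconds
    expected_s: float  # expected duration (e.g. a training-set average), seconds


def duration_consistency_score(
    sounds: Sequence[ComponentSound],
    total_duration_s: Optional[float] = None,
) -> float:
    """Mean of squared deltas between observed and rate-adjusted expected durations."""
    if not sounds:
        raise ValueError("at least one component sound is required")
    expected_sum = sum(s.expected_s for s in sounds)
    if expected_sum <= 0:
        raise ValueError("expected durations must sum to a positive value")
    # Assumption: with no explicit total, use the sum of the observed durations.
    if total_duration_s is None:
        total_duration_s = sum(s.observed_s for s in sounds)

    # Speaker rate: total utterance duration over the sum of expected durations.
    speaker_rate = total_duration_s / expected_sum

    sum_of_squares = 0.0
    for s in sounds:
        modified_expected = s.expected_s * speaker_rate  # scale to this speaker
        delta = s.observed_s - modified_expected
        sum_of_squares += delta * delta

    # Divide by the number of component sounds.
    return sum_of_squares / len(sounds)


def validate(score: float, threshold: float) -> bool:
    """Accept (True) unless the score crosses the threshold.

    Assumption: "crossing" means exceeding, since a larger score
    indicates less consistent durations.
    """
    return score <= threshold


if __name__ == "__main__":
    # Hypothetical alignment for a three-phoneme result.
    result = [
        ComponentSound("k", observed_s=0.08, expected_s=0.06),
        ComponentSound("ae", observed_s=0.30, expected_s=0.12),
        ComponentSound("t", observed_s=0.07, expected_s=0.07),
    ]
    score = duration_consistency_score(result)
    print(f"score={score:.6f} accepted={validate(score, threshold=0.002)}")
```

Because every expected duration is scaled by the same speaker rate, a speaker who is uniformly fast or slow yields deltas near zero; only uneven stretching of individual sounds raises the score.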

9. A system comprising:
- a processor; and
- a non-transitory computer readable storage medium having stored thereon program code that, when executed by the processor, causes the processor to:
  - receive a spoken utterance;
  - perform speech recognition processing on the spoken utterance and generate a recognition result;
  - determine consistency of duration of component sounds of the recognition result, the determining comprising:
    - calculating a speaker rate by dividing a total duration of the spoken utterance by a sum of expected durations for the component sounds of the recognition result;
    - for each component sound:
      - calculating a modified expected duration by multiplying the component sound's expected duration by the speaker rate; and
      - calculating a delta value corresponding to a difference between the component sound's duration in the spoken utterance and the component sound's modified expected duration; and
    - calculating a duration consistency score by taking a sum of squares of the delta values and dividing the sum by the total number of component sounds; and
  - validate the recognition result based on the duration consistency score.

10. The system of claim 9 wherein the program code further causes the processor to:
- determine consistency of energy of the component sounds; and
- generate an energy consistency score, wherein the validating of the recognition result is further based on the energy consistency score.

11. The system of claim 9 wherein the program code further causes the processor to:
- determine consistency of pitch of the component sounds; and
- generate a pitch consistency score, wherein the validating of the recognition result is further based on the pitch consistency score.

12. The system of claim 9 wherein the expected duration for each component sound is an average duration value that is generated from a speaker-independent training set of utterances.

13. The system of claim 9 wherein validating the recognition result comprises:
- comparing the duration consistency score to a threshold;
- rejecting the recognition result if the duration consistency score crosses the threshold; and
- accepting the recognition result if the duration consistency score does not cross the threshold.

14. A non-transitory computer readable storage medium having stored thereon program code executable by a processor, the program code comprising:
- code that causes the processor to receive a spoken utterance;
- code that causes the processor to perform speech recognition processing on the spoken utterance and generate a recognition result;
- code that causes the processor to determine consistency of duration of component sounds of the recognition result, the determining comprising:
  - calculating a speaker rate by dividing a total duration of the spoken utterance by a sum of expected durations for the component sounds of the recognition result;
  - for each component sound:
    - calculating a modified expected duration by multiplying the component sound's expected duration by the speaker rate; and
    - calculating a delta value corresponding to a difference between the component sound's duration in the spoken utterance and the component sound's modified expected duration; and
  - calculating a duration consistency score by taking a sum of squares of the delta values and dividing the sum by the total number of component sounds; and
- code that causes the processor to validate the recognition result based on the duration consistency score.

15. The non-transitory computer readable storage medium of claim 14 wherein the program code further comprises:
- code that causes the processor to determine consistency of energy of the component sounds; and
- code that causes the processor to generate an energy consistency score, wherein the validating of the recognition result is further based on the energy consistency score.

16. The non-transitory computer readable storage medium of claim 14 wherein the program code further comprises:
- code that causes the processor to determine consistency of pitch of the component sounds; and
- code that causes the processor to generate a pitch consistency score, wherein the validating of the recognition result is further based on the pitch consistency score.

17. The non-transitory computer readable storage medium of claim 14 wherein the expected duration for each component sound is an average duration value that is generated from a speaker-independent training set of utterances.

18. The non-transitory computer readable storage medium of claim 14 wherein validating the recognition result comprises:
- comparing the duration consistency score to a threshold;
- rejecting the recognition result if the duration consistency score crosses the threshold; and
- accepting the recognition result if the duration consistency score does not cross the threshold.
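
The duration computation shared by independent claims 1, 9, and 14 can be written compactly. The symbols are not taken from the patent text and are introduced here only for illustration: $N$ is the number of component sounds, $d_i$ the duration of component sound $i$ in the spoken utterance, $e_i$ its expected duration, $T$ the total duration of the utterance, $r$ the speaker rate, and $S$ the duration consistency score.

$$
r = \frac{T}{\sum_{i=1}^{N} e_i}, \qquad
\hat{e}_i = r \, e_i, \qquad
\delta_i = d_i - \hat{e}_i, \qquad
S = \frac{1}{N} \sum_{i=1}^{N} \delta_i^{2}
$$

As a hypothetical worked example, take two component sounds with $e_1 = e_2 = 0.1$ s, $d_1 = 0.3$ s, $d_2 = 0.1$ s, and $T = 0.4$ s. Then $r = 0.4 / 0.2 = 2$, both modified expected durations are $0.2$ s, the deltas are $+0.1$ s and $-0.1$ s, and $S = (0.01 + 0.01) / 2 = 0.01\ \text{s}^2$. The sign convention chosen for $\delta_i$ does not affect $S$, because each delta is squared.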

Current Assignee: Sensory Incorporated
Original Assignee: Sensory Incorporated
Inventors: Shaw, Jonathan; Vermeulen, Pieter; Sutton, Stephen; Savoie, Robert
Primary Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US13/217,134
Publication Number: US 20130054242A1
Time in Patent Office: 1,056 Days
Field of Search: 704/231, 704/234, 704/236-240, 704/251-255, 704/235, 704/246
US Class Current: 704/231
CPC Class Codes:
G10L 15/10 using distance or distortio...
G10L 25/03 characterised by the type o...
