REDUCING FALSE POSITIVES IN SPEECH RECOGNITION SYSTEMS

US 20130054242A1
Filed: 08/24/2011
Published: 02/28/2013
Est. Priority Date: 08/24/2011
Status: Active Grant


Abstract

Embodiments of the present invention improve methods of performing speech recognition. In one embodiment, the present invention includes a method comprising receiving a spoken utterance, processing the spoken utterance in a speech recognizer to generate a recognition result, determining consistencies of one or more parameters of component sounds of the spoken utterance, wherein the parameters are selected from the group consisting of duration, energy, and pitch, and wherein each component sound of the spoken utterance has a corresponding value of said parameter, and validating the recognition result based on the consistency of at least one of said parameters.

23 Claims

1. A method comprising:
- receiving a spoken utterance;
- processing the spoken utterance in a speech recognizer to generate a recognition result;
- determining consistencies of one or more parameters of component sounds of the spoken utterance, wherein the parameters are selected from the group consisting of duration, energy, and pitch, and wherein each component sound of the spoken utterance has a corresponding value of said parameter; and
- validating the recognition result based on the consistency of at least one of said parameters.
- Dependent claims (2 to 22):
  - 2. The method of claim 1 wherein determining consistencies of one or more parameters comprises determining the consistency of duration of component sounds of the spoken utterance.
  - 3. The method of claim 1 wherein determining consistencies of one or more parameters comprises determining the consistency of energy of component sounds of the spoken utterance.
  - 4. The method of claim 1 wherein determining consistencies of one or more parameters comprises determining the consistency of pitch of component sounds of the spoken utterance.
  - 5. The method of claim 1 wherein consistencies for a plurality of parameters are determined, and wherein validating the recognition result is based on the separate consistency of each determined parameter.
  - 6. The method of claim 1 wherein the recognition result is a first score and one or more consistencies each have a score, and wherein validating the recognition result comprises combining the first score with scores of one or more consistencies to generate a second score and comparing the second score to a threshold.
  - 7. The method of claim 1 wherein determining consistencies comprises calculating a consistency measure based on predetermined consistency characteristics of the parameter and actual characteristics of the parameter in the spoken utterance.
  - 8. The method of claim 7 wherein the predetermined consistency characteristics are one or more predetermined statistical parameters for each of the one or more parameters of component sounds of the spoken utterance.
  - 9. The method of claim 8 wherein the one or more predetermined statistical parameters comprise an average value of the parameter for each component sound of the spoken utterance, and wherein the average value is generated from a training set of utterances.
  - 10. The method of claim 1 wherein validating the recognition result comprises:
    - comparing a particular consistency for a particular parameter to a threshold,
    - rejecting the recognition result if the consistency of the parameter crosses the threshold, and
    - accepting the recognition result if the consistency of the parameter does not cross the threshold.
  - 11. The method of claim 10 wherein if the consistency of the parameter crosses the threshold, then the parameter is insufficiently consistent, and wherein if the consistency of the parameter does not cross the threshold, then the parameter is sufficiently consistent.
  - 12. The method of claim 1 wherein the parameter is duration, and wherein determining consistency of duration comprises determining a speaker rate, wherein the speaker rate is based on a total duration of the spoken utterance divided by a sum of expected values of durations for each different component of sound in the utterance.
  - 13. The method of claim 12 wherein the expected values of durations are average durations for each different component of sound in the utterance.
  - 14. The method of claim 12 wherein at least one of the consistencies of the one or more parameters of component sounds of the spoken utterance comprise a consistency score, and wherein the consistency score is based on the speaker rate, actual durations of component sounds of the spoken utterance, and one or more statistical parameters for each component sound in the utterance.
  - 15. The method of claim 12 further comprises determining modified expected values based on the speaker rate.
  - 16. The method of claim 15 wherein the modified expected values are determined by multiplying the speaker rate by the expected values of durations for each different component of sound of the utterance.
  - 17. The method of claim 15 further comprises determining a plurality of delta values, and wherein the plurality of delta values are differences between each modified expected value and a duration of a component of sound corresponding to each particular modified expected value.
  - 18. The method of claim 15 further comprises determining a plurality of delta values, and wherein the plurality of delta values are differences between a first function operable on each modified expected value and a second function operable on a duration of a component of sound corresponding to each particular modified expected value.
  - 19. The method of claim 18 wherein consistency is represented as a score, and wherein determining the consistency further comprises adding squares of said delta values for N components of sound in the utterance and dividing by N.
  - 20. The method of claim 18 wherein first function and second function are natural logarithms.
  - 21. The method of claim 18 wherein second function comprises an exponential of a standard deviation of the duration of the component of sound corresponding to each particular modified expected value.
  - 22. The method of claim 1 wherein the component sounds are one of phonemes, sub-phones, syllables, and words.
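
The duration-related claims (12, 16, 17 to 20) can be read as a short calculation. The following is a minimal sketch of that reading, not the patented implementation. The function names `speaker_rate` and `duration_consistency` and the `use_log` flag are hypothetical, and the sketch assumes that the total utterance duration equals the sum of the component durations and that all durations are positive.

```python
import math
from typing import Sequence


def speaker_rate(actual: Sequence[float], expected: Sequence[float]) -> float:
    """Claim 12: total utterance duration over the sum of expected durations.

    Assumption: the total duration is the sum of the component durations.
    """
    return sum(actual) / sum(expected)


def duration_consistency(
    actual: Sequence[float],
    expected: Sequence[float],
    use_log: bool = True,
) -> float:
    """Mean squared delta between rate-adjusted expected and actual durations.

    Lower values mean the component durations fit the expected pattern better.
    """
    if len(actual) != len(expected) or len(actual) == 0:
        raise ValueError("need one expected duration per component sound")

    rate = speaker_rate(actual, expected)

    # Claim 16: modified expected value = speaker rate * expected duration.
    modified = [rate * e for e in expected]

    # Claim 20 uses natural logarithms for both functions; with use_log=False
    # the deltas are the plain differences of claim 17.
    transform = math.log if use_log else float

    # Claims 17/18: one delta per component sound.
    deltas = [transform(m) - transform(a) for m, a in zip(modified, actual)]

    # Claim 19: add the squares of the N deltas and divide by N.
    return sum(d * d for d in deltas) / len(deltas)
```

With illustrative values, an utterance spoken uniformly at half speed (`actual=[0.2, 0.4]`, `expected=[0.1, 0.2]`) gives a speaker rate of 2 and a score of 0, because the rate adjustment absorbs the overall tempo. An utterance whose component durations are swapped relative to expectation (`actual=[0.3, 0.1]`, `expected=[0.1, 0.3]`) has a speaker rate of 1 but a clearly non-zero score.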

23. A system comprising:
- a processor; and
- a memory, wherein the processor is configured to:
  - receive a spoken utterance;
  - process the spoken utterance in a speech recognizer to generate a recognition result;
  - determine consistencies of one or more parameters of component sounds of the spoken utterance, wherein the parameters are selected from the group consisting of duration, energy, and pitch, and wherein each component sound of the spoken utterance has a corresponding value of said parameter; and
  - validate the recognition result based on the consistency of at least one of said parameters.
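
The validation step is described in two forms: a direct threshold test on one consistency measure (claims 10 and 11) and a combined score compared to a threshold (claim 6). The sketch below illustrates both under stated assumptions. The claims do not say how scores are combined or in which direction a threshold is crossed, so the weighted linear penalty, the "lower is more consistent" convention, and the names `validate_by_threshold` and `validate_combined` are all hypothetical choices made for illustration.

```python
from typing import Sequence


def validate_by_threshold(consistency: float, threshold: float) -> bool:
    """Claims 10-11: accept unless the consistency measure crosses the threshold.

    Assumption: the measure is an error-like score, so a value above the
    threshold means the parameter is insufficiently consistent.
    """
    return consistency <= threshold


def validate_combined(
    recognition_score: float,
    consistency_scores: Sequence[float],
    weights: Sequence[float],
    threshold: float,
) -> bool:
    """Claim 6: combine the first score with consistency scores, then compare.

    Assumption: the combination is a weighted linear penalty; the claim only
    requires that some second score be generated and compared to a threshold.
    """
    if len(consistency_scores) != len(weights):
        raise ValueError("need one weight per consistency score")

    penalty = sum(w * c for w, c in zip(weights, consistency_scores))
    second_score = recognition_score - penalty
    return second_score >= threshold
```

For example, a recognition score of 0.9 with a single consistency score of 0.04 (weight 1.0) yields a second score of 0.86 and passes a threshold of 0.5, while a consistency score of 1.2 drives the second score below the threshold and the result is rejected as a likely false positive.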

Current Assignee
Sensory Incorporated
Original Assignee
Sensory Incorporated
Inventors
Sutton, Stephen; Shaw, Jonathan; Vermeulen, Pieter; Savoie, Robert

Granted Patent

US 8,781,825 B2
Field of Search
US Class Current

704/239
CPC Class Codes

G10L 15/10 using distance or distortio...

G10L 25/03 characterised by the type o...
