Human-augmented, automatic speech recognition engine

US 20020152071A1
Filed: 04/12/2001
Published: 10/17/2002
Est. Priority Date: 04/12/2001
Status: Abandoned Application


4 Assignments

0 Petitions

Abstract

A system and method combine the advantages of automatic speech recognition and human-to-human conversation in a speech recognition engine. Human intervention is used to augment an automatic speech recognition engine: when the confidence metric for an utterance falls below a predetermined threshold, the system transmits the utterance to a human operator, who transcribes it, and the transcription is then provided back to the automatic system. In the preferred embodiment, no real-time human-to-human conversation ever takes place, so the user experience is consistent with automatic, machine speech recognition. A mechanism is also provided for examining voice recognition statistics gathered over many users. If there is a high correction rate for a particular word or phrase, the system automatically directs words in the potential match list for that word or phrase to a human transcriber and makes no independent effort to recognize them. The speech system learns from such human transcription and improves its speech recognition models or grammar over time.
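The routing described in the abstract and in claim 1 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the application's implementation: the engine and the human transcriber are stand-in callables, and the threshold value, the correction-rate cutoff, `min_samples`, and the rule that a human transcription differing from the machine hypothesis counts as a "correction" are all hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: the engine returns (best hypothesis, confidence in [0, 1]);
# the human transcriber returns a transcription of the same audio.
Engine = Callable[[bytes], Tuple[str, float]]
HumanTranscriber = Callable[[bytes], str]


@dataclass
class HumanAugmentedRecognizer:
    engine: Engine
    human: HumanTranscriber
    threshold: float = 0.6            # assumed "predetermined threshold"
    max_correction_rate: float = 0.3  # assumed cutoff for a "high correction rate"
    min_samples: int = 20             # assumed minimum observations before the rate is trusted
    # Statistics pooled over all users: phrase -> [times seen, times corrected].
    stats: Dict[str, List[int]] = field(default_factory=dict)

    def correction_rate(self, phrase: str) -> float:
        seen, corrected = self.stats.get(phrase, [0, 0])
        return corrected / seen if seen >= self.min_samples else 0.0

    def _record(self, machine_text: str, final_text: str) -> None:
        # Assumption: a human transcription that differs from the machine
        # hypothesis counts as a correction of that phrase.
        entry = self.stats.setdefault(machine_text, [0, 0])
        entry[0] += 1
        if machine_text != final_text:
            entry[1] += 1

    def recognize(self, audio: bytes) -> Tuple[str, str]:
        """Return (final text, route taken)."""
        text, confidence = self.engine(audio)
        if self.correction_rate(text) > self.max_correction_rate:
            # Error-prone phrase: defer to a human without trusting the engine's guess.
            final, route = self.human(audio), "human (high correction rate)"
        elif confidence < self.threshold:
            # Low confidence: send the utterance to a human operator.
            final, route = self.human(audio), "human (low confidence)"
        else:
            final, route = text, "machine"
        self._record(text, final)  # feed the outcome back into the shared statistics
        return final, route
```

Because `stats` is keyed by phrase rather than by user, a phrase that is repeatedly corrected for some users is routed to a human for everyone, which is one way to read "gathered over many users".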

137 Citations

36 Claims

1. A speech recognition system, comprising:
   - an automatic speech recognition engine;
   - a module in communication with said speech recognition engine for determining a confidence metric with regard to an utterance presented to said speech recognition engine, and for transmitting said utterance to a human operator for recognition and transcription when said confidence metric is below a predetermined threshold; and
   - a mechanism for providing said human transcription of said utterance back to said speech recognition engine.
- Dependent claims (2-18 and 20-36):
  - 2. The system of claim 1, further comprising:
    - a mechanism for gathering speech recognition statistics over many system users and for examining said voice recognition statistics;
      
      wherein, if there is a high correction rate for a particular word or phrase, said speech recognition engine automatically directs words in a potential match list for said word or phrase to a human transcriber and makes no independent effort to recognize such words.
  - 3. The system of claim 1, wherein said speech recognition engine learns from human transcription and improves its speech recognition models or grammar, based upon the input from human transcription.
  - 4. The system of claim 1, wherein human feedback is provided to handle relatively uncommon words that suddenly increase in popularity.
  - 5. The system of claim 1, wherein said speech recognition engine is cued to look at speech samples and recognize a user's commands, wherein said commands, once recognized, are executed.
  - 6. The system of claim 1, wherein said speech recognition engine produces a list of potential phrases plus confidence readings for said phrases.
  - 7. The system of claim 1, further comprising:
    - a bank of human recognizers.
  - 8. The system of claim 7, wherein among said human recognizers there are people who are facile with different languages and can recognize said languages and redirect unrecognized speech through a speech recognition engine for such languages.
  - 9. The system of claim 8, wherein once a language is human recognized for a particular person, said speech recognition engine remembers that said person speaks said language and applies a dictionary for that language.
  - 10. The system of claim 1, wherein said speech recognition engine receives feedback from said human recognizers, wherein said speech recognition engine, with time, builds capability to handle phrases without human intervention.
  - 11. The system of claim 1, wherein real time human intervention is used by said human transcription mechanism to train said speech recognition engine.
  - 12. The system of claim 1, wherein feedback is directly applied by said human transcription mechanism to said speech recognition engine.
  - 13. The system of claim 1, wherein alternate recognizers are targeted by said human transcription mechanism.
  - 14. The system of claim 1, wherein grammars are optimized by said human transcription mechanism.
  - 15. The system of claim 13, wherein said human transcription mechanism provides a hint to said speech recognition engine to be stored in a household parameter block associated with a person whose speech is being recognized.
  - 16. The system of claim 1, wherein said human recognizer directs said system to provide feedback to a person who is speaking.
  - 17. The system of claim 1, wherein said human transcription mechanism connects a human recognizer directly to a user interface, thereby providing said human recognizer with the ability to display text back to a person who is speaking.
  - 18. The system of claim 1, wherein if it is not possible to resolve speech, then said human transcription mechanism directs a human recognizer directly to a person who is speaking to provide real time voice interaction.
  - 20. The method of claim 19, further comprising the steps of:
    - gathering speech recognition statistics over many system users and for examining said voice recognition statistics;
      
      wherein, if there is a high correction rate for a particular word or phrase, said speech recognition engine automatically directs words in a potential match list for said word or phrase to a human transcriber and makes no independent effort to recognize such words.
  - 21. The method of claim 19, wherein said speech recognition engine learns from human transcription and improves its speech recognition models or grammar, based upon the input from said transcription.
  - 22. The method of claim 19, wherein human feedback is provided to handle relatively uncommon words that suddenly increase in popularity.
  - 23. The method of claim 19, wherein said speech recognition engine is cued to look at speech samples and recognize a user's commands, wherein said commands, once recognized, are executed.
  - 24. The method of claim 19, wherein said speech recognition engine produces a list of potential phrases plus confidence readings for said phrases, wherein said phrases are text strings.
  - 25. The method of claim 19, further comprising the step of:
    - providing a bank of human recognizers, wherein said bank may be either centrally located or distributed.
  - 26. The method of claim 25, wherein among said human recognizers there are people who are facile with different languages and can recognize said languages and redirect unrecognized speech through a speech recognition engine for such languages.
  - 27. The method of claim 26, wherein once a language is human recognized for a particular person, said speech recognition engine remembers that said person speaks said language and applies a dictionary for that language.
  - 28. The method of claim 19, wherein said speech recognition engine receives feedback from said human recognizers, wherein said speech recognition engine, with time, builds capability to handle phrases without human intervention.
  - 29. The method of claim 19, wherein real time human intervention is used to train said speech recognition engine.
  - 30. The method of claim 19, wherein feedback is directly applied to said speech recognition engine.
  - 31. The method of claim 19, wherein alternate recognizers are targeted by a human transcription mechanism.
  - 32. The method of claim 19, wherein grammars are optimized by a human transcription mechanism.
  - 33. The method of claim 31, wherein said human transcription mechanism provides a hint to said speech recognition engine in the form of a household parameter block associated with a person whose speech is being recognized.
  - 34. The method of claim 19, wherein said human recognizer directs said system to provide feedback to a person who is speaking.
  - 35. The method of claim 19, wherein a human transcription mechanism links a human recognizer directly to a user interface, thereby providing said human recognizer with the ability to display text back to a person who is speaking.
  - 36. The method of claim 19, wherein if it is not possible to resolve speech, then a human transcription mechanism connects a human recognizer directly to a person who is speaking to provide real time voice interaction.
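Claims 8, 9, and 15 (and their method counterparts 26, 27, and 33) describe multilingual human recognizers who identify a speaker's language, redirect the speech to an engine for that language, and leave a hint in a household parameter block so that later utterances go straight to the right dictionary. A minimal Python sketch, assuming each per-language engine returns `None` when it cannot recognize the audio; `LanguageRouter`, `identify_language`, and the single `language` field of `HouseholdParameterBlock` are hypothetical, since the claims do not define the block's contents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical interface: a per-language engine returns text, or None if it cannot recognize the audio.
LanguageEngine = Callable[[bytes], Optional[str]]


@dataclass
class HouseholdParameterBlock:
    # The claims name this block but not its contents; a single language hint is assumed here.
    household_id: str
    language: Optional[str] = None


@dataclass
class LanguageRouter:
    engines: Dict[str, LanguageEngine]         # one recognizer (dictionary) per supported language
    identify_language: Callable[[bytes], str]  # stands in for a multilingual human recognizer
    default_language: str = "en"
    blocks: Dict[str, HouseholdParameterBlock] = field(default_factory=dict)
    human_lookups: int = 0

    def recognize(self, household_id: str, audio: bytes) -> Optional[str]:
        block = self.blocks.setdefault(household_id, HouseholdParameterBlock(household_id))
        text = self.engines[block.language or self.default_language](audio)
        if text is None and block.language is None:
            # Unrecognized in the default language: a human names the language once.
            language = self.identify_language(audio)
            self.human_lookups += 1
            if language in self.engines:
                block.language = language  # hint remembered for all later utterances
                text = self.engines[language](audio)  # redirect to that language's engine
        return text
```

After the first escalation, the stored hint means later utterances from the same household skip the human step entirely, which is the behavior claim 9 describes.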

19. A speech recognition method, comprising the steps of:
    - providing an automatic speech recognition engine;
    - determining a confidence metric with regard to an utterance presented to said speech recognition engine;
    - transmitting said utterance to a human operator for recognition and transcription when said confidence metric is below a predetermined threshold; and
    - providing said human transcription of said utterance back to said speech recognition engine.
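Claim 19 requires determining a confidence metric but does not say how; claims 6 and 24 add only that the engine produces a list of potential phrases with confidence readings. The Python sketch below assumes the metric is the margin between the two highest readings, a simple heuristic swapped in for illustration rather than anything the application specifies; the 0.3 threshold and the function names are likewise hypothetical.

```python
from typing import List, Tuple

# (phrase, confidence reading) pairs, as produced by the engine in claims 6 and 24.
NBest = List[Tuple[str, float]]


def confidence_metric(nbest: NBest) -> float:
    """Assumed metric: margin between the top two confidence readings."""
    readings = sorted((score for _, score in nbest), reverse=True)
    if not readings:
        return 0.0
    runner_up = readings[1] if len(readings) > 1 else 0.0
    return readings[0] - runner_up


def route_utterance(nbest: NBest, threshold: float = 0.3) -> Tuple[str, str]:
    """Claim 19's steps: compute the metric, then keep the engine's answer or escalate."""
    best = max(nbest, key=lambda item: item[1])[0] if nbest else ""
    if confidence_metric(nbest) < threshold:
        return best, "human operator"  # the human transcription would be fed back to the engine
    return best, "engine"
```

Using the margin rather than the top reading alone escalates utterances where two phrases are nearly tied, even when both readings are high.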


Current Assignee: Agiletv Corporation (Promptu Systems Corporation)
Original Assignee: Agiletv Corporation (Promptu Systems Corporation)
Inventors: Foster, Mark J.; Chaiken, David

Application Number: US09/834,852
Publication Number: US 20020152071A1
Field of Search
US Class Current: 704/251
CPC Class Codes:
- G10L 15/183 using context dependencies,...
- G10L 15/22 Procedures used during a sp...
