Using Utterance Classification in Telephony and Speech Recognition Applications

US 20110307252A1
Filed: 06/15/2010
Published: 12/15/2011
Est. Priority Date: 06/15/2010
Status: Abandoned Application

Abstract

Described is the use of utterance classification based methods and other machine learning techniques to provide a telephony application or other voice menu application (e.g., an automotive application) that need not use Context-Free-Grammars to determine a user's spoken intent. A classifier receives text from an information retrieval-based speech recognizer and outputs a semantic label corresponding to the likely intent of a user's speech. The semantic label is then output, such as for use by a voice menu program in branching between menus. Also described is training, including training the language model from acoustic data without transcriptions, and training the classifier from speech-recognized acoustic data having associated semantic labels.
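
The classification step described in the abstract can be illustrated with a minimal sketch. It assumes a bag-of-words Naive Bayes model, which is a stand-in chosen for brevity; the abstract does not name a particular classifier. The class name `NaiveBayesLabeler`, the training utterances, and the labels `BALANCE`, `PAYMENT`, and `AGENT` are all hypothetical, and the input text is assumed to have already been produced by a speech recognizer.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesLabeler:
    """Bag-of-words Naive Bayes over recognizer output text (an assumed model choice)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of training examples
        self.vocab = set()

    def train(self, examples):
        # Each example is a (recognized_text, semantic_label) pair.
        for text, label in examples:
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.label_counts[label] += 1
            self.vocab.update(words)

    def scores(self, text):
        # Log-probability of each label, with add-one smoothing for unseen words.
        total = sum(self.label_counts.values())
        result = {}
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            result[label] = score
        return result

    def classify(self, text):
        # Output the single most likely semantic label.
        scores = self.scores(text)
        return max(scores, key=scores.get)


# Hypothetical recognizer output paired with hypothetical semantic labels.
TRAINING = [
    ("i want to check my balance", "BALANCE"),
    ("what is my account balance", "BALANCE"),
    ("how much money do i have", "BALANCE"),
    ("i need to pay my bill", "PAYMENT"),
    ("make a payment please", "PAYMENT"),
    ("pay the bill now", "PAYMENT"),
    ("talk to a person", "AGENT"),
    ("let me speak with an agent", "AGENT"),
]

labeler = NaiveBayesLabeler()
labeler.train(TRAINING)
print(labeler.classify("check balance"))
```

Because the model is trained on recognizer output rather than on a hand-written grammar, a short or loosely phrased utterance can still map to a label as long as its words overlap with the training examples for that label.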

20 Claims

1. In a computing environment, a method performed on at least one processor comprising:
- inputting text into a classifier that was trained with speech-recognized acoustic data having associated semantic labels;
- classifying the text into one or more of the semantic labels; and
- outputting the one or more semantic labels from the classifier.
- - 2. The method of claim 1 wherein inputting the text comprises receiving speech input comprising an utterance and recognizing the utterance as the text.
  - 3. The method of claim 2 wherein the recognizing the utterance comprises inputting the utterance into an information retrieval-based speech recognizer.
  - 4. The method of claim 3 further comprising, training the information retrieval-based speech recognizer with transcribed data, non-transcribed, characterized data or non-transcribed, non-characterized data, or any combination of transcribed data, non-transcribed, characterized data or non-transcribed, non-characterized data.
  - 5. The method of claim 1 wherein the semantic label corresponds to a menu of a voice menu system, and further comprising, branching to that menu.
  - 6. The method of claim 1 wherein the classifier outputs a plurality of semantic labels, and further comprising, using the plurality of semantic labels to request a confirmation as to which one of the plurality of semantic labels is correct.
  - 7. The method of claim 1 further comprising, training the classifier with phone-level training data generated from a word-level transcription.
  - 8. The method of claim 1 further comprising, training the classifier with artificial examples entered as text.
  - 9. The method of claim 1 further comprising, training the classifier with transcribed data, non-transcribed, characterized data or non-transcribed, non-characterized data, or any combination of transcribed data, non-transcribed, characterized data or non-transcribed, non-characterized data.
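
One way to read claim 7 is that each word-level transcription is expanded into a phone sequence through a pronunciation lexicon, and the classifier is then trained on those sequences. The sketch below assumes that reading. The `LEXICON` entries use ARPAbet-style symbols and are illustrative only, and the function names and the `<unk>` placeholder are hypothetical.

```python
# Hypothetical pronunciation lexicon; a real system would load a full dictionary.
LEXICON = {
    "check": ["CH", "EH", "K"],
    "my": ["M", "AY"],
    "balance": ["B", "AE", "L", "AH", "N", "S"],
    "pay": ["P", "EY"],
    "bill": ["B", "IH", "L"],
}


def to_phones(transcription, lexicon=LEXICON, unknown="<unk>"):
    """Expand a word-level transcription into a flat phone sequence."""
    phones = []
    for word in transcription.lower().split():
        # Words missing from the lexicon become a single placeholder symbol.
        phones.extend(lexicon.get(word, [unknown]))
    return phones


def phone_ngrams(phones, n=2):
    """Phone n-grams that could serve as classifier features."""
    return [" ".join(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def phone_level_examples(word_level_examples):
    """Convert (transcription, label) pairs into (phone sequence, label) pairs."""
    return [(to_phones(text), label) for text, label in word_level_examples]


examples = phone_level_examples([
    ("check my balance", "BALANCE"),
    ("pay my bill", "PAYMENT"),
])
for phones, label in examples:
    print(label, phones, phone_ngrams(phones)[:3])
```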

10. In a computing environment, a system comprising, a voice menu program, the voice menu program coupled to a classifier trained at least in part via machine learning using data associated with semantic labels of a predetermined set of semantic labels, the classifier configured to input text received from a speech recognizer and search a classification model to match at least one semantic label to the text for providing to the voice menu program.
- - 11. The system of claim 10 where the voice menu program corresponds to a telephony application.
  - 12. The system of claim 10 where the voice menu program corresponds to an automotive application.
  - 13. The system of claim 10 wherein the voice menu program changes a menu based upon a semantic label provided by the classifier.
  - 14. The system of claim 10 wherein the classifier provides two or more semantic labels, and wherein the voice menu program prompts for verbal confirmation corresponding to which of the semantic labels is to be used in taking further action.
  - 15. The system of claim 10 wherein the speech recognizer comprises an information retrieval-based speech recognizer having a statistical language model iteratively trained at least in part on labeled training data.
  - 16. The system of claim 10 wherein the speech recognizer or the classifier, or both the speech recognizer and the classifier, operate at a phoneme-level, a word-level, or other sub-unit level, or any combination of a phoneme-level, a word-level, or other sub-unit level.
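
The confirmation behavior in claim 14 can be sketched as follows, assuming the classifier exposes a score per label. The `margin` threshold used to decide when two labels are too close to call is an assumption made for illustration and does not come from the claims; the scores, prompt wording, and function names are likewise hypothetical.

```python
def candidate_labels(scores, margin=0.2):
    """Return labels whose score is within `margin` of the best one, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best = ranked[0][1]
    return [label for label, score in ranked if best - score <= margin]


def next_action(scores, prompts, margin=0.2):
    """Branch directly on a single label, otherwise build a confirmation prompt."""
    labels = candidate_labels(scores, margin)
    if len(labels) == 1:
        return ("branch", labels[0])
    options = " or ".join(prompts[label] for label in labels)
    return ("confirm", f"Did you want {options}?")


# Hypothetical prompt fragments, one per semantic label.
PROMPTS = {
    "BALANCE": "your account balance",
    "PAYMENT": "to make a payment",
    "AGENT": "to speak with an agent",
}

# Two labels score closely, so the caller is asked to confirm.
print(next_action({"BALANCE": 0.48, "PAYMENT": 0.41, "AGENT": 0.11}, PROMPTS))
# One label clearly dominates, so the program can branch immediately.
print(next_action({"BALANCE": 0.90, "PAYMENT": 0.05, "AGENT": 0.05}, PROMPTS))
```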

17. One or more computer-readable media having computer-executable instructions, which when executed perform steps, comprising, classifying text into a semantic label of a predetermined set of semantic labels, in which the text corresponds to recognized speech, selecting a menu of a voice menu program based upon the semantic label, and changing the voice menu program to the selected menu.
- - 18. The one or more computer-readable media of claim 17 having further computer-executable instructions comprising, recognizing the text from an utterance via an information retrieval-based speech recognizer.
  - 19. The one or more computer-readable media of claim 17 having further computer-executable instructions comprising, classifying other text into a plurality of the semantic labels, and using the plurality of semantic labels to request a confirmation as to which one of the plurality of semantic labels is correct.
  - 20. The one or more computer-readable media of claim 17 having further computer-executable instructions comprising, training the classifier with phone-level training data generated from a word-level transcription.
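
The menu selection and menu change recited in claim 17 can be pictured as a lookup from semantic label to menu, followed by a state update. The sketch below is one possible arrangement, not the claimed implementation; the class `VoiceMenuProgram`, the menu names, and the choice to stay on the current menu for an unknown label are assumptions.

```python
class VoiceMenuProgram:
    """Tracks the active menu and switches it based on a semantic label."""

    def __init__(self, label_to_menu, start="main"):
        self.label_to_menu = label_to_menu
        self.current_menu = start

    def handle_label(self, label):
        """Select the menu for a semantic label and change to it."""
        menu = self.label_to_menu.get(label)
        if menu is None:
            # Assumed fallback: an unrecognized label leaves the menu unchanged.
            return self.current_menu
        self.current_menu = menu
        return menu


# Hypothetical mapping from the predetermined label set to menus.
program = VoiceMenuProgram({
    "BALANCE": "balance_menu",
    "PAYMENT": "payment_menu",
    "AGENT": "agent_transfer",
})

print(program.current_menu)
print(program.handle_label("PAYMENT"))
print(program.handle_label("UNKNOWN_LABEL"))
```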

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Ju, Yun-Cheng; Droppo, James Garnet III
Application Number: US12/815,419
Publication Number: US 20110307252A1

Field of Search
US Class Current: 704/232
CPC Class Codes: G10L 15/1822 Parsing for meaning underst...
