System and method for user-specific speech recognition

US 8,112,275 B2
Filed: 04/22/2010
Issued: 02/07/2012
Est. Priority Date: 06/03/2002
Status: Expired due to Term

Abstract

The systems and methods described herein may recognize natural language utterances that include queries and/or commands and execute the queries and/or commands based on user-specific profiles. The systems and methods described herein may include a complete speech-based information query, retrieval, presentation and command environment that makes significant use of context, prior information, domain knowledge, and the user-specific profiles to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created and tailored to specific users. For example, the systems and methods described herein may create, store, and use extensive personal profile information for different users, thereby improving the reliability of determining the context and presenting the results that the specific users may expect for a particular question or command.
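
As a rough illustration of how user-specific profiles and prior information might feed recognition, the sketch below picks the closest stored voice profile and then builds word priors from that user's history plus the current dialog. It is only a sketch under stated assumptions: the function names (`cosine`, `identify_user`, `seed_priors`), the profile layout, the use of cosine similarity, and the additive smoothing are all hypothetical and are not described in the patent.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_user(voice_features, profiles):
    """Return the name of the profile whose stored voice vector is most similar.

    `profiles` maps a user name to {"voice": [...], "word_counts": {...}};
    this layout is an assumption made for the example.
    """
    return max(profiles, key=lambda name: cosine(voice_features, profiles[name]["voice"]))


def seed_priors(profile, dialog_history, smoothing=1.0):
    """Blend a user's long-term word counts with words seen in the current dialog.

    Returns a dict of word -> prior probability that sums to 1.
    The stored profile is copied, not modified.
    """
    counts = dict(profile["word_counts"])
    for word in dialog_history:
        counts[word] = counts.get(word, 0) + 1
    total = sum(counts.values()) + smoothing * len(counts)
    return {word: (count + smoothing) / total for word, count in counts.items()}


# Example with two made-up profiles.
profiles = {
    "alice": {"voice": [1.0, 0.0], "word_counts": {"jazz": 3, "news": 1}},
    "bob": {"voice": [0.0, 1.0], "word_counts": {"traffic": 2}},
}
speaker = identify_user([0.9, 0.1], profiles)
priors = seed_priors(profiles[speaker], dialog_history=["news", "news"])
```

Recomputing the priors on each turn is what makes them "dynamic" in this toy version: words spoken earlier in the dialog raise their own prior for the next utterance.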

22 Claims

1. A method for user-specific speech recognition, comprising:
- receiving, at an input device coupled to a computer, a natural language utterance in a current dialog;
- comparing, at a speech recognition engine that executes on the computer, voice characteristics associated with the utterance to unique speech characteristics contained in one or more user profiles to determine an identity associated with a user that spoke the utterance;
- seeding the speech recognition engine with data in one or more dictionary and phrase tables, wherein the data seeding the speech recognition engine includes prior probabilities or fuzzy possibilities that are dynamically updated based on the determined user identity and a history associated with the current dialog;
- determining, at the speech recognition engine, that the utterance contains one or more words that were unrecognized or incorrectly recognized in response to a recognition associated with the utterance having a confidence level that does not meet or exceed a predetermined value;
- requesting, by the speech recognition engine, the user to spell the one or more unrecognized or incorrectly recognized words using a phonetic alphabet, wherein the user provides the phonetic alphabet spelling in one or more subsequent natural language utterances; and
- looking up, at the speech recognition engine, one or more words in the one or more dictionary and phrase tables that match the phonetic alphabet spelling to learn a pronunciation associated with the one or more unrecognized or incorrectly recognized words.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The method recited in claim 1, wherein the phonetic alphabet includes default words or names to represent alphabetic letters.
  - 3. The method recited in claim 2, further comprising modifying the default words or names in the phonetic alphabet in response to the user providing one or more individualized words or names to represent the alphabetic letters.
  - 4. The method recited in claim 1, further comprising providing an interactive interface to train the user on how the speech recognition engine interpreted the utterance.
  - 5. The method recited in claim 4, wherein the interactive interface presents an audible or visual representation associated with the utterance recognition to train the user on how the speech recognition engine interpreted the utterance.
  - 6. The method recited in claim 1, further comprising using the determined user identity to verify that the user that spoke the utterance satisfies one or more security measures associated with an application relating to the utterance or the current dialog.
  - 7. The method recited in claim 1, further comprising adding the pronunciation associated with the one or more unrecognized or incorrectly recognized words to one or more entries in the one or more dictionary and phrase tables that correspond to the one or more words that match the phonetic alphabet spelling.
  - 8. The method recited in claim 7, further comprising associating the pronunciation and the one or more words that match the phonetic alphabet spelling with the determined user identity and the current dialog history.
  - 9. The method recited in claim 8, further comprising:
    - adding tags associated with the determined user identity to any words or phrases recognized in or learned from the utterance;
    - processing one or more subsequent natural language utterances in a subsequent dialog, wherein processing the one or more subsequent utterances includes tagging any words or phrases recognized in or learned from the subsequent utterances with information that identifies the subsequent dialog; and
    - overlapping processing associated with the current dialog and the subsequent dialog using the tags associated with the determined user identity and the information that identifies the subsequent dialog.
  - 10. The method recited in claim 9, wherein overlapping the processing associated with the current dialog and the subsequent dialog includes:
    - interrupting the current dialog in response to an utterance associated with the current dialog containing a dismissal word; and
    - starting the subsequent dialog in response to an utterance containing a generic word or a specific name tied to a system personality.
  - 11. The method recited in claim 10, wherein overlapping the processing associated with the current dialog and the subsequent dialog further includes:
    - interrupting or ending the subsequent dialog in response to an utterance associated with the subsequent dialog containing the dismissal word; and
    - resuming the current dialog in response to an utterance containing the generic word or the specific name tied to the system personality.
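
The low-confidence fallback recited in claims 1 to 3 and 7 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class and function names (`Dictionary`, `decode_spelling`, `learn`, `needs_spelling`), the 0.6 threshold, the use of the NATO spelling words as the default alphabet, and pronunciations stored as plain strings are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical default phonetic alphabet (NATO spelling words), standing in
# for the "default words or names" that represent alphabetic letters.
DEFAULT_ALPHABET = {
    "alpha": "a", "bravo": "b", "charlie": "c", "delta": "d", "echo": "e",
    "foxtrot": "f", "golf": "g", "hotel": "h", "india": "i", "juliett": "j",
    "kilo": "k", "lima": "l", "mike": "m", "november": "n", "oscar": "o",
    "papa": "p", "quebec": "q", "romeo": "r", "sierra": "s", "tango": "t",
    "uniform": "u", "victor": "v", "whiskey": "w", "xray": "x",
    "yankee": "y", "zulu": "z",
}

CONFIDENCE_THRESHOLD = 0.6  # assumed stand-in for the "predetermined value"


def needs_spelling(confidence):
    """True when the confidence does not meet or exceed the threshold."""
    return confidence < CONFIDENCE_THRESHOLD


@dataclass
class Dictionary:
    # word -> list of learned pronunciations (plain strings in this sketch)
    entries: dict = field(default_factory=dict)
    # per-user copy of the alphabet so customizations stay local
    alphabet: dict = field(default_factory=lambda: dict(DEFAULT_ALPHABET))

    def customize_letter(self, spoken_word, letter):
        """Let the user substitute an individualized word for a letter."""
        self.alphabet[spoken_word.lower()] = letter.lower()

    def decode_spelling(self, spelling_utterance):
        """Map each spoken alphabet word to its letter and join the letters.

        Raises KeyError if a spoken word is not in the alphabet.
        """
        return "".join(self.alphabet[w] for w in spelling_utterance.lower().split())

    def learn(self, spelling_utterance, pronunciation):
        """Look up the spelled word; if it exists, attach the pronunciation.

        Returns the matched word, or None when no entry matches.
        """
        word = self.decode_spelling(spelling_utterance)
        if word not in self.entries:
            return None
        self.entries[word].append(pronunciation)
        return word


# Example: a low-confidence recognition triggers the spelling request.
dictionary = Dictionary(entries={"kiwi": []})
learned = None
if needs_spelling(0.42):
    learned = dictionary.learn("kilo india whiskey india", "K IY W IY")
```

Keeping the alphabet on the `Dictionary` instance rather than in a global makes it straightforward to hold one customized alphabet per user profile, which is one plausible reading of the individualized words in claim 3.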

12. A system for user-specific speech recognition, wherein the system comprises a computer device having a speech recognition engine configured to:
- compare voice characteristics associated with a natural language utterance received in a current dialog to unique speech characteristics contained in one or more user profiles to determine an identity associated with a user that spoke the utterance;
- receive seeding data contained in one or more dictionary and phrase tables, wherein the seeding data includes prior probabilities or fuzzy possibilities that are dynamically updated based on the determined user identity and a history associated with the current dialog;
- determine that the utterance contains one or more words that were unrecognized or incorrectly recognized in response to a recognition associated with the utterance having a confidence level that does not meet or exceed a predetermined value;
- request the user to spell the one or more unrecognized or incorrectly recognized words using a phonetic alphabet, wherein the user provides the phonetic alphabet spelling in one or more subsequent natural language utterances; and
- look up one or more words in the one or more dictionary and phrase tables that match the phonetic alphabet spelling to learn a pronunciation associated with the one or more unrecognized or incorrectly recognized words.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22):
  - 13. The system recited in claim 12, wherein the phonetic alphabet includes default words or names to represent alphabetic letters.
  - 14. The system recited in claim 13, wherein the speech recognition engine is further configured to modify the default words or names in the phonetic alphabet in response to the user providing one or more individualized words or names to represent the alphabetic letters.
  - 15. The system recited in claim 12, further comprising an interactive interface configured to train the user on how the speech recognition engine interpreted the utterance.
  - 16. The system recited in claim 15, wherein the interactive interface is further configured to present an audible or visual representation associated with the utterance recognition to train the user on how the speech recognition engine interpreted the utterance.
  - 17. The system recited in claim 12, wherein the speech recognition engine is further configured to use the determined user identity to verify that the user that spoke the utterance satisfies one or more security measures associated with an application relating to the utterance or the current dialog.
  - 18. The system recited in claim 12, wherein the speech recognition engine is further configured to add the pronunciation associated with the one or more unrecognized or incorrectly recognized words to one or more entries in the one or more dictionary and phrase tables that correspond to the one or more words that match the phonetic alphabet spelling.
  - 19. The system recited in claim 18, wherein the speech recognition engine is further configured to associate the pronunciation and the one or more words that match the phonetic alphabet spelling with the determined user identity and the current dialog history.
  - 20. The system recited in claim 19, wherein the speech recognition engine is further configured to:
    - add tags associated with the determined user identity to any words or phrases recognized in or learned from the utterance;
    - tag any words or phrases recognized in or learned from a subsequent dialog with information that identifies the subsequent dialog to process one or more subsequent natural language utterances associated with the subsequent dialog; and
    - overlap processing associated with the current dialog and the subsequent dialog using the tags associated with the determined user identity and the information that identifies the subsequent dialog.
  - 21. The system recited in claim 20, wherein to overlap the processing associated with the current dialog and the subsequent dialog, the system further comprises an event manager configured to:
    - interrupt the current dialog in response to the speech recognition engine recognizing a dismissal word in the current dialog; and
    - start the subsequent dialog in response to the speech recognition engine recognizing a generic word or a specific name tied to a system personality.
  - 22. The system recited in claim 21, wherein to overlap the processing associated with the current dialog and the subsequent dialog, the event manager is further configured to:
    - interrupt or end the subsequent dialog in response to the speech recognition engine recognizing the dismissal word in the subsequent dialog; and
    - resume the current dialog in response to the speech recognition engine recognizing the generic word or the specific name tied to the system personality.
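
One way to picture the event manager behavior in claims 20 to 22 is a small state machine that tags words with the user and dialog, parks the current dialog on a dismissal word, and starts or resumes a dialog on a wake word. The sketch below is an assumption-laden toy: `EventManager`, `Dialog`, the dismissal word "later", the wake words "computer" and "max", and the limit of one parked dialog are all hypothetical choices, not details from the patent.

```python
from dataclasses import dataclass, field

DISMISSAL_WORD = "later"            # hypothetical dismissal word
WAKE_WORDS = {"computer", "max"}    # hypothetical generic word and personality name


@dataclass
class Dialog:
    dialog_id: int
    user_id: str
    # Each entry is (word, user_id, dialog_id), i.e. a word plus its tags.
    tagged_words: list = field(default_factory=list)


class EventManager:
    """Tracks one active dialog and at most one interrupted dialog."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.active = None
        self.suspended = None
        self.resume_on_wake = False
        self._next_id = 1

    def handle(self, utterance):
        words = utterance.lower().split()
        if self.active is None:
            if not WAKE_WORDS.intersection(words):
                return "ignored"
            if self.resume_on_wake:
                # The subsequent dialog ended earlier: bring the parked one back.
                self.active, self.suspended = self.suspended, None
                self.resume_on_wake = False
                return "resumed"
            self.active = Dialog(self._next_id, self.user_id)
            self._next_id += 1
            return "started"
        if DISMISSAL_WORD in words:
            if self.suspended is None:
                # Interrupt the current dialog but keep it for later.
                self.suspended, self.active = self.active, None
                return "interrupted"
            # A dialog is already parked, so end this subsequent dialog.
            self.active = None
            self.resume_on_wake = True
            return "ended"
        # Ordinary utterance: tag every word with user and dialog identifiers.
        self.active.tagged_words.extend(
            (w, self.user_id, self.active.dialog_id) for w in words
        )
        return "processed"


# Example session for a single identified user.
manager = EventManager("alice")
session = ["computer", "play jazz", "later", "max", "what time is it", "later", "computer"]
results = [manager.handle(u) for u in session]
```

Because every learned word carries both a user tag and a dialog tag, the parked dialog's context survives the interruption and is available again once it is resumed.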

Current Assignee: Dialect, LLC
Original Assignee: VoiceBox Technologies, Inc. (Microsoft Corporation)
Inventors: Kennewick, Robert A.; Locke, David; Kennewick, Michael R. Sr.; Kennewick, Michael R. Jr.; Kennewick, Richard; Freeman, Tom
Primary Examiner(s): Lerner, Martin
Application Number: US12/765,733
Publication Number: US 20100204994A1
Time in Patent Office: 656 Days
Field of Search: 704/240, 704/244, 704/250, 704/257, 704/246, 704/275
US Class Current: 704/240
CPC Class Codes:
G10L 15/1822   Parsing for meaning underst...
G10L 15/22   Procedures used during a sp...
G10L 2015/228   of application context
Y10S 707/99933   Query processing, i.e. sear...
