PRONUNCIATION LEARNING FROM USER CORRECTION

US 20130090921A1
Filed: 10/07/2011
Published: 04/11/2013
Est. Priority Date: 10/07/2011
Status: Active Grant

Abstract

Systems and methods are described for adding entries to a custom lexicon used by a speech recognition engine of a speech interface in response to user interaction with the speech interface. In one embodiment, a speech signal is obtained when the user speaks a name of a particular item to be selected from among a finite set of items. If a phonetic description of the speech signal is not recognized by the speech recognition engine, then the user is presented with a means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item. After the user has selected the particular item via the means for selecting, the phonetic description of the speech signal is stored in association with a text description of the particular item in the custom lexicon.
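The flow described in the abstract can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the patented implementation: `CustomLexicon`, `select_item`, the `choose_manually` callback, and the phoneme strings are hypothetical names and values chosen for the example, and the phonetic description is assumed to arrive as a ready-made string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CustomLexicon:
    """Maps a phonetic description (here, a phoneme string) to an item's text."""

    entries: Dict[str, str] = field(default_factory=dict)

    def lookup(self, phonetic: str) -> Optional[str]:
        return self.entries.get(phonetic)

    def store(self, phonetic: str, text: str) -> None:
        self.entries[phonetic] = text


def select_item(
    phonetic: str,
    items: List[str],
    lexicon: CustomLexicon,
    choose_manually: Callable[[List[str]], str],
) -> str:
    """Return the selected item, learning the pronunciation on fallback."""
    recognized = lexicon.lookup(phonetic)
    if recognized is not None and recognized in items:
        return recognized
    # Not recognized: fall back to a non-speech selection (e.g. a list on screen).
    chosen = choose_manually(items)
    # Only after the user has picked an item is the pronunciation stored.
    lexicon.store(phonetic, chosen)
    return chosen


# Hypothetical usage: the first attempt falls back to manual selection,
# the second attempt with the same pronunciation is recognized directly.
lexicon = CustomLexicon()
items = ["Beyoncé", "Sade", "Björk"]
first = select_item("SH AA D EY", items, lexicon, lambda options: options[1])
second = select_item("SH AA D EY", items, lexicon, lambda options: options[0])
```

In this sketch the second call never reaches the manual callback, because the pronunciation learned from the first correction is already in the lexicon.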

20 Claims

1. A method for updating a custom lexicon used by a speech recognition engine that comprises part of a speech interface, comprising:
- obtaining a speech signal by the speech interface when a user speaks a name of a particular item for the purpose of selecting the particular item from among a finite set of items;
  
  presenting the user with a means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item in response to determining that a phonetic description of the speech signal is not recognized by the speech recognition engine; and
  
  after the user has selected the particular item via the means for selecting, storing the phonetic description of the speech signal in association with a text description of the particular item in the custom lexicon.
  - 2. The method of claim 1, wherein the speech interface is implemented on a first device, the method further comprising:
    - allowing a second device to access at least the custom lexicon for the purposes of implementing a speech interface on the second device.
  - 3. The method of claim 1, further comprising:
    - storing multiple phonetic descriptions in association with the text description of the particular item in the custom lexicon.
  - 4. The method of claim 3, wherein storing multiple phonetic descriptions in association with the text description of the particular item in the custom lexicon comprises:
    - storing only up to a limited number of phonetic descriptions in association with the text description of the particular item in the custom lexicon.
  - 5. The method of claim 1, wherein storing the phonetic description of the speech signal in association with the text description of the particular item in the custom lexicon comprises storing the phonetic description of the speech signal in association with the text description of the particular item in at least one of:
    - a user-specific custom lexicon that is used to recognize speech of the user only; or
      
      a system custom lexicon that is used to recognize speech of all users of a system.
  - 6. The method of claim 1, further comprising:
    - storing the phonetic description of the speech signal in association with the text description of the particular item in a second custom lexicon associated with a second user that is associated with the first user.
  - 7. The method of claim 1, wherein presenting the user with the means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item comprises:
    - applying a syllable-based statistical language model to the speech signal to identify syllables present in the speech signal;
      
      identifying a subset of the items in the finite set of items in a classifier based on the identified syllables; and
      
      presenting the subset of the items to the user as candidates for selection.
  - 8. The method of claim 7, further comprising:
    - updating a classification model used by the classifier based on the selection by the user of the particular item via the means for selecting.
  - 9. The method of claim 1, further comprising:
    - providing the phonetic description of the speech signal to a text-to-speech converter to produce a pronunciation; and
      
      prompting the user to confirm the pronunciation produced by the text-to-speech converter prior to storing the phonetic description of the speech signal in association with the text description of the particular item in the custom lexicon.
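Claims 3 to 5 describe keeping several pronunciations per item, capping how many are kept, and choosing between a per-user and a system-wide lexicon. A minimal sketch of one way this could look follows. Everything named here (`BoundedLexicon`, `learn`, `MAX_VARIANTS`, the user IDs) is hypothetical, and the oldest-first eviction policy is an assumption: the claims only say that a limited number of descriptions is stored.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional

MAX_VARIANTS = 3  # assumed cap; the claims do not give a number


class BoundedLexicon:
    """Keeps up to `limit` phonetic descriptions per text description."""

    def __init__(self, limit: int = MAX_VARIANTS) -> None:
        self._variants: Dict[str, Deque[str]] = defaultdict(
            lambda: deque(maxlen=limit)
        )

    def store(self, phonetic: str, text: str) -> None:
        variants = self._variants[text]
        if phonetic not in variants:
            # A full deque silently drops its oldest entry (assumed policy).
            variants.append(phonetic)

    def pronunciations(self, text: str) -> List[str]:
        return list(self._variants.get(text, ()))

    def lookup(self, phonetic: str) -> Optional[str]:
        for text, variants in self._variants.items():
            if phonetic in variants:
                return text
        return None


# One lexicon per user, plus one shared by all users of the system.
user_lexicons: Dict[str, BoundedLexicon] = defaultdict(BoundedLexicon)
system_lexicon = BoundedLexicon()


def learn(user_id: str, phonetic: str, text: str, share_with_all: bool = False) -> None:
    """Store in the user's lexicon and, optionally, in the system lexicon."""
    user_lexicons[user_id].store(phonetic, text)
    if share_with_all:
        system_lexicon.store(phonetic, text)


# Hypothetical usage: the fourth variant pushes out the first one.
learn("alice", "B Y AO R K", "Björk")
learn("alice", "B Y ER K", "Björk")
learn("alice", "B IY ER K", "Björk")
learn("alice", "B Y OW R K", "Björk", share_with_all=True)
alice_variants = user_lexicons["alice"].pronunciations("Björk")
system_variants = system_lexicon.pronunciations("Björk")
```

Capping the list keeps lookup cost bounded and prevents one frequently corrected item from accumulating an unbounded set of pronunciations.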

10. A system, comprising:
- a speech recognition engine that is configured to generate a phonetic description of a speech signal obtained when a user speaks a name of a particular item into a speech interface for the purpose of selecting the particular item from among a finite set of items and to match the phonetic description of the speech signal to one of a plurality of phonetic descriptions included in a system lexicon or a custom lexicon;
  
  a dialog manager that is configured to present the user with a means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item in response to determining that the speech recognition engine has failed to match the phonetic description of the speech signal to any of the phonetic descriptions included in the system lexicon or the custom lexicon; and
  
  a learning engine that is configured to store the phonetic description of the speech signal in association with a text description of the particular item selected by the user via the means for selecting in the custom lexicon.
  - 11. The system of claim 10, further comprising:
    - a network-accessible storage system that stores the custom lexicon and makes the custom lexicon available to a plurality of network-connected devices for performing speech recognition functions.
  - 12. The system of claim 10, wherein the learning engine is configured to store multiple phonetic descriptions in association with the text description of the particular item selected by the user in the custom lexicon.
  - 13. The system of claim 10, wherein the learning engine is configured to store only up to a limited number of phonetic descriptions in association with the text description of the particular item selected by the user in the custom lexicon.
  - 14. The system of claim 10, wherein the learning engine is configured to store the phonetic description of the speech signal in association with the text description of the particular item selected by the user in a user-specific custom lexicon that is used to recognize speech of the user only.
  - 15. The system of claim 10, wherein the learning engine is configured to store the phonetic description of the speech signal in association with the text description of the particular item selected by the user in a system custom lexicon that is used to recognize speech of all users of the system.
  - 16. The system of claim 10, wherein the learning engine is further configured to store the phonetic description of the speech signal in association with the text description of the particular item selected by the user in a second custom lexicon associated with a second user that is associated with the first user.
  - 17. The system of claim 10, wherein the speech recognition engine is further configured to apply a syllable-based statistical language model to the speech signal to identify syllables present in the speech signal and to use a classifier to identify a subset of the items in the finite set of items based on the identified syllables;
    - and wherein the dialog manager is configured to present the subset of the items to the user as candidates for selection.
  - 18. The system of claim 17, wherein the speech recognition engine is further configured to update a classification model used by the classifier based on the selection by the user of the particular item via the means for selecting.
  - 19. The system of claim 16, further comprising:
    - a text-to-speech converter that is configured to produce a pronunciation based on the phonetic description of the speech signal;
      
      wherein the dialog manager is further configured to prompt the user to confirm the pronunciation produced by the text-to-speech converter prior to storing the phonetic description of the speech signal in association with the text description of the particular item selected by the user in the custom lexicon.
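Claims 7 and 17 describe narrowing the finite set of items to a shortlist based on syllables identified in the speech signal. The sketch below illustrates only the shortlisting step. It is a deliberately simplified stand-in: a plain Jaccard overlap score replaces the syllable-based statistical language model and the classifier, neither of which is specified in the claims, and the `shortlist` function, the syllable spellings, and the item names are all hypothetical.

```python
from typing import Dict, List, Tuple


def shortlist(
    heard_syllables: List[str],
    item_syllables: Dict[str, List[str]],
    top_n: int = 3,
) -> List[str]:
    """Rank items by syllable overlap with what was heard (Jaccard score)."""
    heard = set(heard_syllables)
    scored: List[Tuple[float, str]] = []
    for name, syllables in item_syllables.items():
        expected = set(syllables)
        overlap = len(heard & expected)
        if overlap:
            # Items sharing no syllable with the utterance are not offered.
            scored.append((overlap / len(heard | expected), name))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


# Hypothetical usage with made-up syllable spellings.
catalog = {
    "Metallica": ["me", "tal", "li", "ca"],
    "Madonna": ["ma", "don", "na"],
    "Metric": ["me", "tric"],
}
candidates = shortlist(["me", "ta", "li", "ka"], catalog)
```

The resulting candidates would then be shown to the user for non-speech selection, and the user's choice could feed back into whatever model produces the ranking, as claims 8 and 18 describe.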

20. A computer program product comprising a non-transitory computer-readable medium having computer program logic recorded thereon for enabling a processing unit to update a custom lexicon dictionary used by a speech recognition engine that comprises part of a speech interface to an application, the computer program logic comprising:
- first means for enabling the processing unit to obtain a speech signal when a user speaks a name of a particular item into the speech interface for the purpose of selecting the particular item from among a finite set of items;
  
  second means for enabling the processing unit to obtain a text description of the particular item from the speech recognition engine based upon recognition of a phonetic description of the speech signal by the speech recognition engine; and
  
  third means for enabling the processing unit to store the phonetic description of the speech signal in association with the text description of the particular item in a custom lexicon in response to determining that a measure of confidence with which the phonetic description of the speech signal has been recognized by the speech recognition engine is below a predefined threshold.
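Claim 20 differs from claim 1 in that the engine does return a text description, and the pronunciation is stored when the recognition confidence falls below a predefined threshold. A minimal sketch of that decision follows. The function name `learn_if_uncertain`, the dictionary layout, the phoneme strings, and the threshold value of 0.6 are assumptions made for illustration; the claim does not give a threshold value or a data structure.

```python
from typing import Dict, List

CONFIDENCE_THRESHOLD = 0.6  # assumed value; the claim only says "predefined"


def learn_if_uncertain(
    lexicon: Dict[str, List[str]],
    phonetic: str,
    recognized_text: str,
    confidence: float,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> bool:
    """Store the pronunciation only when recognition confidence was low.

    Returns True when the low-confidence branch was taken.
    """
    if confidence >= threshold:
        return False
    variants = lexicon.setdefault(recognized_text, [])
    if phonetic not in variants:
        variants.append(phonetic)
    return True


# Hypothetical usage: only the low-confidence recognition is learned.
genre_lexicon: Dict[str, List[str]] = {}
stored_low = learn_if_uncertain(genre_lexicon, "JH AA Z", "Jazz", confidence=0.41)
stored_high = learn_if_uncertain(genre_lexicon, "R AA K", "Rock", confidence=0.93)
```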

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Liu, Wei-Ting Frank; Lovitt, Andrew; Tomko, Stefanie; Ju, Yun-Cheng

Granted Patent

US 9,640,175 B2
US Class Current

704/10
CPC Class Codes

G10L 15/063   Training

G10L 15/22   Procedures used during a sp...

G10L 2015/0638   Interactive procedures

G10L 2015/221   Announcement of recognition...
