Pronunciation learning from user correction

US 9,640,175 B2
Filed: 10/07/2011
Issued: 05/02/2017
Est. Priority Date: 10/07/2011
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Systems and methods are described for adding entries to a custom lexicon used by a speech recognition engine of a speech interface in response to user interaction with the speech interface. In one embodiment, a speech signal is obtained when the user speaks a name of a particular item to be selected from among a finite set of items. If a phonetic description of the speech signal is not recognized by the speech recognition engine, then the user is presented with a means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item. After the user has selected the particular item via the means for selecting, the phonetic description of the speech signal is stored in association with a text description of the particular item in the custom lexicon.
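
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `CustomLexicon`, `select_item`, and `choose_manually` are hypothetical, and a phonetic description is assumed to be a plain string of phoneme symbols.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


@dataclass
class CustomLexicon:
    # Maps a phonetic description (assumed here to be a phoneme string)
    # to the text description of an item.
    entries: dict = field(default_factory=dict)

    def recognize(self, phonetic: str) -> Optional[str]:
        return self.entries.get(phonetic)

    def learn(self, phonetic: str, item_text: str) -> None:
        self.entries[phonetic] = item_text


def select_item(
    phonetic: str,
    items: Sequence[str],
    lexicon: CustomLexicon,
    choose_manually: Callable[[Sequence[str]], str],
) -> str:
    """Return the selected item, learning the pronunciation on a miss."""
    recognized = lexicon.recognize(phonetic)
    if recognized in items:
        return recognized
    # Not recognized: fall back to a non-speech selection (e.g. a touch list),
    # then store the heard pronunciation against the item the user picked.
    chosen = choose_manually(items)
    lexicon.learn(phonetic, chosen)
    return chosen
```

On the first utterance the user picks the item by hand; on a later utterance with the same phonetic description the lexicon resolves it without the fallback.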

41 Citations

20 Claims

1. A method for updating a custom lexicon used by a speech recognition engine that comprises part of a speech interface, comprising:
- obtaining a speech signal by the speech interface when a user speaks a name of a particular item for the purpose of selecting the particular item from among a finite set of items;
  
  presenting the user with a means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item in response to determining that a phonetic description of the speech signal is not recognized by the speech recognition engine;
  
  after the user has selected the particular item via the means for selecting, storing the phonetic description of the speech signal in association with a text description of the particular item in the custom lexicon, the custom lexicon comprising a user-specific custom lexicon that is used to recognize speech of the user only and a system custom lexicon that is used to recognize speech of all users of a system, the storing comprising:
  
  determining if the particular item is of a particular type, automatically storing the phonetic description of the speech signal only in the user-specific custom lexicon in response to determining that the particular item is of the particular type, and automatically storing the phonetic description of the speech signal only in the system custom lexicon in response to determining that the particular item is not of the particular type; and
  
  elevating the phonetic description of the speech signal stored in the user-specific custom lexicon to the system custom lexicon in response to determining that a certain number of user-specific custom lexicons all include a same or similar pronunciation for the particular item.
  - 2. The method of claim 1, wherein the speech interface is implemented on a first device, the method further comprising:
    - allowing a second device to access at least the custom lexicon for the purposes of implementing a speech interface on the second device.
  - 3. The method of claim 1, further comprising:
    - storing multiple phonetic descriptions in association with the text description of the particular item in the custom lexicon.
  - 4. The method of claim 3, wherein storing multiple phonetic descriptions in association with the text description of the particular item in the custom lexicon comprises:
    - storing only up to a limited number of phonetic descriptions in association with the text description of the particular item in the custom lexicon.
  - 5. The method of claim 1, wherein presenting the user with the means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item comprises:
    - applying a syllable-based statistical language model to the speech signal to identify syllables present in the speech signal;
      
      identifying a subset of the items in the finite set of items in a classifier based on the identified syllables; and
      
      presenting the subset of the items to the user as candidates for selection.
  - 6. The method of claim 5, further comprising:
    - updating a classification model used by the classifier based on the selection by the user of the particular item via the means for selecting.
  - 7. The method of claim 1, further comprising:
    - providing the phonetic description of the speech signal to a text-to-speech converter to produce a pronunciation; and
      
      prompting the user to confirm the pronunciation produced by the text-to-speech converter prior to storing the phonetic description of the speech signal in association with the text description of the particular item in the custom lexicon.
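
The storing and elevating steps of claim 1, together with the cap on stored pronunciations in claim 4, might be sketched as below. Everything named here is an assumption made for illustration: the `LexiconStore` class, the choice of `"contact"` as the "particular type", the threshold of three users, the limit of two pronunciations, and the policy of dropping the oldest pronunciation when the limit is reached. The claims speak of a "same or similar" pronunciation; this sketch substitutes exact string equality for simplicity.

```python
from collections import defaultdict

PRIVATE_TYPES = {"contact"}   # assumption: item types kept user-specific
ELEVATION_THRESHOLD = 3       # assumption: users needed before elevation
MAX_PRONUNCIATIONS = 2        # assumption: per-item cap (must be >= 1)


class LexiconStore:
    def __init__(self):
        # item text -> list of phonetic descriptions
        self.system = defaultdict(list)
        # user id -> item text -> list of phonetic descriptions
        self.per_user = defaultdict(lambda: defaultdict(list))

    @staticmethod
    def _add(bucket, item, phonetic):
        prons = bucket[item]
        if phonetic not in prons:
            prons.append(phonetic)
            # Keep only the most recent MAX_PRONUNCIATIONS entries.
            del prons[:-MAX_PRONUNCIATIONS]

    def store(self, user, item, item_type, phonetic):
        if item_type in PRIVATE_TYPES:
            # Items of the particular type go only to the user's own lexicon.
            self._add(self.per_user[user], item, phonetic)
            self._maybe_elevate(item, phonetic)
        else:
            # All other items go only to the shared system custom lexicon.
            self._add(self.system, item, phonetic)

    def _maybe_elevate(self, item, phonetic):
        # Count user lexicons that hold this exact pronunciation for the item.
        holders = sum(
            1 for lex in self.per_user.values() if phonetic in lex.get(item, [])
        )
        if holders >= ELEVATION_THRESHOLD:
            self._add(self.system, item, phonetic)
```

A real system would likely compare pronunciations with a phonetic distance measure rather than equality, which is where the "similar" in the claim language would come in.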

8. A system, comprising:
- a speech recognition engine that is configured to generate a phonetic description of a speech signal obtained when a user speaks a name of a particular item into a speech interface for the purpose of selecting the particular item from among a finite set of items and to match the phonetic description of the speech signal to one of a plurality of phonetic descriptions included in a system lexicon or a custom lexicon;
  
  a dialog manager that is configured to present the user with a means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item in response to determining that the speech recognition engine has failed to match the phonetic description of the speech signal to any of the phonetic descriptions included in the system lexicon or the custom lexicon;
  
  a learning engine that is configured to store the phonetic description of the speech signal in association with a text description of the particular item selected by the user via the means for selecting in the custom lexicon, the custom lexicon comprising a user-specific custom lexicon and a system custom lexicon, the learning engine being configured to automatically store the phonetic description of the speech signal only in the user-specific custom lexicon based on a determination that the particular item is of a particular type and to automatically store the phonetic description of the speech signal only in the system custom lexicon based on a determination that the particular item is not of a particular type, and the learning engine is further configured to elevate the phonetic description of the speech signal stored in the user-specific custom lexicon to the system custom lexicon based on a determination that a certain number of user-specific custom lexicons all include a same or similar pronunciation for the particular item; and
  
  a network accessible storage system that stores the custom lexicon and makes the custom lexicon available to a plurality of network-connected devices for performing speech recognition functions.
  - 9. The system of claim 8, wherein the learning engine is configured to store multiple phonetic descriptions in association with the text description of the particular item selected by the user in the custom lexicon.
  - 10. The system of claim 8, wherein the learning engine is configured to store only up to a limited number of phonetic descriptions in association with the text description of the particular item selected by the user in the custom lexicon.
  - 11. The system of claim 8, wherein the learning engine is configured to store the phonetic description of the speech signal in association with the text description of the particular item selected by the user in a user-specific custom lexicon that is used to recognize speech of the user only.
  - 12. The system of claim 8, wherein the learning engine is further configured to store the phonetic description of the speech signal in association with the text description of the particular item selected by the user in a second custom lexicon associated with a second user that is associated with the user.
  - 13. The system of claim 8, wherein the speech recognition engine is further configured to apply a syllable-based statistical language model to the speech signal to identify syllables present in the speech signal and to use a classifier to identify a subset of the items in the finite set of items based on the identified syllables;
    - and wherein the dialog manager is configured to present the subset of the items to the user as candidates for selection.
  - 14. The system of claim 13, wherein the speech recognition engine is further configured to update a classification model used by the classifier based on the selection by the user of the particular item via the means for selecting.
  - 15. The system of claim 12, further comprising:
    - a text-to-speech converter that is configured to produce a pronunciation based on the phonetic description of the speech signal;
      
      wherein the dialog manager is further configured to prompt the user to confirm the pronunciation produced by the text-to-speech converter prior to storing the phonetic description of the speech signal in association with the text description of the particular item selected by the user in the custom lexicon.
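
Claims 5 and 13 describe narrowing the finite set of items to a subset of candidates based on syllables identified in the speech signal. A rough sketch of that narrowing step is shown below. It assumes the syllables have already been identified upstream, and it swaps in a simple Jaccard-overlap score where the claims call for a classifier; the function name `shortlist`, the `catalog` layout, and the default of three candidates are all hypothetical.

```python
def shortlist(heard, catalog, k=3):
    """Rank catalog items by syllable overlap with what was heard.

    heard:   list of syllables identified in the speech signal
    catalog: dict mapping item text -> list of that item's syllables
    k:       maximum number of candidates to present
    """
    heard_set = set(heard)
    scored = []
    for item, syllables in catalog.items():
        item_set = set(syllables)
        union = heard_set | item_set
        # Jaccard similarity: shared syllables over all distinct syllables.
        score = len(heard_set & item_set) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, item))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:k]]
```

The returned list is what a dialog manager could present for non-speech selection; items sharing no syllable with the utterance are left out.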

16. A computer program product comprising a non-transitory computer-readable medium having computer program logic recorded thereon for enabling a processing unit to update a custom lexicon dictionary used by a speech recognition engine that comprises part of a speech interface to an application, the computer program logic comprising:
- first means for enabling the processing unit to obtain a speech signal when a user speaks a name of a particular item into the speech interface for the purpose of selecting the particular item from among a finite set of items;
  
  second means for enabling the processing unit to obtain a text description of the particular item from the speech recognition engine based upon recognition of a phonetic description of the speech signal by the speech recognition engine; and
  
  third means for enabling the processing unit to store the phonetic description of the speech signal in association with the text description of the particular item in a custom lexicon in response to determining that a measure of confidence with which the phonetic description of the speech signal has been recognized by the speech recognition engine is below a predefined threshold, the custom lexicon comprising a user-specific custom lexicon and a system custom lexicon, the third means further enables the processing unit to automatically store the phonetic description of the speech signal only in the user-specific custom lexicon based on a determination that the particular item is of a particular type, to automatically store the phonetic description of the speech signal only in the system custom lexicon based on a determination that the particular item is not of a particular type, and to elevate the phonetic description of the speech signal stored in the user-specific custom lexicon to the system custom lexicon based on a determination that a certain number of user-specific custom lexicons all include a same or similar pronunciation for the particular item.
  - 17. The computer program product of claim 16, further comprising:
    - fourth means for enabling the processing unit to present the user with a means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the particular item.
  - 18. The computer program product of claim 17, further comprising:
    - fifth means for enabling the processing unit to prompt the user to make a selection via the means for selecting.
  - 19. The computer program product of claim 18, further comprising:
    - sixth means for enabling the processing unit to update a classification model based on the selection by the user of the particular item via the means for selecting.
  - 20. The computer program product of claim 17, wherein the third means further enables the processing unit to store the phonetic description of the speech signal in association with the text description of the particular item selected by the user in a second custom lexicon associated with a second user that is associated with the user.
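
Claim 16 differs from claim 1 in its trigger: the pronunciation is stored when the engine does recognize the item but with a confidence measure below a predefined threshold. A minimal sketch of that condition follows. The `Recognition` record, the function `learn_if_uncertain`, the dict-based lexicon, and the threshold value of 0.6 are assumptions for illustration only.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.6  # assumption: the claim only says "predefined"


@dataclass
class Recognition:
    text: str          # text description returned by the engine
    phonetic: str      # phonetic description of the speech signal
    confidence: float  # engine's confidence in the match, 0.0 to 1.0


def learn_if_uncertain(result, lexicon, threshold=CONFIDENCE_THRESHOLD):
    """Store the pronunciation only for low-confidence recognitions.

    lexicon maps item text -> list of phonetic descriptions.
    Returns True if a new pronunciation was stored.
    """
    if result.confidence >= threshold:
        return False
    prons = lexicon.setdefault(result.text, [])
    if result.phonetic in prons:
        return False
    prons.append(result.phonetic)
    return True
```

High-confidence matches leave the lexicon untouched, so only pronunciations the engine struggled with are added.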

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors
Liu, Wei-Ting Frank, Lovitt, Andrew, Tomko, Stefanie, Ju, Yun-Cheng
Primary Examiner(s)
Desir, Pierre-Louis
Assistant Examiner(s)
Sharma, Neeraj

Application Number

US13/268,281
Publication Number

US 20130090921A1
Time in Patent Office

2,034 Days
Field of Search

704/10, 704/275, 704/249, 704/2, 704/251, 704/243, 704/231, 704/235, 704/244, 704/255, 704/8, 704/260, 704/250, 704/3, 704/246, 704/204, 704/257, 704/254, 345/156, 455/418, 379/88.01, 340/994, 434/157
US Class Current
CPC Class Codes

G10L 15/063   Training

G10L 15/22   Procedures used during a sp...

G10L 2015/0638   Interactive procedures

G10L 2015/221   Announcement of recognition...
