Pronunciation learning from user correction
First Claim
1. A method for updating a custom lexicon used by a speech recognition engine that comprises part of a speech interface, comprising:
- obtaining a speech signal by the speech interface when a user speaks a name of a particular item for the purpose of selecting the particular item from among a finite set of items;
- presenting the user with a means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item in response to determining that a phonetic description of the speech signal is not recognized by the speech recognition engine;
- after the user has selected the particular item via the means for selecting, storing the phonetic description of the speech signal in association with a text description of the particular item in the custom lexicon, the custom lexicon comprising a user-specific custom lexicon that is used to recognize speech of the user only and a system custom lexicon that is used to recognize speech of all users of a system, the storing comprising:
  - determining if the particular item is of a particular type,
  - automatically storing the phonetic description of the speech signal only in the user-specific custom lexicon in response to determining that the particular item is of the particular type, and
  - automatically storing the phonetic description of the speech signal only in the system custom lexicon in response to determining that the particular item is not of the particular type; and
- elevating the phonetic description of the speech signal stored in the user-specific custom lexicon to the system custom lexicon in response to determining that a certain number of user-specific custom lexicons all include a same or similar pronunciation for the particular item.
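The storing and elevating steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `CustomLexicon`, `USER_SPECIFIC_TYPES` and `ELEVATION_THRESHOLD` are hypothetical, the choice of "contact" as the user-specific type is an assumption (the claim says only "a particular type"), and the threshold of 3 stands in for the claim's unspecified "certain number". The sketch counts only identical pronunciations, so the "similar" case is not covered, and it treats elevation as a copy into the system lexicon.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Assumption: item types whose pronunciations stay per-user.
USER_SPECIFIC_TYPES = {"contact"}
# Assumption: stand-in for the claim's "certain number" of agreeing user lexicons.
ELEVATION_THRESHOLD = 3


@dataclass
class CustomLexicon:
    # user id -> item text -> set of pronunciations (tuples of phoneme strings)
    user: dict = field(default_factory=lambda: defaultdict(lambda: defaultdict(set)))
    # item text -> set of pronunciations shared by all users
    system: dict = field(default_factory=lambda: defaultdict(set))

    def store(self, user_id, item_text, item_type, phonemes):
        """Route a learned pronunciation to exactly one of the two lexicons."""
        pron = tuple(phonemes)
        if item_type in USER_SPECIFIC_TYPES:
            self.user[user_id][item_text].add(pron)
            self._maybe_elevate(item_text, pron)
        else:
            self.system[item_text].add(pron)

    def _maybe_elevate(self, item_text, pron):
        """Copy a pronunciation to the system lexicon once enough users share it."""
        agreeing = sum(
            1 for items in self.user.values() if pron in items.get(item_text, ())
        )
        if agreeing >= ELEVATION_THRESHOLD:
            self.system[item_text].add(pron)
```

With these assumptions, a pronunciation learned for a contact stays in one user's lexicon until three user lexicons hold the same phoneme sequence for that item, at which point it also becomes available to all users.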
Abstract
Systems and methods are described for adding entries to a custom lexicon used by a speech recognition engine of a speech interface in response to user interaction with the speech interface. In one embodiment, a speech signal is obtained when the user speaks a name of a particular item to be selected from among a finite set of items. If a phonetic description of the speech signal is not recognized by the speech recognition engine, then the user is presented with a means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item. After the user has selected the particular item via the means for selecting, the phonetic description of the speech signal is stored in association with a text description of the particular item in the custom lexicon.
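The flow described in the abstract can be sketched as follows, under stated assumptions. The function `select_item` and its parameters are hypothetical names introduced for illustration. The custom lexicon is modeled as a plain dictionary from phoneme tuples to item text, "recognized" is simplified to an exact lookup, and the `choose_manually` callback stands in for whatever non-speech selection means (for example a touch list) the interface offers.

```python
def select_item(phonemes, items, lexicon, choose_manually):
    """Return the selected item, learning the pronunciation if speech failed.

    phonemes: phonetic description of the utterance (sequence of phoneme strings)
    items: the finite set of selectable item names
    lexicon: dict mapping phoneme tuple -> item text (the custom lexicon)
    choose_manually: callback standing in for the non-speech selection means
    """
    key = tuple(phonemes)
    item = lexicon.get(key)
    if item in items:
        return item  # recognized, so there is nothing to learn
    # Not recognized: fall back to input that does not involve speaking the name.
    item = choose_manually(sorted(items))
    # Store the phonetic description in association with the item's text.
    lexicon[key] = item
    return item
```

On a second utterance with the same phonetic description, the lookup succeeds and the manual selection step is skipped.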
20 Claims
1. A method for updating a custom lexicon used by a speech recognition engine that comprises part of a speech interface, as recited in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7.
8. A system, comprising:
- a speech recognition engine that is configured to generate a phonetic description of a speech signal obtained when a user speaks a name of a particular item into a speech interface for the purpose of selecting the particular item from among a finite set of items and to match the phonetic description of the speech signal to one of a plurality of phonetic descriptions included in a system lexicon or a custom lexicon;
- a dialog manager that is configured to present the user with a means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item in response to determining that the speech recognition engine has failed to match the phonetic description of the speech signal to any of the phonetic descriptions included in the system lexicon or the custom lexicon;
- a learning engine that is configured to store the phonetic description of the speech signal in association with a text description of the particular item selected by the user via the means for selecting in the custom lexicon, the custom lexicon comprising a user-specific custom lexicon and a system custom lexicon, the learning engine being configured to automatically store the phonetic description of the speech signal only in the user-specific custom lexicon based on a determination that the particular item is of a particular type and to automatically store the phonetic description of the speech signal only in the system custom lexicon based on a determination that the particular item is not of a particular type, and the learning engine is further configured to elevate the phonetic description of the speech signal stored in the user-specific custom lexicon to the system custom lexicon based on a determination that a certain number of user-specific custom lexicons all include a same or similar pronunciation for the particular item; and
- a network accessible storage system that stores the custom lexicon and makes the custom lexicon available to a plurality of network-connected devices for performing speech recognition functions.
Dependent claims: 9, 10, 11, 12, 13, 14, 15.
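The matching performed by the speech recognition engine of claim 8 might look like the sketch below. The claim does not say how a phonetic description is matched against lexicon entries. This example assumes, purely for illustration, a Levenshtein edit distance over phoneme sequences with a hypothetical `max_distance` tolerance. The function names and the lexicon layout (item text mapped to a collection of phoneme tuples) are likewise assumptions.

```python
def phoneme_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def match(phonemes, system_lexicon, custom_lexicon, max_distance=1):
    """Return the text of the closest entry in either lexicon, or None."""
    best_text, best_dist = None, max_distance + 1
    for lexicon in (system_lexicon, custom_lexicon):
        for text, pronunciations in lexicon.items():
            for pron in pronunciations:
                d = phoneme_distance(phonemes, pron)
                if d < best_dist:
                    best_text, best_dist = text, d
    return best_text
```

A `None` result corresponds to the failed match that, in the claim, causes the dialog manager to offer the non-speech selection means.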
16. A computer program product comprising a non-transitory computer-readable medium having computer program logic recorded thereon for enabling a processing unit to update a custom lexicon dictionary used by a speech recognition engine that comprises part of a speech interface to an application, the computer program logic comprising:
- first means for enabling the processing unit to obtain a speech signal when a user speaks a name of a particular item into the speech interface for the purpose of selecting the particular item from among a finite set of items;
- second means for enabling the processing unit to obtain a text description of the particular item from the speech recognition engine based upon recognition of a phonetic description of the speech signal by the speech recognition engine; and
- third means for enabling the processing unit to store the phonetic description of the speech signal in association with the text description of the particular item in a custom lexicon in response to determining that a measure of confidence with which the phonetic description of the speech signal has been recognized by the speech recognition engine is below a predefined threshold, the custom lexicon comprising a user-specific custom lexicon and a system custom lexicon, the third means further enables the processing unit to automatically store the phonetic description of the speech signal only in the user-specific custom lexicon based on a determination that the particular item is of a particular type, to automatically store the phonetic description of the speech signal only in the system custom lexicon based on a determination that the particular item is not of a particular type, and to elevate the phonetic description of the speech signal stored in the user-specific custom lexicon to the system custom lexicon based on a determination that a certain number of user-specific custom lexicons all include a same or similar pronunciation for the particular item.
Dependent claims: 17, 18, 19, 20.
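Claim 16 differs from claims 1 and 8 in its trigger: the pronunciation is stored when recognition succeeds but with low confidence. A minimal sketch of that condition follows. The name `learn_if_uncertain`, the 0.6 value for the "predefined threshold", and the 0-to-1 confidence scale are all assumptions made for illustration. Routing between the user-specific and system lexicons is omitted here.

```python
# Assumption: the claim says only "a predefined threshold"; 0.6 is illustrative.
CONFIDENCE_THRESHOLD = 0.6


def learn_if_uncertain(phonemes, recognized_text, confidence, custom_lexicon,
                       threshold=CONFIDENCE_THRESHOLD):
    """Store the pronunciation when the item was recognized with low confidence.

    custom_lexicon maps item text -> set of pronunciations (phoneme tuples).
    Returns True if a pronunciation was stored.
    """
    if confidence < threshold:
        custom_lexicon.setdefault(recognized_text, set()).add(tuple(phonemes))
        return True
    return False
```

Storing the low-confidence variant gives the engine an additional pronunciation for the same text, so a later utterance of that form has an exact entry to match against.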
Specification