Method for recognizing non-standard and standard speech by speaker independent and speaker dependent word models
Abstract
A system and method for speech recognition includes a speaker-independent set of stored word representations derived from speech of many users deemed to be typical speakers and for use by all users, and may further include speaker-dependent sets of stored word representations specific to each user. Utterances from a user which match stored words in either set according to the ordering rules are reported as words.
10 Claims
1. In a speech recognition system comprising:
an incoming audio signal representing utterances of a user;
a stored set of first word models derived from utterances of a plurality of speakers; and
means for identifying a word in the utterances of a user upon matching portions of said audio signal with one of said stored first word models, a method of enhancing recognition of speech of said user comprising:
ascertaining a current context of the utterances of the user;
providing for said user a stored set of second word models, said set of second word models derived from words spoken by said user, said first word models and said second word models differing from each other;
attempting to identify words in the utterances of said user to find a match in the current context by comparing portions of said audio signal with one of a word model among said first word models and a word model among said second word models associated with said user, the attempting including determining a probability of whether the match exceeds a threshold; and
if the probability of the match fails to exceed the threshold, informing that the words fail to match any of the words acceptable in the current context and thereafter modifying, based on the words in the utterances of said user, the word model among the second word models associated with the user and without modifying the stored set of first word models.
2. The method according to claim 1 further including:
inviting a user upon first use of the speech recognition system to speak training words for deriving said set of second word models;
deriving said set of second models from said training words; and
storing said set of second word models.
3. The method according to claim 2 wherein said set of second word models is stored in a separate memory location from said set of first word models.
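The matching and adaptation steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `Recognizer`, the fixed-length feature vectors, the `exp(-distance)` score standing in for a match probability, and the way the intended word reaches `adapt` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Recognizer:
    # word -> template vector derived from many speakers (never modified here)
    shared_models: dict
    # word -> template vector derived from this user's own speech
    user_models: dict = field(default_factory=dict)
    # hypothetical acceptance threshold on the match score
    threshold: float = 0.6

    @staticmethod
    def score(features, template):
        # Assumed stand-in for a match probability: exp(-Euclidean distance),
        # which lies in (0, 1] and equals 1 for an exact match.
        return math.exp(-math.dist(features, template))

    def recognize(self, features, context_words):
        """Return the best word allowed in the current context, or None."""
        best_word, best_p = None, 0.0
        for models in (self.shared_models, self.user_models):
            for word, template in models.items():
                if word not in context_words:
                    continue  # word is not acceptable in the current context
                p = self.score(features, template)
                if p > best_p:
                    best_word, best_p = word, p
        # Report a word only if the score exceeds the threshold.
        return best_word if best_p > self.threshold else None

    def adapt(self, word, features):
        """After a failed match, update only the user's own model set."""
        self.user_models[word] = list(features)
```

In this sketch a caller would invoke `recognize`, inform the user when it returns `None`, and then call `adapt` with the word the user meant, which leaves `shared_models` untouched.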
4. The method according to claim 1 further including:
inviting a user to speak training utterances of a word upon a predetermined number of failures to identify the word among said first word models when no model for the word is present in said second models;
deriving a word model from said training utterances; and
storing the derived word model in said set of second word models.
5. The method according to claim 4 wherein said set of second word models is stored in a separate memory location from said set of first word models.
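The training trigger of claim 4 can be sketched as a per-word failure counter. This is an illustrative sketch only: `TrainingTrigger`, `derive_model`, the default count of three failures, and the averaging of feature vectors are assumptions, and the claims do not say how a word model is derived from the training utterances.

```python
from collections import Counter


class TrainingTrigger:
    def __init__(self, max_failures=3):
        # "Predetermined number of failures"; the value 3 is an assumption.
        self.max_failures = max_failures
        self.failures = Counter()

    def record_failure(self, word, user_models):
        """Count a failed match; return True when training should be offered."""
        if word in user_models:
            return False  # a user-specific model already exists for this word
        self.failures[word] += 1
        return self.failures[word] >= self.max_failures


def derive_model(training_utterances):
    # Assumed derivation: element-wise mean of equal-length feature vectors.
    n = len(training_utterances)
    return [sum(column) / n for column in zip(*training_utterances)]
```

When `record_failure` returns `True`, the caller would invite the user to speak the word several times, pass the resulting feature vectors to `derive_model`, and store the result in the user's own model set.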
6. A method of enhancing speech recognition comprising:
providing a set of user-independent word models derived from utterances of a plurality of speakers;
providing a set of user-dependent word models for ones of a plurality of users each of the user-dependent word models being derived from utterances of an associated one of said users, said user-independent word models and said user-dependent word models differing from each other;
ascertaining a current context of the utterances of the user;
attempting to match an utterance from one of said users to one of said user-independent word models to find a possible match in the current context; and
attempting to match another utterance from said one of said users to one of said user-dependent word models to find a further match in the current context, determining probabilities of whether the possible match and the further match exceed a threshold; and
if the probabilities of the possible match and the further match fail to exceed the threshold, informing that the words fail to match any of the words acceptable in the current context and thereafter modifying, based on the words in the utterances of said user, the user-dependent word models and without modifying the provided set of user-independent word models.
7. The method according to claim 6 further including:
inviting a new user to speak training words for deriving a set of user-dependent word models;
deriving said set of user-dependent models from said training words; and
storing said set of user-dependent word models.
8. The method according to claim 7 wherein said user-dependent word models are stored in a separate memory location from said user-independent word models.
9. The method according to claim 6 further including:
inviting a new user to speak training utterances of a word upon a predetermined number of failures to identify the word among said user-independent word models when no model for the word is present in said user-dependent models;
deriving a word model from said training utterances; and
storing the derived word model in said set of user-dependent word models.
10. The method according to claim 9 wherein said user-dependent word models are stored in a separate memory location from said user-independent word models.
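Claims 6 to 10 describe one user-independent model set shared by all users alongside a separately stored user-dependent set for each user. One way to picture that separation is the sketch below. `ModelStore`, its method names, and the use of string user identifiers are assumptions for illustration, not details taken from the patent.

```python
from types import MappingProxyType


class ModelStore:
    def __init__(self, shared_models):
        # Read-only view: the user-independent set cannot be modified through it.
        self._shared = MappingProxyType(dict(shared_models))
        # user id -> that user's own word models, kept apart from the shared set
        self._per_user = {}

    @property
    def shared(self):
        return self._shared

    def models_for(self, user_id):
        """Return the user's model set, creating an empty one for a new user."""
        return self._per_user.setdefault(user_id, {})

    def store_user_model(self, user_id, word, template):
        """Add or replace one user-dependent word model."""
        self.models_for(user_id)[word] = list(template)
```

Keeping the two collections in distinct objects mirrors the "separate memory location" language of claims 8 and 10, and the read-only view reflects the requirement that adaptation never alters the user-independent models.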
Specification