Speech recognition with attempted speaker recognition for speaker model prefetching or alternative speech modeling
Abstract
Speaker recognition is attempted on input speech signals concurrently with provision of input speech signals to a speech recognition system. If a speaker is recognized, a speaker dependent model which has been trained on an enrolled speaker is supplied to the speech recognition system. If not recognized, then a speaker-independent recognition model is used or, alternatively, the new speaker is enrolled. Other speaker specific information such as a special language model, grammar, vocabulary, a dictionary, a list of names, a language and speaker dependent preferences can also be provided to improve the speech recognition function or even configure or customize the speech recognition system or the response of any system such as a computer or network controlled in response thereto. A consistency check in the form of a decision tree is preferably provided to accelerate the speaker recognition process and increase the accuracy thereof. Further training of a model and/or enrollment of additional speakers may be initiated upon completion of speaker recognition and/or adaptively upon each speaker utterance.
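The model-selection behavior summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Model`, `SPEAKER_INDEPENDENT`, and `select_model` are hypothetical, and the sketch assumes that speaker identification runs alongside recognition and reports either an enrolled speaker ID or `None`. Enrollment of a new speaker is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in for a speech processing model."""
    name: str
    speaker_dependent: bool


# Fallback model used while identification is pending or when it fails.
SPEAKER_INDEPENDENT = Model("speaker-independent", False)


def select_model(
    speaker_id: Optional[str],
    identification_done: bool,
    enrolled_models: Dict[str, Model],
) -> Model:
    """Pick the model the recognizer should use right now."""
    if not identification_done:
        # Identification is still running: keep recognizing with the
        # speaker-independent model so no input speech is lost.
        return SPEAKER_INDEPENDENT
    if speaker_id is not None and speaker_id in enrolled_models:
        # An enrolled speaker was recognized: supply the trained model.
        return enrolled_models[speaker_id]
    # No enrolled speaker recognized: stay speaker-independent.
    return SPEAKER_INDEPENDENT


enrolled = {"alice": Model("alice-dependent", True)}
print(select_model(None, False, enrolled).name)
print(select_model("alice", True, enrolled).name)
print(select_model(None, True, enrolled).name)
```

In this sketch the caller would invoke `select_model` again once identification finishes, so the switch to a speaker-dependent model happens at most once per speaker.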
30 Claims
1. A method of operating a speech recognition system, said method comprising the steps of:
identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
wherein said method includes the further step of:
processing said speech signal in accordance with a speaker independent model during but prior to completion of said identifying step.
2. A method of operating a speech recognition system, said method comprising the steps of:
identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
wherein the method includes the further step of:
processing said speech signal in accordance with a speaker independent model subsequent to completion of said identifying step when said identifying step does not identify an enrolled speaker.
3. A method of operating a speech recognition system, said method comprising the steps of:
identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
wherein said stored representation of speech signals includes a plurality of codebooks, each codebook including a plurality of codewords comprising mean and variance values of parameters of clusters of feature vectors computed from frames of speech signals corresponding to an enrolled speaker,
wherein said identifying step includes the step of:
sampling frames of said input speech signal;
computing feature vectors from frames of said input speech signal;
comparing parameters of ones of said feature vectors computed in said computing step with said stored mean and variance values to derive a score; and
counting the number of feature vectors which correspond to each said codebook in accordance with results of said step of comparing parameters,
wherein the method further includes:
performing a consistency check of results of said identifying step; and
processing said speech signal in accordance with a speaker independent model subsequent to completion of said identifying step when said identifying step does not identify an enrolled speaker.
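The comparing and counting steps recited in claim 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed method itself: the claim does not specify the score, so a variance-weighted squared distance to each codeword is assumed here, and each feature vector is assumed to count toward the codebook holding its best-scoring codeword. The names `Codeword`, `score`, and `count_votes` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A codeword holds the mean and variance values of one cluster
# (hypothetical layout: (means, variances), one entry per parameter).
Codeword = Tuple[Sequence[float], Sequence[float]]


def score(vec: Sequence[float], codeword: Codeword) -> float:
    """Assumed score: variance-weighted squared distance (lower is closer)."""
    means, variances = codeword
    return sum((x - m) ** 2 / v for x, m, v in zip(vec, means, variances))


def count_votes(
    feature_vectors: List[Sequence[float]],
    codebooks: Dict[str, List[Codeword]],
) -> Counter:
    """Count how many feature vectors fall closest to each speaker's codebook."""
    votes: Counter = Counter()
    for vec in feature_vectors:
        # The codebook containing the best-scoring codeword gets this frame.
        best = min(
            codebooks,
            key=lambda spk: min(score(vec, cw) for cw in codebooks[spk]),
        )
        votes[best] += 1
    return votes


codebooks = {
    "alice": [([0.0, 0.0], [1.0, 1.0])],
    "bob": [([5.0, 5.0], [1.0, 1.0])],
}
frames = [[0.1, 0.2], [4.9, 5.1], [0.3, -0.1]]
print(count_votes(frames, codebooks))
```

With these toy codebooks, two of the three frames lie near the "alice" cluster and one near the "bob" cluster, so the counts come out as two and one respectively.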
4. A method of operating a speech recognition system, said method comprising the steps of:
identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
wherein said speech processing model corresponds to one of a special language model, grammar, vocabulary, a dictionary, a list of names, a language and speaker dependent preferences which does not correspond to said stored representation of speech signals used for performing said text-independent comparison in said identifying step.
Dependent claims: 11, 12
5. A method of operating a speech recognition system, said method comprising the steps of:
identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
determining whether to perform speech recognition in one of a speaker-independent mode or a speaker-dependent mode; and
selecting said speech processing model to be a speaker-dependent model or a speaker-independent model based on said determining step.
Dependent claims: 6, 7, 8, 9, 10, 13, 14, 15, 16, 17, 18, 19
20. A method of operating a speech recognition system, said method comprising the steps of:
identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
wherein if no speaker is identified in said identifying step, performing a step of presenting an enrollment menu for enrolling said speaker.
21. A method of operating a speech recognition system, said method comprising the steps of:
identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
wherein said stored representation of speech signals includes a codebook formed from a set of clustered feature vectors corresponding to said speaker, and
wherein said speech processing model is a speaker-dependent model formed from information different from information used to form said codebooks.
22. A method of operating a speech recognition system, said method comprising the steps of:
identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
providing a speech processing model to said speech recognition system in accordance with results of said identifying step,
recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words; and
using a speaker-independent model for recognizing said plurality of words in said input speech signal when an enrolled speaker is not identified in said identifying step.
23. An apparatus for performing speech recognition comprising:
means for identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words;
means for providing a speech processing model to said speech recognition system in response to said means for identifying a speaker; and
means for recognizing said plurality of words within said input speech signal with said speech processing model, said stored representations of speech signals and said speech processing model being loaded only once for recognition of said speaker and said words in said input speech signal so that said system performs continuous speech recognition;
wherein said apparatus further includes:
means for determining whether to perform speech recognition in one of a speaker-independent mode or a speaker-dependent mode; and
means for selecting said speech processing model to be a speaker-dependent model or a speaker-independent model based on said determining step.
Dependent claims: 24, 25, 26, 27
28. An apparatus for performing speech recognition comprising:
means for identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words;
means for providing a speech processing model to said speech recognition system in response to said means for identifying a speaker; and
means for recognizing said plurality of words within said input speech signal with said speech processing model, said stored representations of speech signals and said speech processing model being loaded only once for recognition of said speaker and said words in said input speech signal so that said system performs continuous speech recognition;
wherein said apparatus further includes:
means for presenting an enrollment menu for enrolling said speaker if said identifying means is unable to identify said speaker.
29. An apparatus for performing speech recognition comprising:
means for identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words;
means for providing a speech processing model to said speech recognition system in response to said means for identifying a speaker; and
means for recognizing said plurality of words within said input speech signal with said speech processing model, said stored representations of speech signals and said speech processing model being loaded only once for recognition of said speaker and said words in said input speech signal so that said system performs continuous speech recognition;
wherein said identifying means identifies said speaker based on a text-independent model containing codebooks each formed from a set of clustered feature vectors corresponding to a respective one of said plurality of speakers, said stored representation of speech signals containing one of said codebooks, and
wherein said speech processing model is a speaker-dependent model formed from information different from information used to form said codebooks.
30. An apparatus for performing speech recognition comprising:
means for identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words;
means for providing a speech processing model to said speech recognition system in response to said means for identifying a speaker; and
means for recognizing said plurality of words within said input speech signal with said speech processing model, said stored representations of speech signals and said speech processing model being loaded only once for recognition of said speaker and said words in said input speech signal so that said system performs continuous speech recognition,
wherein said stored representation of speech signals includes a plurality of codebooks, each codebook including a plurality of codewords comprising mean and variance values of parameters of clusters of feature vectors computed from frames of speech signals corresponding to an enrolled speaker and
wherein said identifying means includes:
means for sampling frames of said input speech signal;
means for computing feature vectors from frames of said input speech signal;
means for comparing parameters of ones of said feature vectors with said stored mean and variance values to derive a score; and
means for counting the number of feature vectors which correspond to each said codebook in response to said means for comparing parameters,
wherein said apparatus further includes:
means for performing a consistency check of results provided by said means for identifying a speaker,
wherein said identifying means identifies said speaker based on a text-independent model containing codebooks each formed from a set of clustered feature vectors corresponding to a respective one of said plurality of speakers, said stored representation of speech signals containing one of said codebooks, and
wherein said means for performing a consistency check:
determines whether a feature vector count for one of said codebooks corresponding to the speaker identified by said identifying means meets predetermined criteria; and
verifies the speaker identified by said identifying means if said feature vector count meets said predetermined criteria, and further analyzes said codebooks to determine a correct speaker if said feature vector count does not meet said predetermined criteria.
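The consistency check recited in claim 30 can be illustrated with a short Python sketch. The claim leaves the "predetermined criteria" unspecified, so two thresholds are assumed here purely for illustration: a minimum share of all feature vectors and a minimum lead over the runner-up codebook. The function name `consistency_check`, its parameters, and the default values are hypothetical; returning `None` stands in for handing the result to further codebook analysis.

```python
from collections import Counter
from typing import Optional


def consistency_check(
    votes: Counter,
    min_share: float = 0.5,   # assumed criterion: share of all feature vectors
    min_margin: float = 0.2,  # assumed criterion: lead over the runner-up
) -> Optional[str]:
    """Verify the top-scoring speaker, or return None for further analysis.

    `votes` maps each speaker's codebook to its feature vector count.
    """
    total = sum(votes.values())
    if total == 0:
        return None  # no feature vectors counted yet
    ranked = votes.most_common(2)
    top_speaker, top_count = ranked[0]
    runner_up_count = ranked[1][1] if len(ranked) > 1 else 0
    share = top_count / total
    margin = (top_count - runner_up_count) / total
    if share >= min_share and margin >= min_margin:
        return top_speaker  # count meets the criteria: speaker verified
    return None  # criteria not met: codebooks need further analysis


print(consistency_check(Counter({"alice": 8, "bob": 2})))
print(consistency_check(Counter({"alice": 5, "bob": 4, "carol": 1})))
```

In the first call the leading codebook holds 80% of the counts with a wide lead, so the speaker is verified; in the second the lead over the runner-up is only 10% of the counts, so the sketch defers to further analysis.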
Specification