Speech recognition with attempted speaker recognition for speaker model prefetching or alternative speech modeling

US 6,088,669 A
Filed: 01/28/1997
Issued: 07/11/2000
Est. Priority Date: 01/28/1997
Status: Expired due to Fees

Abstract

Speaker recognition is attempted on input speech signals concurrently with provision of input speech signals to a speech recognition system. If a speaker is recognized, a speaker dependent model which has been trained on an enrolled speaker is supplied to the speech recognition system. If not recognized, then a speaker-independent recognition model is used or, alternatively, the new speaker is enrolled. Other speaker specific information such as a special language model, grammar, vocabulary, a dictionary, a list of names, a language and speaker dependent preferences can also be provided to improve the speech recognition function or even configure or customize the speech recognition system or the response of any system such as a computer or network controlled in response thereto. A consistency check in the form of a decision tree is preferably provided to accelerate the speaker recognition process and increase the accuracy thereof. Further training of a model and/or enrollment of additional speakers may be initiated upon completion of speaker recognition and/or adaptively upon each speaker utterance.
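The model-selection behavior described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `RecognizerConfig`, `select_model`, `enrolled_models`, and `si_model` are hypothetical, and the models are represented by plain strings.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RecognizerConfig:
    """Hypothetical container for what is handed to the recognizer."""
    model: str               # name of the model to load
    speaker_dependent: bool  # True if the model was trained on an enrolled speaker


def select_model(speaker_id: Optional[str],
                 enrolled_models: Dict[str, str],
                 si_model: str = "speaker_independent") -> RecognizerConfig:
    """Return the speaker-dependent model for a recognized speaker,
    otherwise fall back to the speaker-independent model."""
    if speaker_id is not None and speaker_id in enrolled_models:
        return RecognizerConfig(enrolled_models[speaker_id], True)
    return RecognizerConfig(si_model, False)
```

The alternative branch mentioned in the abstract (enrolling a new speaker instead of falling back) is left out of this sketch.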

30 Claims

1. A method of operating a speech recognition system, said method comprising the steps of:
- identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
  
  providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
  
  recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
  
  wherein said method includes the further step of;
  
  processing said speech signal in accordance with a speaker independent model during but prior to completion of said identifying step.
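One way to picture the further step of claim 1 is a decoding loop that uses the speaker-independent model for as long as identification is still pending, then switches once a speaker is known. The sketch below is only an assumption about how this could be arranged: `recognize_stream`, `identify_step`, `si_decode`, and `sd_decoders` are hypothetical names, and identification is modeled as a callable that returns `None` until it has reached a decision.

```python
from typing import Callable, Dict, Iterable, List, Optional


def recognize_stream(frames: Iterable[int],
                     identify_step: Callable[[int], Optional[str]],
                     si_decode: Callable[[int], str],
                     sd_decoders: Dict[str, Callable[[int], str]]) -> List[str]:
    """Decode each frame, using the speaker-independent decoder until
    the identification step names a speaker."""
    speaker: Optional[str] = None
    output: List[str] = []
    for frame in frames:
        if speaker is None:
            # Identification is still running; it may or may not finish on this frame.
            speaker = identify_step(frame)
        # Unknown or unenrolled speakers fall through to the speaker-independent decoder.
        decode = sd_decoders.get(speaker, si_decode)
        output.append(decode(frame))
    return output
```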

2. A method of operating a speech recognition system, said method comprising the steps of:
- identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
  
  providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
  
  recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
  
  wherein the method includes the further step of;
  
  processing said speech signal in accordance with a speaker independent model subsequent to completion of said identifying step when said identifying step does not identify an enrolled speaker.

3. A method of operating a speech recognition system, said method comprising the steps of:
- identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
  
  providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
  
  recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
  
  wherein said stored representation of speech signals includes a plurality of codebooks, each codebook including a plurality of codewords comprising mean and variance values of parameters of clusters of feature vectors computed from frames of speech signals corresponding to an enrolled speaker, wherein said identifying step includes the step of;
  
  sampling frames of said input speech signal;
  
  computing feature vectors from frames of said input speech signal;
  
  comparing parameters of ones of said feature vectors computed in said computing step with said stored mean and variance values to derive a score; and
  
  counting the number of feature vectors which correspond to each said codebook in accordance with results of said step of comparing parameters,
  
  wherein the method further includes;
  
  performing a consistency check of results of said identifying step; and
  
  processing said speech signal in accordance with a speaker independent model subsequent to completion of said identifying step when said identifying step does not identify an enrolled speaker.
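The comparing and counting steps of claim 3 can be illustrated with a small sketch. The claim only says that a score is derived from the stored mean and variance values; the variance-weighted squared distance used here is an assumption, as are the names `codeword_score` and `count_votes` and the representation of a codeword as a `(means, variances)` pair.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: one codeword = (per-parameter means, per-parameter variances).
Codeword = Tuple[Sequence[float], Sequence[float]]


def codeword_score(vec: Sequence[float], codeword: Codeword) -> float:
    """Variance-weighted squared distance; lower means a closer match (assumed metric)."""
    means, variances = codeword
    return sum((x - m) ** 2 / v for x, m, v in zip(vec, means, variances))


def count_votes(vectors: List[Sequence[float]],
                codebooks: Dict[str, List[Codeword]]) -> Counter:
    """Credit each feature vector to the speaker codebook holding its best-scoring codeword."""
    votes: Counter = Counter()
    for vec in vectors:
        best = min(codebooks,
                   key=lambda spk: min(codeword_score(vec, cw) for cw in codebooks[spk]))
        votes[best] += 1
    return votes
```

In this sketch the speaker whose codebook collects the most votes would be the identification candidate, subject to the consistency check recited in the claim.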

4. A method of operating a speech recognition system, said method comprising the steps of:
- identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
  
  providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
  
  recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
  
  wherein said speech processing model corresponds to one of a special language model, grammar, vocabulary, a dictionary, a list of names, a language and speaker dependent preferences which does not correspond to said stored representation of speech signals used for performing said text-independent comparison in said identifying step.
  - 11. A method as recited in claim 4, including the further step of providing results of said identifying step to said speaker.
  - 12. A method as recited in claim 11, including the further step of initiating a process for creating or training a codebook in response to said step of providing results of said identifying step to said user.

5. A method of operating a speech recognition system, said method comprising the steps of:
- identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
  
  providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
  
  recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
  
  determining whether to perform speech recognition in one of a speaker-independent mode or a speaker-dependent mode; and
  
  selecting said speech processing model to be a speaker-dependent model or a speaker-independent model based on said determining step.
  - 6. A method as recited in claim 5, wherein said stored representation of speech signals includes a plurality of codebooks, each codebook including a plurality of codewords comprising mean and variance values of parameters of clusters of feature vectors computed from frames of speech signals corresponding to an enrolled speaker, wherein said identifying step includes the steps of sampling frames of said input speech signal, computing feature vectors from frames of said input speech signal, comparing parameters of ones of said feature vectors computed in said computing step with said stored mean and variance values to derive a score, and counting the number of feature vectors which correspond to each said codebook in accordance with results of said step of comparing parameters.
  - 7. A method as recited in claim 6, including the further step of performing a consistency check of results of said identifying step.
  - 8. A method as recited in claim 7, including the further step of comparing results of said counting step.
  - 9. A method as recited in claim 8, including the further step of determining whether or not a distance of a parameter of a feature vector from a mean value of a parameter of a codeword is greater than a distance corresponding to a variance of a corresponding parameter of said codeword for each of a plurality of feature vectors, and detecting a rate of occurrence of a result of said determining step.
  - 10. A method as recited in claim 7, including the further step of determining whether or not a distance of a parameter of a feature vector from a mean value of a parameter of a codeword is greater than a distance corresponding to a variance of a corresponding parameter of said codeword for each of a plurality of feature vectors, and detecting a rate of occurrence of a result of said determining step.
  - 13. A method as recited in claim 7, wherein identifying step includes identifying said speaker based on codebooks each formed from a set of clustered feature vectors corresponding to a respective one of said plurality of speakers, said stored representation of speech signals including one of said codebooks, and wherein said consistency check includes:
    - determining whether a feature vector count for one of said codebooks corresponding to the speaker identified in said identifying step meets predetermined criteria; and
      
      verifying the speaker identified in said identifying step if said feature vector count meets said predetermined criteria, and further analyzing said codebooks to determine a correct speaker if said feature vector count does not meet said predetermined criteria.
  - 14. A method as recited in claim 5, wherein said identifying step includes a template matching process.
  - 15. A method as recited in claim 5, including the further step of processing said speech signal in accordance with a speaker dependent model subsequent to completion of said identifying step.
  - 16. A method as recited in claim 5, including the further step of providing results of said identifying step to said speaker.
  - 17. A method as recited in claim 16, including the further step of initiating a process for creating or training a codebook in response to said step of providing results of said identifying step to said user.
  - 18. A method as recited in claim 5, wherein said speech processing model is a speaker-dependent model, and wherein said method further comprises:
    - providing a speaker-independent model; and
      
      recognizing said plurality of words within said input speech signal using said speaker-independent model if said speaker is not identified within a predetermined period of time after initiation of said identifying step or if said words in said input speech signal cannot be recognized using said speaker-dependent model.
  - 19. A method as recited in claim 5, repeating said speaker identifying step when a new speaker is expected, and if new, loading a new stored representation of speech signals corresponding to said new speaker.
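The distance test of claims 9 and 10 can be sketched as a per-vector check followed by a rate computation. Two points here are assumptions rather than claim language: "a distance corresponding to a variance" is read as one standard deviation (the square root of the variance), and a vector is flagged when any single parameter exceeds that distance. The function name `outlier_rate` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Codeword = Tuple[Sequence[float], Sequence[float]]  # (means, variances), assumed layout


def outlier_rate(vectors: List[Sequence[float]], codeword: Codeword) -> float:
    """Fraction of feature vectors with at least one parameter farther from the
    codeword mean than one standard deviation (assumed reading of the claim)."""
    means, variances = codeword

    def is_outlier(vec: Sequence[float]) -> bool:
        return any(abs(x - m) > math.sqrt(v)
                   for x, m, v in zip(vec, means, variances))

    flags = [is_outlier(v) for v in vectors]
    return sum(flags) / len(flags) if flags else 0.0
```

A high rate would suggest that the matched codebook fits the input poorly, which a consistency check could use as a reason to doubt the identification.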

20. A method of operating a speech recognition system, said method comprising the steps of:
- identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
  
  providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
  
  recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
  
  wherein if no speaker is identified in said identifying step, performing a step of presenting an enrollment menu for enrolling said speaker.

21. A method of operating a speech recognition system, said method comprising the steps of:
- identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
  
  providing a speech processing model to said speech recognition system in accordance with results of said identifying step, and
  
  recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words,
  
  wherein said stored representation of speech signals includes a codebook formed from a set of clustered feature vectors corresponding to said speaker, and wherein said speech processing model is a speaker-dependent model formed from information different from information used to form said codebooks.

22. A method of operating a speech recognition system, said method comprising the steps of:
- identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words,
  
  providing a speech processing model to said speech recognition system in accordance with results of said identifying step,
  
  recognizing said plurality of words within said input speech signal with said speech processing model, said stored representation of speech signals and said speech processing model being loaded into said system only once for recognition of said speaker and said plurality of words in said input speech signal so that said system performs continuous speech recognition for said plurality of words; and
  
  using a speaker-independent model for recognizing said plurality of words in said input speech signal when an enrolled speaker is not identified in said identifying step.

23. An apparatus for performing speech recognition comprising:
- means for identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words;
  
  means for providing a speech processing model to said speech recognition system in response to said means for identifying a speaker; and
  
  means for recognizing said plurality of words within said input speech signal with said speech processing model, said stored representations of speech signals and said speech processing model being loaded only once for recognition of said speaker and said words in said input speech signal so that said system performs continuous speech recognition;
  
  wherein said apparatus further includes;
  
  means for determining whether to perform speech recognition in one of a speaker-independent mode or a speaker-dependent mode; and
  
  means for selecting said speech processing model to be a speaker-dependent model or a speaker-independent model based on said determining step.
  - 24. Apparatus as recited in claim 23, wherein said stored representation of speech signals includes a plurality of codebooks, each codebook including a plurality of codewords comprising mean and variance values of parameters of clusters of feature vectors computed from frames of speech signals corresponding to an enrolled speaker and wherein said identifying means includes means for sampling frames of said input speech signal, means for computing feature vectors from frames of said input speech signal, means for comparing parameters of ones of said feature vectors with said stored mean and variance values to derive a score, and counting the number of feature vectors which correspond to each said codebook in response to said means for comparing parameters.
  - 25. Apparatus as recited in claim 24, further including means for performing a consistency check of results provided by said means for identifying a speaker.
  - 26. Apparatus as recited in claim 25, further including means for determining whether or not a distance of a parameter of a feature vector from a mean value of a parameter of a codeword is greater than a distance corresponding to a variance of a corresponding parameter of said codeword for each of a plurality of feature vectors, and means for detecting a rate of occurrence of a result provided by said means for determining.
  - 27. An apparatus as recited in claim 23, wherein said speech processing model is a speaker-dependent model, and wherein said apparatus further includes:
    - means for providing a speaker-independent model if said speaker is not identified within a predetermined period of time after initiation of said identifying step or if said recognizing means cannot recognize said words in said input speech signal using said speaker-dependent model, and
      
      wherein said recognizing means recognizes said words within said input speech signal based on said speaker-independent model.
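The time-limited fallback recited in claims 18 and 27 might be sketched as a polling loop with a deadline. Only the timeout condition is covered; the second condition (words that cannot be recognized with the speaker-dependent model) is omitted. The names `choose_decoder`, `identify`, `sd_models`, `si_model`, and `timeout_s` are hypothetical, and the injectable `clock` parameter exists only to make the sketch testable.

```python
import time
from typing import Callable, Dict, Optional


def choose_decoder(identify: Callable[[], Optional[str]],
                   sd_models: Dict[str, str],
                   si_model: str,
                   timeout_s: float,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Poll identify() until it names an enrolled speaker or timeout_s elapses;
    on timeout, return the speaker-independent model."""
    start = clock()
    while clock() - start < timeout_s:
        speaker = identify()
        if speaker in sd_models:
            return sd_models[speaker]
    return si_model
```

In a real system `identify` would presumably consume new audio on each call rather than spin; that detail is outside this sketch.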

28. An apparatus for performing speech recognition comprising:
- means for identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words;
  
  means for providing a speech processing model to said speech recognition system in response to said means for identifying a speaker; and
  
  means for recognizing said plurality of words within said input speech signal with said speech processing model, said stored representations of speech signals and said speech processing model being loaded only once for recognition of said speaker and said words in said input speech signal so that said system performs continuous speech recognition;
  
  wherein said apparatus further includes;
  
  means for presenting an enrollment menu for enrolling said speaker if said identifying means is unable to identify said speaker.

29. An apparatus for performing speech recognition comprising:
- means for identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words;
  
  means for providing a speech processing model to said speech recognition system in response to said means for identifying a speaker; and
  
  means for recognizing said plurality of words within said input speech signal with said speech processing model, said stored representations of speech signals and said speech processing model being loaded only once for recognition of said speaker and said words in said input speech signal so that said system performs continuous speech recognition;
  
  wherein said identifying means identifies said speaker based on a text-independent model containing codebooks each formed from a set of clustered feature vectors corresponding to a respective one of said plurality of speakers, said stored representation of speech signals containing one of said codebooks, and
  
  wherein said speech processing model is a speaker-dependent model formed from information different from information used to form said codebooks.

30. An apparatus for performing speech recognition comprising:
- means for identifying a speaker by text-independent comparison of an input speech signal with a stored representation of speech signals corresponding to one of a plurality of speakers, said input speech signal including a plurality of words;
  
  means for providing a speech processing model to said speech recognition system in response to said means for identifying a speaker; and
  
  means for recognizing said plurality of words within said input speech signal with said speech processing model, said stored representations of speech signals and said speech processing model being loaded only once for recognition of said speaker and said words in said input speech signal so that said system performs continuous speech recognition,
  
  wherein said stored representation of speech signals includes a plurality of codebooks, each codebook including a plurality of codewords comprising mean and variance values of parameters of clusters of feature vectors computed from frames of speech signals corresponding to an enrolled speaker and wherein said identifying means includes;
  
  means for sampling frames of said input speech signal;
  
  means for computing feature vectors from frames of said input speech signal;
  
  means for comparing parameters of ones of said feature vectors with said stored mean and variance values to derive a score; and
  
  means for counting the number of feature vectors which correspond to each said codebook in response to said means for comparing parameters,
  
  wherein said apparatus further includes;
  
  means for performing a consistency check of results provided by said means for identifying a speaker,
  
  wherein said identifying means identifies said speaker based on a text-independent model containing codebooks each formed from a set of clustered feature vectors corresponding to a respective one of said plurality of speakers, said stored representation of speech signals containing one of said codebooks, and
  
  wherein said means for performing a consistency check;
  
  determines whether a feature vector count for one of said codebooks corresponding to the speaker identified by said identifying means meets predetermined criteria; and
  
  verifies the speaker identified by said identifying means if said feature vector count meets said predetermined criteria, and further analyzes said codebooks to determine a correct speaker if said feature vector count does not meet said predetermined criteria.
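The count-based consistency check of claims 13 and 30 could look something like the sketch below. The claims do not state what the "predetermined criteria" are, so the two thresholds (`min_share`, `min_margin`) are purely illustrative assumptions, and the "further analysis" branch is reduced to re-ranking by vote count as a placeholder. The function name `consistency_check` is hypothetical.

```python
from collections import Counter
from typing import Optional, Tuple


def consistency_check(votes: Counter,
                      identified: str,
                      min_share: float = 0.6,
                      min_margin: float = 0.2) -> Tuple[Optional[str], bool]:
    """Return (speaker, verified). Thresholds are illustrative assumptions."""
    total = sum(votes.values())
    if total == 0:
        return None, False
    ranked = votes.most_common()
    share = votes[identified] / total
    # Largest vote share held by any other speaker's codebook.
    runner_up = max((n for spk, n in ranked if spk != identified), default=0) / total
    if share >= min_share and share - runner_up >= min_margin:
        return identified, True
    # Criteria not met: placeholder for further codebook analysis.
    return ranked[0][0], False
```

For example, with 8 of 10 feature vectors credited to the identified speaker the result is verified, while an even 5/5 split is returned unverified.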

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Maes, Stephane Herman
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Smits, Talivaldis Ivars
Application Number: US08/787,029
Time in Patent Office: 1,260 Days
Field of Search: 704/231, 704/246, 704/251, 704/275
US Class Current: 704/231
CPC Class Codes:
G10L 15/07 to the speaker
G10L 17/00 Speaker identification or v...
