Speech recognition using thresholded speaker class model selection or model adaptation

US 5,895,447 A
Filed: 01/28/1997
Issued: 04/20/1999
Est. Priority Date: 02/02/1996
Status: Expired due to Term

Abstract

Clusters of quantized feature vectors are processed against each other using a threshold distance value to cluster mean values of sets of parameters contained in speaker specific codebooks to form classes of speakers against which feature vectors computed from an arbitrary input speech signal can be compared to identify a speaker class. The number of codebooks considered in the comparison may thus be reduced to limit mixture elements which engender ambiguity and reduce system response speed when the speaker population becomes large. A speaker class processing model which is speaker independent within the class may be trained on one or more members of the class and selected for implementation in a speech recognition processor in accordance with the speaker class recognized, to further improve speech recognition to a level comparable to that of a speaker dependent model. Formation of speaker classes can be supervised by identification of groups of speakers to be included in the class, with the speaker class dependent model trained on members of a respective group.
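The thresholded grouping described in the abstract can be illustrated with a minimal sketch. It is not the patented procedure. It assumes each speaker is summarized by a single mean vector (a real codebook holds many codewords), uses Euclidean distance, and applies a greedy one-pass rule: a speaker joins the nearest existing class whose centroid lies within the threshold, otherwise a new class is started. The names `cluster_speakers`, `speaker_means`, and the sample values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_speakers(speaker_means, threshold):
    """Greedy threshold clustering of per-speaker mean vectors (illustrative only).

    Returns a list of classes, each a dict with 'members' and 'centroid'.
    """
    classes = []
    for speaker, mean in speaker_means.items():
        best, best_dist = None, threshold
        for cls in classes:
            d = euclidean(mean, cls["centroid"])
            if d <= best_dist:  # within threshold and closest so far
                best, best_dist = cls, d
        if best is None:
            # No class is close enough: start a new speaker class.
            classes.append({"members": [speaker], "centroid": list(mean)})
        else:
            # Update the running centroid, then record the new member.
            n = len(best["members"])
            best["centroid"] = [
                (c * n + m) / (n + 1) for c, m in zip(best["centroid"], mean)
            ]
            best["members"].append(speaker)
    return classes


# Hypothetical two-dimensional speaker means.
speaker_means = {
    "spk_a": [0.0, 0.0],
    "spk_b": [0.5, 0.0],
    "spk_c": [5.0, 5.0],
    "spk_d": [5.0, 5.5],
}
classes = cluster_speakers(speaker_means, threshold=1.0)
for cls in classes:
    print(cls["members"], cls["centroid"])
```

A larger threshold merges more speakers into fewer classes, which reduces the number of codebooks to compare at the cost of less specific class models.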

173 Citations

18 Claims

1. A speech processing system including means for clustering information values representing respective frames of utterances of a plurality of speakers by speaker class in accordance with a threshold value to provide speaker class specific clusters of information, means for comparing information representing frames of an utterance of a speaker with respective clusters of said speaker class specific clusters of information to identify a speaker class, and means for processing speech information with a speaker class dependent model selected in accordance with a speaker class identified by said means for comparing information.
  - 2. A system as recited in claim 1, further including means for supervising a class with which a speaker may be associated, wherein said means for clustering is responsive to said means for supervising.
  - 3. A system as recited in claim 2, further including means for training said speaker class dependent model in accordance with codebooks corresponding to speakers in a class identified by said means for supervising.
  - 4. A system as recited in claim 3, wherein said codebooks of a speaker class are adapted in response to a speaker class dependent model.
  - 5. A system as recited in claim 1, wherein said means for comparing includes means for sampling frames of said input speech signal, means for computing a feature vectors from frames of said input speech signal, means for comparing parameters of ones of said feature vectors computed in said computing step with said stored mean and variance values to derive a score, and means for counting the number of feature vectors which correspond to each said codebook responsive to said means for comparing parameters.
  - 6. A system as recited in claim 1, wherein said means for comparing information includes means for pattern recognition.
  - 7. A system as recited in claim 1, further including means for processing said speech information in accordance with a speaker independent model prior to completion of identification of a class by said means for comparing information.
  - 8. A system as recited in claim 7, further including means for processing said speech information in accordance with a speaker dependent model subsequent to completion of identification of a class by said means for comparing information.
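The frame scoring and counting recited in claim 5 can be sketched as follows. This is an illustration under assumptions, not the claimed implementation. The score is taken to be a variance-normalized squared distance between a frame's feature vector and a codeword's stored mean, and each frame is counted toward the codebook holding its best-scoring codeword. The function names and sample codebooks are hypothetical.

```python
from collections import Counter


def codeword_score(frame, mean, var):
    """Variance-normalized squared distance (assumed scoring rule; lower is better)."""
    return sum((x - m) ** 2 / v for x, m, v in zip(frame, mean, var))


def count_frame_votes(frames, codebooks):
    """Count, per codebook, how many frames are best matched by one of its codewords.

    `codebooks` maps a codebook name to a list of (mean, variance) codewords.
    """
    votes = Counter()
    for frame in frames:
        best = min(
            codebooks,
            key=lambda name: min(
                codeword_score(frame, m, v) for m, v in codebooks[name]
            ),
        )
        votes[best] += 1
    return votes


# Hypothetical codebooks, each with two codewords of (mean, variance).
codebooks = {
    "class_0": [([0.0, 0.0], [1.0, 1.0]), ([1.0, 1.0], [1.0, 1.0])],
    "class_1": [([5.0, 5.0], [1.0, 1.0]), ([6.0, 4.0], [1.0, 1.0])],
}
frames = [[0.2, 0.1], [0.9, 1.2], [5.1, 4.8], [0.4, 0.6]]
votes = count_frame_votes(frames, codebooks)
print(votes.most_common())
```

The codebook with the highest count is the candidate speaker class. Counting votes over many frames makes the decision less sensitive to any single noisy frame.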

9. A method of operating a speech recognition system, said method comprising the steps of identifying a speaker class by comparing an input speech signal with a stored representation of speech signals corresponding to a speaker class, in accordance with a threshold value, providing a speaker class dependent speech processing model to said speech recognition system in accordance with results of said identifying step, said speech processing model being speaker independent within a speaker class, and processing said speech signal with said speech processing model.
  - 10. A method as recited in claim 9, wherein said stored representation of speech signals includes a plurality of codebooks, each codebook including a plurality of codewords comprising mean and variance values of parameters of clusters of feature vectors computed from frames of speech signals corresponding to an enrolled speaker and wherein said identifying step includes the steps of sampling frames of said input speech signal, computing a feature vectors from frames of said input speech signal, comparing parameters of ones of said feature vectors computed in said computing step with said stored mean and variance values to derive a score, and counting the number of feature vectors which correspond to each said codebook in accordance with results of said step of comparing parameters.
  - 11. A method as recited in claim 9, wherein said identifying step includes a template matching process.
  - 12. A method as recited in claim 9, including the further step of processing said speech signal in accordance with a speaker independent model prior to completion of said identifying step.
  - 13. A method as recited in claim 9, including the further step of processing said speech signal in accordance with a speaker dependent model subsequent to completion of said identifying step.
  - 14. A method as recited in claim 9, including the further step of processing said speech signal in accordance with a speaker independent model subsequent to completion of said identifying step when said identifying step does not identify an enrolled speaker.
  - 15. A method as recited in claim 9, including the further step of forming said stored representation of speech signals corresponding to a speaker class by clustering of codewords.
  - 16. A method as recited in claim 15, including the further step of supervising formation of said stored representation by identifying codewords which can be clustered by identification of a group of speakers.
  - 17. A method as recited in claim 16, including the further step of a consistency check to accept or reject the identified class.
  - 18. A method as recited in claim 16, including the further step of adapting said stored representation of speech signals corresponding to a class using said speaker class dependent speech processing model.
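The model selection and fallback behavior in claims 9, 12, and 14 might be sketched as below. The claims do not state how the threshold is applied at this stage. Requiring the winning class to hold a minimum fraction of the frame votes is an assumption made here for illustration, and `select_model`, `min_fraction`, and the model placeholders are hypothetical names.

```python
def select_model(votes, min_fraction, class_models, speaker_independent_model):
    """Pick a class-dependent model, falling back to the speaker-independent one.

    `votes` maps a speaker class name to its frame count. The fallback is used
    when no frames have been scored yet, when the best class does not reach
    `min_fraction` of all votes (assumed acceptance rule), or when no model
    exists for the best class.
    """
    total = sum(votes.values())
    if total == 0:
        # Identification has not completed: keep the speaker-independent model.
        return speaker_independent_model
    best_class, best_count = max(votes.items(), key=lambda kv: kv[1])
    if best_count / total >= min_fraction and best_class in class_models:
        return class_models[best_class]
    return speaker_independent_model


# Placeholder strings stand in for trained models.
class_models = {"class_0": "model_class_0", "class_1": "model_class_1"}
fallback = "model_speaker_independent"

print(select_model({"class_0": 3, "class_1": 1}, 0.6, class_models, fallback))
print(select_model({"class_0": 2, "class_1": 2}, 0.6, class_models, fallback))
print(select_model({}, 0.6, class_models, fallback))
```

Starting with the speaker-independent model lets recognition begin immediately, and the switch to a class-dependent model happens only once enough frames support one class.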

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Maes, Stephane Herman; Ittycheriah, Abraham Poovakunnel
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Smits, Talivaldis Ivars
Application Number: US08/787,031
Time in Patent Office: 812 Days
Field of Search: 704/231, 704/246, 704/251, 704/275
US Class Current: 704/231
CPC Class Codes:
G10L 15/07   to the speaker
G10L 21/0272   Voice signal separating
G10L 25/12   the extracted parameters be...