Speech recognition using thresholded speaker class model selection or model adaptation
First Claim
1. A speech processing system including means for clustering information values representing respective frames of utterances of a plurality of speakers by speaker class in accordance with a threshold value to provide speaker class specific clusters of information, means for comparing information representing frames of an utterance of a speaker with respective clusters of said speaker class specific clusters of information to identify a speaker class, and means for processing speech information with a speaker class dependent model selected in accordance with a speaker class identified by said means for comparing information.
Abstract
Clusters of quantized feature vectors are processed against each other using a threshold distance value to cluster mean values of sets of parameters contained in speaker specific codebooks to form classes of speakers against which feature vectors computed from an arbitrary input speech signal can be compared to identify a speaker class. The number of codebooks considered in the comparison may be thus reduced to limit mixture elements which engender ambiguity and reduce system response speed when the speaker population becomes large. A speaker class processing model which is speaker independent within the class may be trained on one or more members of the class and selected for implementation in a speech recognition processor in accordance with the speaker class recognized to further improve speech recognition to a level comparable to that of a speaker dependent model. Formation of speaker classes can be supervised by identification of groups of speakers to be included in the class and the speaker class dependent model trained on members of a respective group.
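The threshold-based grouping of speaker codebooks described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the symmetric nearest-codeword Euclidean distance, the greedy assignment rule, and the names `codebook_distance`, `cluster_speakers`, and `example_codebooks` are all assumptions made for illustration, since this section does not specify the distance measure or the clustering procedure.

```python
import math


def codebook_distance(a, b):
    """Assumed measure: symmetric average nearest-codeword Euclidean distance."""
    def one_way(src, dst):
        # For each mean vector in src, take the distance to its closest vector in dst.
        return sum(min(math.dist(x, y) for y in dst) for x in src) / len(src)

    return 0.5 * (one_way(a, b) + one_way(b, a))


def cluster_speakers(codebooks, threshold):
    """Greedy threshold clustering (an assumption, not the patent's procedure).

    A speaker joins the first class in which every existing member's codebook
    lies within `threshold`; otherwise the speaker starts a new class.
    """
    classes = []
    for speaker, codebook in codebooks.items():
        for members in classes:
            if all(codebook_distance(codebook, codebooks[m]) <= threshold
                   for m in members):
                members.append(speaker)
                break
        else:
            classes.append([speaker])
    return classes


# Hypothetical two-dimensional codebooks (lists of mean vectors) for three speakers.
example_codebooks = {
    "spk_a": [(0.0, 0.0), (1.0, 1.0)],
    "spk_b": [(0.1, 0.0), (1.0, 1.1)],
    "spk_c": [(5.0, 5.0), (6.0, 6.0)],
}
speaker_classes = cluster_speakers(example_codebooks, threshold=0.5)
print(speaker_classes)
```

With these illustrative values, the two nearby speakers fall into one class and the distant speaker forms its own. A larger threshold yields fewer, broader classes, which is the trade-off the abstract alludes to when it mentions reducing the number of codebooks compared.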
173 Citations
18 Claims
1. A speech processing system including
means for clustering information values representing respective frames of utterances of a plurality of speakers by speaker class in accordance with a threshold value to provide speaker class specific clusters of information,
means for comparing information representing frames of an utterance of a speaker with respective clusters of said speaker class specific clusters of information to identify a speaker class, and
means for processing speech information with a speaker class dependent model selected in accordance with a speaker class identified by said means for comparing information.
9. A method of operating a speech recognition system, said method comprising the steps of
identifying a speaker class by comparing an input speech signal with a stored representation of speech signals corresponding to a speaker class, in accordance with a threshold value,
providing a speaker class dependent speech processing model to said speech recognition system in accordance with results of said identifying step, said speech processing model being speaker independent within a speaker class, and
processing said speech signal with said speech processing model.
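The steps of claim 9 (identify a class against a threshold, provide the matching model, process the signal) can be sketched as follows. This is a hedged illustration only: scoring by average quantization distortion, the fallback to a generic model when no class falls within the threshold, and every name here (`average_distortion`, `identify_speaker_class`, `select_model`, the model labels) are assumptions not stated in the claim.

```python
import math


def average_distortion(frames, codebook):
    """Mean distance from each input frame to its nearest codeword (assumed score)."""
    return sum(min(math.dist(f, c) for c in codebook) for f in frames) / len(frames)


def identify_speaker_class(frames, class_codebooks, threshold):
    """Return the best-matching class name, or None if no class is within threshold."""
    best_class, best_score = None, math.inf
    for name, codebook in class_codebooks.items():
        score = average_distortion(frames, codebook)
        if score < best_score:
            best_class, best_score = name, score
    return best_class if best_score <= threshold else None


def select_model(frames, class_codebooks, class_models, fallback_model, threshold):
    """Pick the class dependent model; the fallback behaviour is an assumption."""
    speaker_class = identify_speaker_class(frames, class_codebooks, threshold)
    return class_models.get(speaker_class, fallback_model)


# Hypothetical stored class representations and placeholder model labels.
class_codebooks = {
    "class_0": [(0.0, 0.0), (1.0, 1.0)],
    "class_1": [(5.0, 5.0), (6.0, 6.0)],
}
class_models = {"class_0": "model_class_0", "class_1": "model_class_1"}

near_frames = [(0.1, 0.1), (0.9, 1.0)]   # close to class_0 codewords
far_frames = [(10.0, 10.0)]              # far from every stored class

chosen = select_model(near_frames, class_codebooks, class_models,
                      "speaker_independent", threshold=0.5)
unmatched = select_model(far_frames, class_codebooks, class_models,
                         "speaker_independent", threshold=0.5)
print(chosen, unmatched)
```

In a real recognizer the selected entry would be an acoustic model used to decode the same input signal; strings stand in for models here so the selection logic stays visible.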
Specification