Smart training and smart scoring in SD speech recognition system with user defined vocabulary
Abstract
In a speech training and recognition system, the present invention detects and warns the user about similar-sounding entries in the vocabulary, and permits entry of such confusingly similar terms, which are marked along with the stored similar terms to identify the similar words. In addition, the states in similar words are weighted to place more emphasis on the differences between similar words than on their similarities. Another aspect of the invention is the use of a modified scoring algorithm to improve recognition performance when confusable entries were added to the vocabulary despite the warning. Yet another aspect is detecting and warning the user about potential problems with new entries, such as short words, or entries of two or more words separated by long silence periods. Finally, the invention also alerts the user about dissimilarity among multiple tokens of the same vocabulary item in the case of multiple-token training.
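The confusability warning described in the abstract can be sketched as follows; the function name, the score representation, and the threshold value are all illustrative assumptions, not the patented implementation.

```python
# Hypothetical sketch of the abstract's "confusable entry" check: after a
# new training token is scored against every stored vocabulary model,
# warn the user when any stored model scores above a similarity threshold.
# Higher score = more similar; the 0.8 threshold is an assumed value.

def check_confusability(scores, threshold=0.8):
    """scores: dict mapping stored model ID -> similarity to the new token.
    Returns (warn, confusable_ids) with the most similar entries first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    confusable = [model_id for model_id, s in ranked if s >= threshold]
    return (len(confusable) > 0, confusable)
```

In a full system, a `True` result would trigger the user warning while still permitting the entry, which is then marked alongside the similar stored terms.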
11 Claims
1. A speech training system comprising:
a. a first preprocessing module receiving a first speech signal and outputting a first processed speech signal and for detecting the beginning and end of said first speech signal;
b. a first feature extraction module for extracting feature information from said first processed speech signal and outputting at least one first speech signal feature vector for said first processed speech signal;
c. a first comparison module receiving said first speech signal feature vector and comparing the features of said first speech signal feature vector with a plurality of models stored in a storage medium;
d. a first computing module for computing the distance for each state of each of said plurality of models on said storage medium with respect to said first speech signal feature vector and computing a score for each distance calculation and storing an accumulated score for each model;
e. a second comparison module for comparing accumulated scores for said models to determine the top two models which are most similar to said first speech signal feature vector;
f. a first weighting module for applying increased weighting for dissimilar portions of said most similar models and said first speech signal and marking said most similar models as similar to said first speech signal and for applying increased weighting for dissimilar portions of said first speech signal and stored models and marking said first speech signal model as most similar to each of said similar models;
g. a first estimating module for estimating model parameters for said first speech signal; and
h. a storage device for storing all marked model parameters.

Dependent claims (2, 3, 4):
a. a second preprocessing module for receiving a second speech signal and outputting a processed second speech signal and for detecting the beginning and end of said second speech signal;
b. a second speech signal feature extraction module for extracting feature information from said processed second speech signal and outputting at least one second feature vector for said processed second speech signal;
c. a third comparison module receiving said second feature vector and comparing the features of said second feature vector with a plurality of models stored in a storage medium;
d. a second computing module for computing the distance for each state of each of said plurality of models installed on said storage medium with respect to said second feature vector and computing a score for each distance calculation and storing a second accumulated score for each model;
e. a fourth comparison module for comparing accumulated scores for said models to determine at least the top two models which are most similar to said second speech signal;
f. a third computing module for computing the total score of said at least top two models as a second weighted average score; and
g. a decision logic module which evaluates said weighted average score and assigns an ID of the individual one of said at least top two models having the highest weighted average score.
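Steps (c) through (e) of the claims above, per-state distance accumulation followed by selection of the two closest models, can be sketched like this; the Euclidean distance measure, the model layout (a list of per-state mean vectors), and all names are assumptions for illustration, not the patented implementation.

```python
import math

# Illustrative sketch: accumulate a distance between the input feature
# vector and each state of each stored model, then return the two models
# with the lowest accumulated distance (i.e. the most similar models).

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_two_models(feature_vector, models):
    """models: dict model_id -> list of per-state mean vectors (assumed layout).
    Returns (two closest model IDs, all accumulated scores)."""
    scores = {}
    for model_id, states in models.items():
        scores[model_id] = sum(euclidean(feature_vector, s) for s in states)
    ranked = sorted(scores, key=scores.get)  # lower distance = more similar
    return ranked[:2], scores
```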
3. A speech recognition system as described in claim 2 wherein:
a. said first and second preprocessing modules may be grouped within a module;
b. said first, second, third and fourth comparison modules may be grouped within a module;
c. said first and second speech signal feature extraction modules may be grouped within a module; and
d. said first, second and third computing modules may be grouped within a module.
4. A speech training system as described in claim 1 wherein said system further comprises a sorting module which sorts said scores into a list such that those scores representing the models which are acoustically close to the input signal are at the top of the sort list.
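The sorting module of claim 4 amounts to ordering the accumulated scores so that the acoustically closest models come first; this minimal sketch assumes distance-like scores where lower means closer, and the names are illustrative.

```python
# Minimal sketch of a sorting module in the sense of claim 4:
# order accumulated scores ascending so the closest models head the list.

def sort_scores(scores):
    """scores: dict model_id -> accumulated distance (lower = closer).
    Returns a list of (model_id, score) pairs, closest first."""
    return sorted(scores.items(), key=lambda kv: kv[1])
```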
5. A speech training apparatus for interactive training by a user comprising:
a. a preprocessing module receiving a training speech signal and outputting a processed training speech signal;
b. a feature extraction module for extracting feature information from said processed training speech signal and outputting a feature vector for each segment of said processed speech;
c. a comparison module for comparing each of said input feature vectors with each state of all model parameters for models stored in a storage module, accumulating the differences, and assigning a score based on said differences for each stored model;
d. a sorting module which sorts said scores into a list such that those scores representing the models which are acoustically close to the input signal are at the top of the sort list;
e. a confidence module which uses the top scores to determine whether the top scores are sufficiently close to the input signal to represent confusion on recognition; and
f. a weighting module which assigns increased weighting to dissimilar segments of the models and the training signal for use in recognition.
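Steps (e) and (f) of claim 5, deciding whether the top scores are close enough to indicate likely confusion and boosting the weight of dissimilar segments, might look like the following; the relative margin, the boost factor, and the per-segment distance representation are assumed values for illustration.

```python
# Hypothetical sketch of claim 5's confidence and weighting modules.

def likely_confusable(sorted_scores, margin=0.1):
    """sorted_scores: (model_id, distance) pairs, closest first.
    Flags confusion when the runner-up is within a relative margin
    of the best score; the 10% margin is an assumed value."""
    if len(sorted_scores) < 2:
        return False
    best, second = sorted_scores[0][1], sorted_scores[1][1]
    return (second - best) <= margin * best

def segment_weights(segment_distances, boost=2.0):
    """Assign increased weighting to dissimilar segments: any segment whose
    distance is above the average gets the (assumed) boost factor."""
    avg = sum(segment_distances) / len(segment_distances)
    return [boost if d > avg else 1.0 for d in segment_distances]
```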
6. A method for processing a training signal for comparison against stored signal models having a plurality of states stored on a storage medium, comprising the following steps:
a. inputting said training signal to be processed into a preprocessing system;
b. segregating said training signal into a plurality of time frames;
c. deriving a plurality of training signal feature vectors each related to a frame of said training signal;
d. saving each feature vector in storage;
e. computing the distance for all stored models at each state of said stored models on said storage medium with respect to each of said training feature vectors;
f. accumulating a score based on said distance before evaluating the next training feature vector, until the distance for all training feature vectors is accumulated;
g. sorting said accumulated scores into a list with the scores at the top representing the closest matching models and the scores at the bottom representing the least close matches;
h. weighting the scores for a selected number of scores at the top of said list to emphasize the dissimilar sections of the closest models to the training signal and marking said models as close to the training signal model;
i. partitioning the sequence of training feature vectors of said training signal into segments and weighting the scores for the training signal to emphasize the dissimilar sections from the models and marking said training model as close to the selected number of closest models;
j. estimating a model for said training signal; and
k. saving the training signal model in said storage medium.

Dependent claims (7, 8, 9, 10):
a. setting a lower limit for said training signal duration;
b. computing the duration of said training signal;
c. determining if said lower duration limit is met;
d. requesting another signal input if the lower duration limit is not met;
e. setting an upper duration limit;
f. determining if the duration of said training signal exceeds said upper duration limit;
g. requesting another signal input if the upper duration limit is exceeded;
h. setting a silence duration limit; and
i. requesting another signal input if the silence duration limit is exceeded.
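The duration checks in the dependent claim above reduce to comparing the measured speech and silence durations against three limits; the limit values below are illustrative assumptions, not taken from the patent.

```python
# Sketch of the duration validation steps (limit values are assumptions).
# Returns None when the training token is acceptable, otherwise a reason
# string so the system can request another input from the user.

def validate_training_signal(duration_s, silence_s,
                             min_dur=0.3, max_dur=3.0, max_silence=0.5):
    if duration_s < min_dur:
        return "too short"
    if duration_s > max_dur:
        return "too long"
    if silence_s > max_silence:
        return "silence too long"
    return None
```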
8. A method for processing a training signal as described in claim 7 when said lower duration limit is not met comprising the following steps:
a. sorting said accumulated scores into a list with the scores at the top representing the closest matching models and the scores at the bottom representing the least close matches;
b. weighting the scores for a selected number of scores at the top of said list to emphasize the dissimilar sections of the closest models to the training signal and marking said models as close to the training signal model;
c. partitioning the sequence of feature vectors of said training signal into segments and weighting the scores for the training signal to emphasize the dissimilar sections from the models and marking said training model as close to the selected number of closest models;
d. estimating a model for said training signal; and
e. saving the training signal model in said storage medium.
9. A method for processing a training signal as described in claim 7 when said upper duration limit is not exceeded, comprising the following steps:
a. sorting said accumulated scores into a list with the scores at the top representing the closest matching models to said training signal and the scores at the bottom representing the least close matches;
b. weighting the scores for a selected number of scores at the top of said list to emphasize the dissimilar sections of the closest models to the training signal and marking said models as close to the training signal model;
c. partitioning the sequence of training feature vectors of said training signal into segments and weighting the scores for the training signal to emphasize the dissimilar sections from the models and marking said training model as close to the selected number of closest models;
d. estimating a model for said training signal; and
e. saving the training signal model in said storage medium.
10. A method for processing a training signal as described in claim 7 when said silence duration limit is exceeded comprising the following steps:
a. truncating said silence duration of said training signal to less than the selected limit;
b. sorting said accumulated scores into a list with the scores at the top representing the closest matching models to said training signal and the scores at the bottom representing the least close matches;
c. weighting the scores for a selected number of scores at the top of said list to emphasize the dissimilar sections of the closest models to the training signal and marking said models as close to the training signal model;
d. partitioning the sequence of feature vectors of said training signal into segments and weighting the scores for the training signal to emphasize the dissimilar sections from the models and marking said training model as close to the selected number of closest models;
e. estimating a model for said training signal; and
f. saving the training signal model in said storage medium.
11. A speech processing method for processing an input digital speech signal and comparing such signal against stored models comprising the following steps:
a. generating feature vectors representative of said input speech;
b. evaluating said feature vectors until the start of speech is found;
c. calculating the distance for each of the feature vectors against each of the stored models for each state of said models;
d. computing a total accumulated score for said distance calculations for said feature vectors against said stored models;
e. arranging scores in descending order;
f. re-computing all of the distances for the input feature vectors and accumulating a score for the close scores using a weighted measure which applies a higher weighting to dissimilar states of each model than is applied to similar states;
g. arranging the recomputed close scores in descending order; and
h. assigning an I.D. to the input word based on the highest score.
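The two-pass scoring of claim 11, an initial ranking followed by re-computation of only the close candidates under a weighting that emphasizes dissimilar states, can be sketched as below; the per-frame data layout, the parallel weight lists, and all names are assumptions for illustration.

```python
# Illustrative second scoring pass in the sense of claim 11, step (f):
# re-accumulate per-frame distances for the close candidates only,
# applying a higher weight (> 1.0) to frames marked as dissimilar.

def rescore_close_candidates(frame_distances, weights, candidates):
    """frame_distances[model_id]: per-frame distances for that model.
    weights[model_id]: parallel per-frame weights (assumed structure).
    Returns candidate IDs ordered best (lowest weighted distance) first."""
    rescored = {
        model_id: sum(d * w for d, w in zip(frame_distances[model_id],
                                            weights[model_id]))
        for model_id in candidates
    }
    return sorted(rescored, key=rescored.get)
```

The word ID would then be assigned from the first element of the returned ordering, mirroring step (h).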