Smart training and smart scoring in SD speech recognition system with user defined vocabulary
Abstract
In a speech training and recognition system, the present invention detects and warns the user about similar-sounding entries in the vocabulary, and permits entry of such confusingly similar terms, which are marked along with the stored similar terms to identify the similar words. In addition, the states in similar words are weighted to place more emphasis on the differences between similar words than on their similarities. Another aspect of the invention is the use of a modified scoring algorithm to improve recognition performance when confusable entries were added to the vocabulary despite the warning. Yet another aspect is detecting and warning the user about potential problems with new entries, such as short words, or entries of two or more words separated by long silence periods. Finally, the invention also alerts the user about dissimilarity among multiple tokens of the same vocabulary item in the case of multiple-token training.
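The confusability warning described in the abstract can be sketched as follows; the function name, the score representation, and the threshold value are all illustrative assumptions, not the patented implementation.

```python
# Hypothetical sketch of the abstract's "confusable entry" check: after a
# new training token is scored against every stored vocabulary model,
# warn the user when any stored model scores above a similarity threshold.
# Higher score = more similar; the 0.8 threshold is an assumed value.

def check_confusability(scores, threshold=0.8):
    """scores: dict mapping stored model ID -> similarity to the new token.
    Returns (warn, confusable_ids) with the most similar entries first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    confusable = [model_id for model_id, s in ranked if s >= threshold]
    return (len(confusable) > 0, confusable)
```

In a full system, a `True` result would trigger the user warning while still permitting the entry, which is then marked alongside the similar stored terms.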
11 Claims
1. A speech training system comprising:
a. a first preprocessing module receiving a first speech signal and outputting a first processed speech signal and for detecting the beginning and end of said first speech signal;
b. a first feature extraction module for extracting feature information from said first processed speech signal and outputting at least one first speech signal feature vector for said first processed speech signal;
c. a first comparison module receiving said first speech signal feature vector and comparing the features of said first speech signal feature vector with a plurality of models stored in a storage medium;
d. a first computing module for computing the distance for each state of each of said plurality of models on said storage medium with respect to said first speech signal feature vector and computing a score for each distance calculation and storing an accumulated score for each model;
e. a second comparison module for comparing accumulated scores for said models to determine the top two models which are most similar to said first speech signal feature vector;
f. a first weighting module for applying increased weighting for dissimilar portions of said most similar models and said first speech signal and marking said most similar models as similar to said first speech signal and for applying increased weighting for dissimilar portions of said first speech signal and stored models and marking said first speech signal model as most similar to each of said similar models;
g. a first estimating module for estimating model parameters for said first speech signal; and
h. a storage device for storing all marked model parameters.

Dependent claims (2, 3, 4):
a. a second preprocessing module for receiving a second speech signal and outputting a processed second speech signal and for detecting the beginning and end of said second speech signal;
b. a second speech signal feature extraction module for extracting feature information from said processed second speech signal and outputting at least one second feature vector for said processed second speech signal;
c. a third comparison module receiving said second feature vector and comparing the features of said second feature vector with a plurality of models stored in a storage medium;
d. a second computing module for computing the distance for each state of each of said plurality of models installed on said storage medium with respect to said second feature vector and computing a score for each distance calculation and storing a second accumulated score for each model;
e. a fourth comparison module for comparing accumulated scores for said models to determine at least the top two models which are most similar to said second speech signal;
f. a third computing module for computing the total score of said at least top two models as a second weighted average score; and
g. a decision logic module which evaluates said weighted average score and assigns an ID of the individual one of said at least top two models having the highest weighted average score.
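Steps (c) through (e) of the claims above, per-state distance accumulation followed by selection of the two closest models, can be sketched like this; the Euclidean distance measure, the model layout (a list of per-state mean vectors), and all names are assumptions for illustration, not the patented implementation.

```python
import math

# Illustrative sketch: accumulate a distance between the input feature
# vector and each state of each stored model, then return the two models
# with the lowest accumulated distance (i.e. the most similar models).

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_two_models(feature_vector, models):
    """models: dict model_id -> list of per-state mean vectors (assumed layout).
    Returns (two closest model IDs, all accumulated scores)."""
    scores = {}
    for model_id, states in models.items():
        scores[model_id] = sum(euclidean(feature_vector, s) for s in states)
    ranked = sorted(scores, key=scores.get)  # lower distance = more similar
    return ranked[:2], scores
```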
3. A speech recognition system as described in claim 2 wherein:
a. said first and second preprocessing modules may be grouped within a module;
b. said first, second, third and fourth comparison modules may be grouped within a module;
c. said first and second speech signal feature extraction modules may be grouped within a module; and
d. said first, second and third computing modules may be grouped within a module.
4. A speech training system as described in claim 1 wherein said system further comprises a sorting module which sorts said scores into a list such that those scores representing the models which are acoustically close to the input signal are at the top of the sort list.
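The sorting module of claim 4 amounts to ordering the accumulated scores so that the acoustically closest models come first; this minimal sketch assumes distance-like scores where lower means closer, and the names are illustrative.

```python
# Minimal sketch of a sorting module in the sense of claim 4:
# order accumulated scores ascending so the closest models head the list.

def sort_scores(scores):
    """scores: dict model_id -> accumulated distance (lower = closer).
    Returns a list of (model_id, score) pairs, closest first."""
    return sorted(scores.items(), key=lambda kv: kv[1])
```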
5. A speech training apparatus for interactive training by a user comprising:
a. a preprocessing module receiving a training speech signal and outputting a processed training speech signal;
b. a feature extraction module for extracting feature information from said processed training speech signal and outputting a feature vector for each segment of said processed speech;
c. a comparison module for comparing each of said input feature vectors with each state of all model parameters for models stored in a storage module, accumulating the differences, and assigning a score based on said differences for each stored model;
d. a sorting module which sorts said scores into a list such that those scores representing the models which are acoustically close to the input signal are at the top of the sort list;
e. a confidence module which uses the top scores to determine whether the top scores are sufficiently close to the input signal to represent confusion on recognition; and
f. a weighting module which assigns increased weighting to dissimilar segments of the models and the training signal for use in recognition.
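Steps (e) and (f) of claim 5, deciding whether the top scores are close enough to indicate likely confusion and boosting the weight of dissimilar segments, might look like the following; the relative margin, the boost factor, and the per-segment distance representation are assumed values for illustration.

```python
# Hypothetical sketch of claim 5's confidence and weighting modules.

def likely_confusable(sorted_scores, margin=0.1):
    """sorted_scores: (model_id, distance) pairs, closest first.
    Flags confusion when the runner-up is within a relative margin
    of the best score; the 10% margin is an assumed value."""
    if len(sorted_scores) < 2:
        return False
    best, second = sorted_scores[0][1], sorted_scores[1][1]
    return (second - best) <= margin * best

def segment_weights(segment_distances, boost=2.0):
    """Assign increased weighting to dissimilar segments: any segment whose
    distance is above the average gets the (assumed) boost factor."""
    avg = sum(segment_distances) / len(segment_distances)
    return [boost if d > avg else 1.0 for d in segment_distances]
```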
6. A method for processing a training signal for comparison against stored signal models having a plurality of states stored on a storage medium, comprising the following steps:
a. inputting said training signal to be processed into a preprocessing system;
b. segregating said training signal into a plurality of time frames;
c. deriving a plurality of training signal feature vectors each related to a frame of said training signal;
d. saving each feature vector in storage;
e. computing the distance for all stored models at each state of said stored models on said storage medium with respect to each of said training feature vectors;
f. accumulating a score based on said distance before evaluating the next training feature vector, until the distance for all training feature vectors is accumulated;
g. sorting said accumulated scores into a list with the scores at the top representing the closest matching models and the scores at the bottom representing the least close matches;
h. weighting the scores for a selected number of scores at the top of said list to emphasize the dissimilar sections of the closest models to the training signal and marking said models as close to the training signal model;
i. partitioning the sequence of training feature vectors of said training signal into segments and weighting the scores for the training signal to emphasize the dissimilar sections from the models and marking said training model as close to the selected number of closest models;
j. estimating a model for said training signal; and
k. saving the training signal model in said storage medium.

Dependent claims (7, 8, 9, 10):
a. setting a lower limit for said training signal duration;
b. computing the duration of said training signal;
c. determining if said lower duration limit is met;
d. requesting another signal input if the lower duration limit is not met;
e. setting an upper duration limit;
f. determining if the duration of said training signal exceeds said upper duration limit;
g. requesting another signal input if the upper duration limit is exceeded;
h. setting a silence duration limit; and
i. requesting another signal input if the silence duration limit is exceeded.
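The duration checks in the dependent claim above reduce to comparing the measured speech and silence durations against three limits; the limit values below are illustrative assumptions, not taken from the patent.

```python
# Sketch of the duration validation steps (limit values are assumptions).
# Returns None when the training token is acceptable, otherwise a reason
# string so the system can request another input from the user.

def validate_training_signal(duration_s, silence_s,
                             min_dur=0.3, max_dur=3.0, max_silence=0.5):
    if duration_s < min_dur:
        return "too short"
    if duration_s > max_dur:
        return "too long"
    if silence_s > max_silence:
        return "silence too long"
    return None
```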
8. A method for processing a training signal as described in claim 7 when said lower duration limit is not met comprising the following steps:
a. sorting said accumulated scores into a list with the scores at the top representing the closest matching models and the scores at the bottom representing the least close matches;
b. weighting the scores for a selected number of scores at the top of said list to emphasize the dissimilar sections of the closest models to the training signal and marking said models as close to the training signal model;
c. partitioning the sequence of feature vectors of said training signal into segments and weighting the scores for the training signal to emphasize the dissimilar sections from the models and marking said training model as close to the selected number of closest models;
d. estimating a model for said training signal; and
e. saving the training signal model in said storage medium.
9. A method for processing a training signal as described in claim 7 when said upper duration limit is not exceeded, comprising the following steps:
a. sorting said accumulated scores into a list with the scores at the top representing the closest matching models to said training signal and the scores at the bottom representing the least close matches;
b. weighting the scores for a selected number of scores at the top of said list to emphasize the dissimilar sections of the closest models to the training signal and marking said models as close to the training signal model;
c. partitioning the sequence of training feature vectors of said training signal into segments and weighting the scores for the training signal to emphasize the dissimilar sections from the models and marking said training model as close to the selected number of closest models;
d. estimating a model for said training signal; and
e. saving the training signal model in said storage medium.
10. A method for processing a training signal as described in claim 7 when said silence duration limit is exceeded comprising the following steps:
a. truncating said silence duration of said training signal to less than the selected limit;
b. sorting said accumulated scores into a list with the scores at the top representing the closest matching models to said training signal and the scores at the bottom representing the least close matches;
c. weighting the scores for a selected number of scores at the top of said list to emphasize the dissimilar sections of the closest models to the training signal and marking said models as close to the training signal model;
d. partitioning the sequence of feature vectors of said training signal into segments and weighting the scores for the training signal to emphasize the dissimilar sections from the models and marking said training model as close to the selected number of closest models;
e. estimating a model for said training signal; and
f. saving the training signal model in said storage medium.
11. A speech processing method for processing an input digital speech signal and comparing such signal against stored models comprising the following steps:
a. generating feature vectors representative of said input speech;
b. evaluating said feature vectors until the start of speech is found;
c. calculating the distance for each of the feature vectors against each of the stored models for each state of said models;
d. computing a total accumulated score for said distance calculations for said feature vectors against said stored models;
e. arranging scores in descending order;
f. re-computing all of the distances for the input feature vectors and accumulating a score for the close scores using a weighted measure which applies a higher weighting to dissimilar states of each model than is applied to similar states;
g. arranging the recomputed close scores in descending order; and
h. assigning an I.D. to the input word based on the highest score.
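The two-pass scoring of claim 11, an initial ranking followed by re-computation of only the close candidates under a weighting that emphasizes dissimilar states, can be sketched as below; the per-frame data layout, the parallel weight lists, and all names are assumptions for illustration.

```python
# Illustrative second scoring pass in the sense of claim 11, step (f):
# re-accumulate per-frame distances for the close candidates only,
# applying a higher weight (> 1.0) to frames marked as dissimilar.

def rescore_close_candidates(frame_distances, weights, candidates):
    """frame_distances[model_id]: per-frame distances for that model.
    weights[model_id]: parallel per-frame weights (assumed structure).
    Returns candidate IDs ordered best (lowest weighted distance) first."""
    rescored = {
        model_id: sum(d * w for d, w in zip(frame_distances[model_id],
                                            weights[model_id]))
        for model_id in candidates
    }
    return sorted(rescored, key=rescored.get)
```

The word ID would then be assigned from the first element of the returned ordering, mirroring step (h).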