Speech recognition methods and apparatus

US 5,893,059 A
Filed: 04/17/1997
Issued: 04/06/1999
Est. Priority Date: 04/17/1997
Status: Expired due to Term

7 Assignments

0 Petitions

Abstract

Methods and apparatus for transitioning from one speech recognition system to another and for reusing existing speech recognition data are described. In particular, various methods of converting speech recognition templates or models from a first format to a second format are described. Methods for improving the recognition rate achieved using converted templates or models are also described. These methods involve storing source and/or scoring information for templates or models so that converted models or templates can be scored differently than original models or templates, thereby reflecting the effect the conversion process has on recognition scores. To enhance recognition results, one embodiment uses an available compressed voice recording in the conversion process. The methods and apparatus of the present invention can be applied to a wide variety of speech recognition template and model conversion applications. Methods and apparatus for generating garbage models are also described. In one embodiment, a garbage model is generated dynamically at recognition time, using a period of silence in the utterance on which recognition is to be performed as the source of the data required to generate the garbage model. In this manner, the garbage model reflects the particular background noise conditions associated with the utterance on which speech recognition is to be performed.
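
The idea of storing source or scoring information with each model, so that converted models are scored differently from original ones, can be illustrated with a minimal sketch. It is not the patented implementation: the `StoredModel` record, the `"trained"`/`"converted"` labels and the treatment of the weighting factor as an additive log-domain offset are all hypothetical choices.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical source labels: models trained from uncompressed speech versus
# models converted from an older system's templates.
TRAINED = "trained"
CONVERTED = "converted"


@dataclass
class StoredModel:
    word: str
    source: str        # model source information stored with the model
    log_weight: float  # weighting factor, assumed to be an additive log-domain offset


def adjusted_score(raw_log_score: float, model: StoredModel) -> float:
    """Combine a raw log-likelihood (higher is better) with the stored weight."""
    return raw_log_score + model.log_weight


def best_match(raw_scores: dict[str, float], models: dict[str, StoredModel]) -> str:
    """Return the word whose source-adjusted score is highest."""
    return max(raw_scores, key=lambda w: adjusted_score(raw_scores[w], models[w]))


models = {
    "yes": StoredModel("yes", TRAINED, 0.0),
    "yet": StoredModel("yet", CONVERTED, -2.0),  # converted model is penalized
}
print(best_match({"yes": -101.0, "yet": -100.0}, models))  # prints: yes
```

Here the converted model's raw advantage of 1.0 is outweighed by its 2.0 penalty. Whether converted models should be penalized or favored, and by how much, would have to be tuned; no values are given in the text above.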

25 Claims

1. A speech recognition method, comprising the steps of:
- processing a signal including an utterance to recognize a word included in the utterance, the processing step including the steps of:
  
  generating a set of signal characteristic information from the utterance; and
  
  scoring the set of signal characteristic information against a plurality of different speech models at least two of which were generated using different speech model generation techniques, the scoring against different speech models which were generated using different model generation techniques being performed differently.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
- - 2. The method of claim 1, further comprising the steps of:
    - storing the plurality of different speech models in a database; and
      
      storing scoring information associated with the plurality of different speech models in the database.
  - 3. The method of claim 2, wherein the scoring step includes performing a Viterbi search which includes a pruning step;
    - and wherein the scoring information associated with the models is used as part of the pruning step in determining which models should be eliminated from consideration as potential matches with the utterance.
  - 4. The method of claim 3, wherein a first set of the plurality of different speech models was generated from uncompressed speech;
    - and wherein a second set of the plurality of different speech models was generated from existing speech recognition models or templates;
      
      wherein the scoring step involves a plurality of scoring operations corresponding to different points in the Viterbi search and wherein the pruning step is one of a plurality of pruning steps performed at the different points in the Viterbi search; and
      
      wherein, at each point in the Viterbi search where pruning is performed, a lower pruning threshold is used for the first set of models than for the second set of models, a lower score indicating a poorer match between the model being scored and the utterance, and pruning occurring if a model's score does not exceed the threshold.
  - 5. The method of claim 1, wherein the scoring information includes weighting factor information.
  - 6. The method of claim 1, wherein the scoring information includes model source information.
  - 7. The method of claim 1, further comprising the steps of:
    - determining as a function of a score generated in the scoring step that at least a portion of the utterance matches a particular one of the models, the particular one of the models representing a recognized word.
  - 8. The method of claim 7, wherein each of the models in the plurality of different speech models is a hidden Markov word model;
    - and wherein the method further comprises the step of:
      
      generating an updated model from the utterance and the particular one of the models representing the recognized word.
  - 9. The method of claim 8, further comprising the step of:
    - replacing, in the database, the particular one of the models representing the recognized word with the updated model.
  - 10. The method of claim 9, further comprising the step of:
    - storing in the database scoring information associated with the updated model that is different from the stored scoring information that was previously associated with the particular one of the models.
  - 11. The method of claim 8, further comprising the steps of:
    - monitoring a user of the system for indicia that the recognized word was correctly identified;
      
      upon detecting said indicia from the user that the recognized word was correctly identified:
      
      i. generating, using the utterance, an updated model representing the recognized word; and
      
      ii. replacing the particular one of the models included in the database with the updated model.
  - 12. The method of claim 1, further comprising the steps of:
    - detecting the start of speech in the signal; and
      
      generating a silence model from a portion of the signal which precedes the start of the speech.
  - 13. The method of claim 12, wherein the silence model is generated in real time;
    - wherein the generated silence model is substituted for a previously generated static silence model; and
      
      wherein the set of generated signal characteristic information is scored against the generated silence model.
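
Claims 3 and 4 describe a Viterbi search in which stored per-model information controls pruning, with a lower threshold for models trained from uncompressed speech than for converted models. The sketch below shows one way this could look under loudly stated assumptions: 1-D features, unit-variance Gaussian states, equal self-loop and advance probabilities, and thresholds expressed as a beam width below the best score at each frame. `WordHMM`, `viterbi_with_source_pruning` and the `beams` mapping are hypothetical names.

```python
import math
from dataclasses import dataclass

LOG_HALF = math.log(0.5)  # assumed: equal self-loop and advance probabilities
NEG_INF = -math.inf


@dataclass
class WordHMM:
    word: str
    source: str   # hypothetical labels: "trained" or "converted"
    means: list   # one 1-D Gaussian mean per left-to-right state


def log_emit(x, mu):
    # Unit-variance Gaussian log density with the constant term dropped.
    return -0.5 * (x - mu) ** 2


def viterbi_with_source_pruning(frames, models, beams):
    """Frame-synchronous Viterbi over several word HMMs.

    After every frame, a model is pruned when its best partial score does not
    exceed (global best score - beams[model.source]). A wider beam means a
    lower threshold, so that source's models are pruned less aggressively.
    Returns final-state scores for models that survive and reach their last state.
    """
    active = {}
    for m in models:
        scores = [NEG_INF] * len(m.means)
        scores[0] = log_emit(frames[0], m.means[0])
        active[m.word] = (m, scores)

    for x in frames[1:]:
        stepped = {}
        for word, (m, prev) in active.items():
            new = [NEG_INF] * len(prev)
            for s in range(len(prev)):
                stay = prev[s] + LOG_HALF
                advance = prev[s - 1] + LOG_HALF if s > 0 else NEG_INF
                best_prev = max(stay, advance)
                if best_prev > NEG_INF:
                    new[s] = best_prev + log_emit(x, m.means[s])
            stepped[word] = (m, new)
        # Pruning point: compare every model with the best score at this frame.
        global_best = max(max(sc) for _, sc in stepped.values())
        active = {
            w: (m, sc) for w, (m, sc) in stepped.items()
            if max(sc) > global_best - beams[m.source]
        }

    return {w: sc[-1] for w, (m, sc) in active.items() if sc[-1] > NEG_INF}


models = [
    WordHMM("one", "trained", [0.0, 1.0, 2.0]),
    WordHMM("two", "converted", [5.0, 6.0, 7.0]),
]
frames = [0.0, 1.0, 2.0, 2.0]
# Trained models get a wider beam (lower threshold) than converted ones.
print(viterbi_with_source_pruning(frames, models, {"trained": 50.0, "converted": 10.0}))
```

With the narrow converted-model beam, the poorly matching converted model is dropped after the second frame; widening that beam keeps it alive to the end of the search.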

14. A method of performing speech recognition comprising the steps of:
- storing speech recognition models from a first source;
  
  storing speech recognition models from a second source;
  
  receiving a signal including a segment of speech upon which a speech recognition task is to be performed;
  
  accessing said stored speech recognition models;
  
  scoring the received segment of speech against the accessed speech recognition models, the scoring being performed in such a manner that the scoring applied to models from the first source is different from the scoring applied to models from the second source; and
  
  determining if the received segment of speech corresponds to one of the accessed speech recognition models as a function of a result of the scoring operation.
- Dependent Claims (15, 16, 17, 18, 19)
- - 15. The method of claim 14, wherein the first source is a training circuit which generates speech recognition models from speech which has not been compressed;
    - and wherein the second source is a speech template conversion circuit.
  - 16. The method of claim 15, further comprising the steps of:
    - processing a portion of the signal preceding the speech to generate a silence model; and
      
      using the silence model generated from the signal when performing the speech recognition operation on the segment of speech included in the signal.
  - 17. The method of claim 14, further comprising the step of:
    - generating an updated speech recognition model from the segment of speech when it is determined that the segment of speech corresponds to one of the accessed speech recognition models.
  - 18. The method of claim 17, further comprising the steps of:
    - detecting indicia from a user of the system that the determination that the segment of speech corresponds to said one of the accessed models is correct; and
      
      substituting, in a database of speech recognition models, the updated speech recognition model for the one of the accessed speech recognition models to which the segment of speech corresponds.
  - 19. The method of claim 18, wherein the updated speech recognition model is a hidden Markov model.
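
Claims 17 and 18 (like claims 8 to 11 above) describe generating an updated model from a recognized utterance and substituting it in the database only after the user confirms the result. The claims do not say how the update is computed, so the sketch below swaps in a simple MAP-style adaptation of per-state Gaussian means, given a frame-to-state alignment assumed to come from the Viterbi search. `StoredHMM`, `map_update_means`, `on_user_confirmation`, the prior count `tau` and the `"updated"` source label are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StoredHMM:
    word: str
    means: tuple       # one 1-D Gaussian mean per state (simplified model)
    source: str        # source/scoring information stored with the model
    log_weight: float  # scoring weight applied at recognition time


def map_update_means(means, frames, alignment, tau=10.0):
    """MAP-style mean update: move each state mean toward the frames aligned
    to it; tau acts as a prior count (an assumed value, not from the patent)."""
    sums = [0.0] * len(means)
    counts = [0] * len(means)
    for x, s in zip(frames, alignment):
        sums[s] += x
        counts[s] += 1
    return tuple((tau * mu + sums[s]) / (tau + counts[s]) for s, mu in enumerate(means))


def on_user_confirmation(database, word, frames, alignment, confirmed):
    """Return a database in which the recognized word's model is replaced by an
    updated one, but only when the user has confirmed the recognition."""
    if not confirmed:
        return database
    old = database[word]
    updated = replace(
        old,
        means=map_update_means(old.means, frames, alignment),
        source="updated",  # hypothetical label: new scoring information (claim 10)
        log_weight=0.0,    # hypothetical: no penalty once adapted to real speech
    )
    return {**database, word: updated}


db = {"yes": StoredHMM("yes", (0.0, 10.0), "converted", -2.0)}
db = on_user_confirmation(db, "yes", [1.0, 1.0, 9.0, 9.0], [0, 0, 1, 1], confirmed=True)
print(db["yes"].source)  # prints: updated
```

Returning a new dictionary rather than mutating in place keeps the unconfirmed path trivially safe: without confirmation, the original database object is returned unchanged.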

20. A method of processing a signal including speech, the method comprising the steps of:
- receiving the signal;
  
  generating a silence model from a portion of the signal which precedes the speech; and
  
  using the generated silence model to perform a speech recognition operation on the speech included in the received signal.
- Dependent Claims (21, 22, 23, 24)
- - 21. The method of claim 20, further comprising the step of:
    - detecting the start of speech in the received signal by determining when the amplitude of the signal exceeds a preselected threshold value.
  - 22. The method of claim 21, wherein the performed speech recognition operation includes the use of a plurality of word models and static silence models generated prior to the receipt of the received signal.
  - 23. The method of claim 22, further comprising the step of substituting the silence model generated from a portion of the signal which precedes the speech for a previously generated static silence model.
  - 24. The method of claim 22, further comprising the step of:
    - determining the duration of the portion of the signal which precedes the speech included in the signal; and
      
      wherein said step of generating a silence model is performed only if the determined duration exceeds a preselected minimum duration.
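
Claims 20, 21, 23 and 24 can be sketched as follows: the start of speech is taken as the first sample whose amplitude exceeds a preselected threshold, and a silence model is built from the preceding segment only if that segment is longer than a preselected minimum duration; otherwise the static silence model is kept. As a simplifying assumption, the sketch fits a single 1-D Gaussian to raw samples, whereas a real recognizer would model the same signal characteristic information it extracts for speech. `SilenceModel`, `choose_silence_model` and the example threshold and durations are illustrative only.

```python
import statistics
from dataclasses import dataclass


@dataclass
class SilenceModel:
    mean: float
    var: float
    dynamic: bool  # True if estimated from the current signal


def detect_speech_start(samples, threshold):
    """Return the index of the first sample whose magnitude exceeds threshold, or None."""
    for i, x in enumerate(samples):
        if abs(x) > threshold:
            return i
    return None


def choose_silence_model(samples, rate_hz, threshold, min_duration_s, static_model):
    """Build a silence model from the pre-speech segment, else keep the static one.

    Simplification: raw samples stand in for the per-frame features a real
    recognizer would use.
    """
    start = detect_speech_start(samples, threshold)
    # No speech found, or the leading silence is too short to estimate reliably.
    if start is None or start / rate_hz <= min_duration_s:
        return static_model
    lead = samples[:start]
    mean = statistics.fmean(lead)
    # Floor the variance so a perfectly flat segment still yields a usable Gaussian.
    var = max(statistics.pvariance(lead, mu=mean), 1e-6)
    return SilenceModel(mean, var, dynamic=True)


static = SilenceModel(0.0, 1.0, dynamic=False)
signal = [0.01, -0.01, 0.02, -0.02] * 100 + [0.9, -0.8, 0.7]  # 50 ms quiet at 8 kHz
model = choose_silence_model(signal, 8000, 0.5, 0.02, static)
print(model.dynamic)  # prints: True
```

The returned model would then be substituted for the static silence model (claim 23) for this utterance only, so each recognition attempt sees the background noise actually present on its own line.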

25. A speech recognition system, comprising:
- means for receiving a signal including a segment of speech;
  
  means for generating a silence model from a portion of said signal which precedes the segment of speech; and
  
  means for performing a speech recognition operation on said segment of speech using the generated silence model and at least one additional speech recognition model.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: NYNEX Science & Technology, Inc. (Verizon Communications Inc.)
Inventors: Raman, Vijay R.
Primary Examiner(s): Dorvil, Richemond
Application Number: US08/842,922
Time in Patent Office: 719 Days
Field of Search: 704/231, 704/236, 704/256, 704/255, 704/242, 704/243, 704/244
US Class Current: 704/256.2
CPC Class Codes:
- G10L 15/063   Training
- H04M 1/271   controlled by voice recogni...
- H04M 2201/40   using speech recognition
- H04M 3/42204   Arrangements at the exchang...
