Speech recognizer having a speech coder for an acoustic match based on context-dependent speech-transition acoustic models

US 5,333,236 A
Filed: 09/10/1992
Issued: 07/26/1994
Est. Priority Date: 09/10/1992
Status: Expired due to Fees

Abstract

A speech coding apparatus compares the closeness of the feature value of a feature vector signal of an utterance to the parameter values of prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal. The speech coding apparatus stores a plurality of speech transition models representing speech transitions. At least one speech transition is represented by a plurality of different models. Each speech transition model has a plurality of model outputs, each comprising a prototype match score for a prototype vector signal. Each model output has an output probability. A model match score for a first feature vector signal and each speech transition model comprises the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal. A speech transition match score for the first feature vector signal and each speech transition comprises the best model match score for the first feature vector signal and all speech transition models representing the speech transition. The identification value of each speech transition and the speech transition match score for the first feature vector signal and each speech transition are output as a coded utterance representation signal of the first feature vector signal.
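
The scoring chain in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the prototype vectors, the transition names (`AA_onset`, `AA_end`), the probabilities, and the use of negated Euclidean distance as the closeness measure are all assumptions made for the example. The model match score here takes the output probability of the single closest prototype, which is one possible reading of "at least one prototype match score".

```python
import math

# Hypothetical toy data: every name and number below is illustrative only.
PROTOTYPES = {"P1": [0.0, 0.0], "P2": [1.0, 1.0], "P3": [3.0, 0.5]}

# Each speech transition (keyed by its identification value) has one or more
# models; each model gives an output probability for every prototype.
TRANSITION_MODELS = {
    "AA_onset": [
        {"P1": 0.7, "P2": 0.2, "P3": 0.1},
        {"P1": 0.1, "P2": 0.8, "P3": 0.1},
    ],
    "AA_end": [
        {"P1": 0.2, "P2": 0.2, "P3": 0.6},
    ],
}


def prototype_match_scores(feature, prototypes):
    """Closeness of the feature vector to each prototype (assumed: negated distance)."""
    return {pid: -math.dist(feature, vec) for pid, vec in prototypes.items()}


def model_match_score(model, proto_scores):
    """Output probability the model assigns to the closest prototype."""
    closest = max(proto_scores, key=proto_scores.get)
    return model[closest]


def code_feature_vector(feature):
    """Map each transition id to the best model match score over its models."""
    proto_scores = prototype_match_scores(feature, PROTOTYPES)
    return {
        tid: max(model_match_score(m, proto_scores) for m in models)
        for tid, models in TRANSITION_MODELS.items()
    }


coded = code_feature_vector([0.9, 1.1])
print(coded)
```

The returned dictionary plays the role of the coded utterance representation signal for one feature vector: identification values paired with speech transition match scores.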

249 Citations

31 Claims

1. A speech coding apparatus comprising:
- means for measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
  
  means for storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value;
  
  means for comparing the closeness of the feature value of a first feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for the first feature vector signal and each prototype vector signal;
  
  means for storing a plurality of speech transition models, each speech transition model representing a speech transition from a vocabulary of speech transitions, each speech transition having an identification value, at least one speech transition being represented by a plurality of different speech transition models, each speech transition model having a plurality of speech transition model outputs, each speech transition model output comprising a prototype match score for a prototype vector signal, each speech transition model having an output probability for each model output;
  
  means for generating a model match score for the first feature vector signal and each speech transition model, each model match score comprising the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal;
  
  means for generating a speech transition match score for the first feature vector signal and each speech transition, each speech transition match score comprising the best model match score for the first feature vector signal and all speech transition models representing the speech transition; and
  
  means for outputting the identification value of each speech transition and the speech transition match score for the first feature vector signal and each speech transition as a coded utterance representation signal of the first feature vector signal.
- - 2. An apparatus as claimed in claim 1, further comprising:
    - means for storing a plurality of speech unit models, each speech unit model representing a speech unit comprising two or more speech transitions, each speech unit model comprising two or more speech transition models, each speech unit having an identification value; and
      
      means for generating a speech unit match score for the first feature vector signal and each speech unit, each speech unit match score comprising the best speech transition match score for the first feature vector signal and all speech transitions in the speech unit; and
      
      characterized in that the output means outputs the identification value of each speech unit and the speech unit match score for the first feature vector signal and each speech unit as a coded utterance representation signal of the first feature vector signal.
  - 3. An apparatus as claimed in claim 2, characterized in that:
    - the comparison means comprises ranking means for ranking the prototype vector signals in order of the estimated closeness of each prototype vector signal to the first feature vector signal to obtain a rank score for the first feature vector signal and each prototype vector signal; and
      
      the prototype match score for the first feature vector signal and each prototype vector signal comprises the rank score for the first feature vector signal and each prototype vector signal.
  - 4. An apparatus as claimed in claim 3, characterized in that each speech transition model represents the corresponding speech transition in a unique context of prior and subsequent speech transitions.
  - 5. An apparatus as claimed in claim 4, characterized in that:
    - each speech unit is a phoneme; and
      
      each speech transition is a portion of a phoneme.
  - 6. An apparatus as claimed in claim 5, characterized in that the measuring means comprises a microphone.
  - 7. An apparatus as claimed in claim 6, further comprising means for storing the coded utterance representation signal of the feature vector signal.
  - 8. An apparatus as claimed in claim 7, characterized in that the means for storing prototype vector signals comprises electronic read/write memory.
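
Claims 2 and 3 add two ideas that are easy to illustrate: a rank score in place of a raw closeness value, and a speech unit match score taken as the best transition match score within the unit. The sketch below is illustrative only; the function names, the Euclidean distance used for "estimated closeness", and the convention that rank 1 is the closest prototype are assumptions, not details taken from the claims.

```python
import math


def rank_scores(feature, prototypes):
    """Rank prototypes by estimated closeness; rank 1 is the closest (assumed convention)."""
    ordered = sorted(prototypes, key=lambda pid: math.dist(feature, prototypes[pid]))
    return {pid: rank for rank, pid in enumerate(ordered, start=1)}


def speech_unit_match_scores(transition_scores, unit_models):
    """For each speech unit, keep the best score among its speech transitions."""
    return {
        unit: max(transition_scores[t] for t in transitions)
        for unit, transitions in unit_models.items()
    }


# Hypothetical prototypes and a hypothetical phoneme-like unit inventory.
prototypes = {"P1": [0.0, 0.0], "P2": [1.0, 1.0], "P3": [3.0, 0.5]}
print(rank_scores([0.9, 1.1], prototypes))

transition_scores = {"AA_onset": 0.8, "AA_end": 0.2, "K_onset": 0.4}
unit_models = {"AA": ["AA_onset", "AA_end"], "K": ["K_onset"]}
print(speech_unit_match_scores(transition_scores, unit_models))
```

Using ranks rather than distances makes the prototype match score depend only on the ordering of the prototypes, which is the property claim 3 describes.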

9. A speech coding method comprising:
- measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
  
  storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value;
  
  comparing the closeness of the feature value of a first feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for the first feature vector signal and each prototype vector signal;
  
  storing a plurality of speech transition models, each speech transition model representing a speech transition from a vocabulary of speech transitions, each speech transition having an identification value, at least one speech transition being represented by a plurality of different speech transition models, each speech transition model having a plurality of speech transition model outputs, each speech transition model output comprising a prototype match score for a prototype vector signal, each speech transition model having an output probability for each speech transition model output;
  
  generating a model match score for the first feature vector signal and each speech transition model, each model match score comprising the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal;
  
  generating a speech transition match score for the first feature vector signal and each speech transition, each speech transition match score comprising the best model match score for the first feature vector signal and all speech transition models representing the speech transition; and
  
  outputting the identification value of each speech transition and the speech transition match score for the first feature vector signal and each speech transition as a coded utterance representation signal of the first feature vector signal.
- - 10. A method as claimed in claim 9, further comprising the steps of:
    - storing a plurality of speech unit models, each speech unit model representing a speech unit comprising two or more speech transitions, each speech unit model comprising two or more speech transition models, each speech unit having an identification value; and
      
      generating a speech unit match score for the first feature vector signal and each speech unit, each speech unit match score comprising the best speech transition match score for the first feature vector signal and all speech transitions in the speech unit; and
      
      characterized in that the step of outputting outputs the identification value of each speech unit and the speech unit match score for the first feature vector signal and each speech unit as a coded utterance representation signal of the first feature vector signal.
  - 11. A method as claimed in claim 10, characterized in that:
    - the step of comparing comprises ranking the prototype vector signals in order of the estimated closeness of each prototype vector signal to the first feature vector signal to obtain a rank score for the first feature vector signal and each prototype vector signal; and
      
      the prototype match score for the first feature vector signal and each prototype vector signal comprises the rank score for the first feature vector signal and each prototype vector signal.
  - 12. A method as claimed in claim 11, characterized in that:
    - each speech transition model represents the corresponding speech transition in a unique context of prior and subsequent speech transitions.
  - 13. A method as claimed in claim 12, characterized in that:
    - each speech unit is a phoneme; and
      
      each speech transition is a portion of a phoneme.
  - 14. A method as claimed in claim 12, further comprising the step of storing the coded utterance representation signal of the feature vector signal.

15. A speech recognition apparatus comprising:
- means for measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
  
  means for storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value;
  
  means for comparing the closeness of the feature value of each feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for each feature vector signal and each prototype vector signal;
  
  means for storing a plurality of speech transition models, each speech transition model representing a speech transition from a vocabulary of speech transitions, each speech transition having an identification value, at least one speech transition being represented by a plurality of different speech transition models, each speech transition model having a plurality of speech transition model outputs, each speech transition model output comprising a prototype match score for a prototype vector signal, each speech transition model having an output probability for each model output;
  
  means for generating a model match score for each feature vector signal and each speech transition model, the model match score for a feature vector signal comprising the output probability for at least one prototype match score for the feature vector signal and a prototype vector signal;
  
  means for generating a speech transition match score for each feature vector signal and each speech transition, the speech transition match score for a feature vector signal comprising the best model match score for the feature vector signal and all speech transition models representing the speech transition;
  
  means for storing a plurality of speech unit models, each speech unit model representing a speech unit comprising two or more speech transitions, each speech unit model comprising two or more speech transition models, each speech unit having an identification value;
  
  means for generating a speech unit match score for each feature vector signal and each speech unit, the speech unit match score for a feature vector signal comprising the best speech transition match score for the feature vector signal and all speech transitions in the speech unit;
  
  means for outputting the identification value of each speech unit and the speech unit match score of a feature vector signal and each speech unit as a coded utterance representation signal of the feature vector signal;
  
  means for storing probabilistic models for a plurality of words, each word model comprising at least one speech unit model, each word model having a starting state, an ending state, and a plurality of paths through the speech unit models from the starting state at least part of the way to the ending state;
  
  means for generating a word match score for the series of feature vector signals and each of a plurality of words, each word match score comprising a combination of the speech unit match scores for the series of feature vector signals and the speech units along at least one path through the series of speech unit models in the model of the word;
  
  means for identifying one or more best candidate words having the best word match scores; and
  
  means for outputting at least one best candidate word.
- - 16. An apparatus as claimed in claim 15, characterized in that:
    - the comparison means comprises ranking means for ranking the prototype vector signals in order of the estimated closeness of each prototype vector signal to each feature vector signal to obtain a rank score for each feature vector signal and each prototype vector signal; and
      
      the prototype match score for a feature vector signal and each prototype vector signal comprises the rank score for the feature vector signal and the prototype vector signal.
  - 17. An apparatus as claimed in claim 16, characterized in that each speech unit model represents the corresponding speech unit in a unique context of prior and subsequent speech units.
  - 18. An apparatus as claimed in claim 17, characterized in that each speech unit is a phoneme, and each speech transition is a portion of a phoneme.
  - 19. An apparatus as claimed in claim 18, characterized in that the measuring means comprises a microphone.
  - 20. An apparatus as claimed in claim 19, further comprising means for storing the coded utterance representation signal of the feature vector signal.
  - 21. An apparatus as claimed in claim 18, characterized in that the means for storing prototype vector signals comprises electronic read/write memory.
  - 22. An apparatus as claimed in claim 18, characterized in that the word output means comprises a display.
  - 23. An apparatus as claimed in claim 18, characterized in that the word output means comprises a printer.
  - 24. An apparatus as claimed in claim 18, characterized in that the word output means comprises a speech synthesizer.
  - 25. An apparatus as claimed in claim 18, characterized in that the word output means comprises a loudspeaker.
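
The word match score in claim 15 combines speech unit match scores along a path through the word's speech unit models. One way to picture this is a left-to-right alignment computed by dynamic programming. The sketch below is an assumption-laden illustration: it treats a word model as a simple linear sequence of units, combines scores by summing logarithms, requires the path to reach the last unit, and expects a non-empty frame list with scores in (0, 1]. None of those choices is stated in the claim.

```python
import math


def word_match_score(coded_frames, unit_sequence):
    """Best log-score of a monotonic alignment of frames to the word's units.

    coded_frames: non-empty list of dicts, speech unit id -> match score in (0, 1].
    unit_sequence: speech unit ids of the (hypothetical) linear word model, in order.
    """
    neg_inf = float("-inf")
    n = len(unit_sequence)
    best = [neg_inf] * n  # best[j]: best score with the latest frame on unit j
    for t, frame in enumerate(coded_frames):
        new = [neg_inf] * n
        for j, unit in enumerate(unit_sequence):
            if t == 0:
                # The path must start in the first unit of the word.
                prev = 0.0 if j == 0 else neg_inf
            else:
                # Either stay in unit j or advance from unit j - 1.
                prev = max(best[j], best[j - 1] if j > 0 else neg_inf)
            if prev > neg_inf:
                new[j] = prev + math.log(frame[unit])
        best = new
    return best[-1]


# Hypothetical coded utterance: three frames scored against three units.
frames = [
    {"K": 0.9, "AE": 0.1, "T": 0.1},
    {"K": 0.2, "AE": 0.8, "T": 0.1},
    {"K": 0.1, "AE": 0.2, "T": 0.7},
]
print(word_match_score(frames, ["K", "AE", "T"]))
print(word_match_score(frames, ["AE", "T"]))
```

With these made-up numbers the three-unit word scores higher than the two-unit word, because each of its units matches one frame well.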

26. A speech recognition method comprising:
- measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
  
  storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value;
  
  comparing the closeness of the feature value of each feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for each feature vector signal and each prototype vector signal;
  
  storing a plurality of speech transition models, each speech transition model representing a speech transition from a vocabulary of speech transitions, each speech transition having an identification value, at least one speech transition being represented by a plurality of different speech transition models, each speech transition model having a plurality of speech transition model outputs, each speech transition model output comprising a prototype match score for a prototype vector signal, each speech transition model having an output probability for each speech transition model output;
  
  generating a model match score for each feature vector signal and each speech transition model, the model match score for a feature vector signal comprising the output probability for at least one prototype match score for the feature vector signal and a prototype vector signal;
  
  generating a speech transition match score for each feature vector signal and each speech transition, the speech transition match score for a feature vector signal comprising the best model match score for the feature vector signal and all speech transition models representing the speech transition;
  
  storing a plurality of speech unit models, each speech unit model representing a speech unit comprising two or more speech transitions, each speech unit model comprising two or more speech transition models, each speech unit having an identification value;
  
  generating a speech unit match score for each feature vector signal and each speech unit, the speech unit match score for a feature vector signal comprising the best speech transition match score for the feature vector signal and all speech transitions in the speech unit;
  
  outputting the identification value of each speech unit and the speech unit match score of a feature vector signal and each speech unit as a coded utterance representation signal of the feature vector signal;
  
  storing probabilistic models for a plurality of words, each word model comprising at least one speech unit model, each word model having a starting state, an ending state, and a plurality of paths through the speech unit models from the starting state at least part of the way to the ending state;
  
  generating a word match score for the series of feature vector signals and each of a plurality of words, each word match score comprising a combination of the speech unit match scores for the series of feature vector signals and the speech units along at least one path through the series of speech unit models in the model of the word;
  
  identifying one or more best candidate words having the best word match scores; and
  
  outputting at least one best candidate word.
- - 27. A method as claimed in claim 26, characterized in that:
    - the step of comparing comprises ranking the prototype vector signals in order of the estimated closeness of each prototype vector signal to each feature vector signal to obtain a rank score for each feature vector signal and each prototype vector signal; and
      
      the prototype match score for a feature vector signal and each prototype vector signal comprises the rank score for the feature vector signal and the prototype vector signal.
  - 28. A method as claimed in claim 27, characterized in that each speech unit model represents the corresponding speech unit in a unique context of prior and subsequent speech units.
  - 29. A method as claimed in claim 28, characterized in that each speech unit is a phoneme, and each speech transition is a portion of a phoneme.
  - 30. A method as claimed in claim 29, characterized in that the step of outputting comprises displaying at least one best candidate word.
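
The last two steps of claim 26, identifying the best candidate words and outputting at least one of them, amount to a top-n selection over word match scores. A minimal sketch follows, assuming that higher scores are better and that scores are held in a dictionary keyed by word; the function name and the example scores are hypothetical.

```python
import heapq


def best_candidate_words(word_scores, n=1):
    """Return the n words with the highest word match scores, best first."""
    return heapq.nlargest(n, word_scores, key=word_scores.get)


# Hypothetical log-domain word match scores.
scores = {"cat": -0.69, "at": -2.88, "cap": -1.50}
print(best_candidate_words(scores, n=2))
```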

31. A speech coding apparatus comprising:
- means for measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
  
  means for storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value;
  
  means for comparing the closeness of the feature value of a first feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for the first feature vector signal and each prototype vector signal;
  
  means for storing a plurality of speech transition models, each speech transition model representing a speech transition from a vocabulary of speech transitions, each speech transition having an identification value, at least one speech transition being represented by a plurality of different speech transition models, each speech transition model having a plurality of speech transition model outputs, each speech transition model output comprising a prototype match score for a prototype vector signal, each speech transition model having an output probability for each speech transition model output;
  
  means for generating a model match score for the first feature vector signal and each speech transition model, each model match score comprising the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal;
  
  means for storing a plurality of speech unit models, each speech unit model representing a speech unit comprising two or more speech transitions, each speech unit model comprising two or more speech transition models, each speech unit having an identification value;
  
  means for generating a speech unit match score for the first feature vector signal and each speech unit, each speech unit match score comprising the best model match score for the first feature vector signal and all speech transition models representing speech transitions in the speech unit; and
  
  means for outputting the identification value of each speech unit and the speech unit match score for the first feature vector signal and each speech unit as a coded utterance representation signal of the first feature vector signal.
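
Claim 31 computes the speech unit match score directly from model match scores, without the intermediate speech transition match score used in claims 1 and 2. Because both routes take a maximum, they give the same number. The sketch below illustrates that under assumed inputs; the function names, transition names, and scores are hypothetical.

```python
def unit_score_direct(model_scores, unit_transitions):
    """Best model match score over every model of every transition in the unit."""
    return max(
        score
        for transition in unit_transitions
        for score in model_scores[transition]
    )


def unit_score_two_stage(model_scores, unit_transitions):
    """Best transition match score, each being the best score over its own models."""
    return max(max(model_scores[transition]) for transition in unit_transitions)


# Hypothetical model match scores for one feature vector, grouped by transition.
model_scores = {"AA_onset": [0.2, 0.8], "AA_middle": [0.5], "AA_end": [0.2, 0.1]}
unit = ["AA_onset", "AA_middle", "AA_end"]
print(unit_score_direct(model_scores, unit), unit_score_two_stage(model_scores, unit))
```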

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: De Souza, Peter V.; Picheny, Michael A.; Gopalakrishnan, Ponani S.; Bahl, Lalit R.
Primary Examiner(s): Fleming, Michael R.
Assistant Examiner(s): Doerrler, Michelle
Application Number: US07/942,862
Time in Patent Office: 684 Days
Field of Search: 381/41-47, 395/2.65, 395/2.64, 395/2.66
US Class Current: 704/256.4
CPC Class Codes: G10L 19/06 Determination or coding of ...
