Speech recognition system and method employing data compression
First Claim
1. A data compression method for a computerized speech recognizer having a plurality of hidden Markov models representing acoustic utterances, comprising:
receiving a plurality of spoken training utterances; and
converting the spoken training utterances into a stored output probability table, the converting step including:
creating a predetermined number of codewords based on the spoken training utterances, each codeword representing an acoustic feature of the spoken training utterances;
creating the hidden Markov models based on the spoken training utterances, each hidden Markov model being created by steps that include associating a probability value with each codeword, each probability value reflecting a probability that the acoustic utterance represented by the hidden Markov model will be spoken using the acoustic feature represented by the associated codeword; and
compressing at least some of the probability values associated with a selected one of the codewords based on others of the probability values associated with the selected codeword, thereby compressing the stored output probability table.
Abstract
A data compression system greatly compresses the stored data used by a speech recognition system employing hidden Markov models (HMM). The speech recognition system vector quantizes the acoustic space spoken by humans by dividing it into a predetermined number of acoustic features that are stored as codewords in a vector quantization (output probability) table or codebook. For each spoken word, the speech recognition system calculates an output probability value for each codeword, the output probability value representing an estimated probability that the word will be spoken using the acoustic feature associated with the codeword. The probability values are stored in an output probability table indexed by each codeword and by each word in a vocabulary. The output probability table is arranged to allow compression of the probability values associated with each codeword based on other probability values associated with the same codeword, thereby compressing the stored output probability table. By compressing the probability values associated with each codeword separately from the probability values associated with other codewords, the speech recognition system can recognize spoken words without having to decompress the entire output probability table. In a preferred embodiment, additional compression is achieved by quantizing the probability values into 16 buckets with an equal number of probability values in each bucket. By quantizing the probability values into buckets, additional redundancy is added to the output probability table, which allows the output probability table to be additionally compressed.
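The equal-count bucketing described above can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation: the function name `bucket_quantize`, the use of NumPy quantiles to find equal-count boundaries, and the choice of each bucket's mean as its representative value are all assumptions.

```python
import numpy as np

def bucket_quantize(probs, n_buckets=16):
    """Quantize probability values into equal-count buckets.

    Each value is replaced by a small bucket index; a lookup table of
    representative values maps indices back to approximate probabilities.
    Repeated indices add redundancy that a later compression pass can exploit.
    """
    probs = np.asarray(probs, dtype=float)
    # Boundaries at quantiles, so each bucket holds roughly the same count.
    edges = np.quantile(probs, np.linspace(0.0, 1.0, n_buckets + 1))
    # Assign each probability a bucket index in [0, n_buckets - 1].
    indices = np.clip(np.searchsorted(edges, probs, side="right") - 1,
                      0, n_buckets - 1).astype(np.uint8)
    # Representative value per bucket: the mean of its members.
    reps = np.array([probs[indices == b].mean() if np.any(indices == b)
                     else 0.0 for b in range(n_buckets)])
    return indices, reps
```

Storing a one-byte (or four-bit) index per entry plus a 16-entry lookup table already shrinks the table substantially compared with full-precision probabilities, before any entropy coding is applied.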
26 Claims
1. A data compression method for a computerized speech recognizer having a plurality of hidden Markov models representing acoustic utterances, comprising:
receiving a plurality of spoken training utterances; and
converting the spoken training utterances into a stored output probability table, the converting step including:
creating a predetermined number of codewords based on the spoken training utterances, each codeword representing an acoustic feature of the spoken training utterances;
creating the hidden Markov models based on the spoken training utterances, each hidden Markov model being created by steps that include associating a probability value with each codeword, each probability value reflecting a probability that the acoustic utterance represented by the hidden Markov model will be spoken using the acoustic feature represented by the associated codeword; and
compressing at least some of the probability values associated with a selected one of the codewords based on others of the probability values associated with the selected codeword, thereby compressing the stored output probability table.
View Dependent Claims (2, 3, 4, 5, 6, 7)
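The converting step of claim 1 can be sketched in miniature. This is a simplified assumption-laden illustration: it treats each word's output probabilities as relative frequencies of the codewords observed in its training utterances (real HMM training uses Baum-Welch re-estimation per model state), and the input is assumed to be already vector-quantized into codeword indices.

```python
from collections import Counter

def build_output_probability_table(training, n_codewords):
    """Estimate output probabilities P(codeword | word) from labeled frames.

    `training` maps each word to the list of codeword indices observed in
    its spoken training utterances (vector quantization assumed done).
    Returns a table indexed by word, then by codeword, as in the abstract.
    """
    table = {}
    for word, codewords in training.items():
        counts = Counter(codewords)
        total = len(codewords)
        # One probability per codeword, in codeword-index order.
        table[word] = [counts.get(c, 0) / total for c in range(n_codewords)]
    return table
```

Transposing this table (indexing by codeword first) groups together exactly the values that claim 1 compresses relative to one another: all models' probabilities for a single codeword.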
8. A data compression system for a computerized speech recognizer having a plurality of hidden Markov models representing acoustic utterances, comprising:
a receiver that receives a plurality of spoken training utterances;
means for converting the spoken training utterances into an output probability table having a predetermined number of codewords, each codeword representing an acoustic feature of the spoken training utterances and each codeword being associated with a probability value for each hidden Markov model, each probability value reflecting a probability that the acoustic utterance represented by the hidden Markov model will be spoken using the acoustic feature represented by the associated codeword;
a memory that stores the output probability table; and
a data processor that compresses at least some of the probability values associated with a selected one of the codewords based on others of the probability values associated with the selected codeword, thereby compressing the stored output probability table.
View Dependent Claims (9, 10, 11, 12, 13, 14, 15)
16. A computer storage medium encoded with a data structure for use by a computerized speech recognizer having a plurality of hidden Markov models representing acoustic utterances, the data structure comprising:
a predetermined number of codewords, each codeword representing an acoustic feature; and
a probability value associated with each codeword for each hidden Markov model, each probability value reflecting a probability that the acoustic utterance represented by the associated hidden Markov model will be spoken using the acoustic feature represented by the associated codeword, at least some of the probability values associated with a selected codeword being in a compressed form that is based on others of the probability values associated with the selected codeword, the compressed form being such that the probability values associated with the selected codeword can be decompressed by the computerized speech recognizer without decompressing the probability values associated with codewords other than the selected codeword;
computer instructions stored on the computer storage medium, the computer instructions recognizing a sound spoken by a user, the sound spoken being represented by an input speech signal that includes a plurality of input frames, each input frame being associated with one of the codewords, the instructions including:
instructions for decompressing only those probability values associated with the codewords that are identical to the codewords of the input frames without decompressing probability values associated with codewords not identical to the codewords of the input frames;
instructions for computing a recognition score for each of the hidden Markov models using the decompressed probability values; and
instructions for determining which hidden Markov model results in a highest recognition score and thereby recognizing the acoustic utterance represented by the hidden Markov model with the highest recognition score as being the spoken sound represented by the input speech signal.
17. A data compression method for a computerized speech recognizer having a plurality of hidden Markov models representing acoustic utterances, comprising:
receiving a plurality of spoken training utterances; and
converting the spoken training utterances into an output probability table, the converting step including:
creating a predetermined number of codewords based on the spoken training utterances, each codeword representing an acoustic feature of the spoken training utterances;
creating the hidden Markov models based on the spoken training utterances, each hidden Markov model being created by steps that include associating a probability value with each codeword, each probability value reflecting a probability that the acoustic utterance represented by the hidden Markov model will be spoken using the acoustic feature represented by the associated codeword; and
compressing the probability values for each codeword separately from the probability values for other codewords, such that a selected probability value associated with a selected codeword can be decompressed without decompressing the probability values associated with a different, non-selected codeword.
View Dependent Claims (18, 19, 20, 21, 22, 23)
24. A data compression method for a computerized speech recognizer having a plurality of utterance models, each utterance model representing a speech utterance, each utterance model being associated with a probability value for each of a plurality of acoustic features, each probability value reflecting a probability that the speech utterance represented by the associated utterance model will be spoken using the associated acoustic feature, each acoustic feature being represented by a codeword, the method comprising:
retrieving the probability values for all the utterance models for a selected codeword;
encoding the retrieved probability values independently from the probability values for codewords other than the selected codeword, such that the encoded probability values for the selected codeword can be decoded without decoding the probability values associated with a different, non-selected codeword;
storing the encoded probability values; and
recognizing an input speech utterance using the stored encoded probability values.
View Dependent Claims (25)
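The per-codeword independent encoding of claim 24 can be sketched with run-length encoding. The patent does not mandate run-length coding; it is used here as one plausible scheme that benefits from the redundancy bucket quantization introduces, and the function names are this sketch's own.

```python
def rle_encode(row):
    """Run-length encode one codeword's row of bucket indices.

    Each row is encoded on its own, so any single row can later be
    decoded without touching the rows for other codewords.
    """
    encoded, i = [], 0
    while i < len(row):
        j = i
        while j < len(row) and row[j] == row[i]:
            j += 1
        encoded.append((row[i], j - i))  # (value, run length)
        i = j
    return encoded

def rle_decode(encoded):
    """Invert rle_encode, restoring the original row of indices."""
    out = []
    for value, run in encoded:
        out.extend([value] * run)
    return out
```

After bucket quantization, many models share the same bucket index for a given codeword, so runs are common and each row compresses well while remaining independently decodable.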
26. A speech recognition method for recognizing an input speech utterance spoken by a user, the method employing a computerized speech recognizer having a plurality of utterance models representing speech utterances, the computerized speech recognizer including a compressed output probability table having, for each utterance model, a probability value for each codeword in a set of codewords representing acoustic features, each probability value reflecting a probability that the speech utterance represented by the utterance model for the probability value will be spoken using the acoustic feature represented by the codeword for the probability value, at least some of the probability values for each codeword being compressed, the method comprising:
inputting a speech signal representing the input speech utterance, the speech signal including a sequence of input frames, each input frame representing a sound of the input speech utterance;
for each input frame:
determining which of the codewords most closely matches the sound represented by the input frame;
decompressing only those probability values for the codeword that is determined to most closely match the sound represented by the input frame without decompressing the probability values for the codewords that do not most closely match the sound represented by the input frame; and
computing a recognition score for each of the utterance models using the decompressed probability values, the recognition score for each of the utterance models being updated with each input frame; and
determining which utterance model results in a highest recognition score based on the computing step;
the speech utterance represented by the utterance model with the highest recognition score being recognized as the input speech utterance.
Specification