Speech recognition system and method employing data compression
First Claim
1. A data compression method for a computerized speech recognizer having a plurality of hidden Markov models representing acoustic utterances, comprising:
receiving a plurality of spoken training utterances; and
converting the spoken training utterances into a stored output probability table, the converting step including:
creating a predetermined number of codewords based on the spoken training utterances, each codeword representing an acoustic feature of the spoken training utterances;
creating the hidden Markov models based on the spoken training utterances, each hidden Markov model being created by steps that include associating a probability value with each codeword, each probability value reflecting a probability that the acoustic utterance represented by the hidden Markov model will be spoken using the acoustic feature represented by the associated codeword; and
compressing at least some of the probability values associated with a selected one of the codewords based on others of the probability values associated with the selected codeword, thereby compressing the stored output probability table.
Abstract
A data compression system greatly compresses the stored data used by a speech recognition system employing hidden Markov models (HMM). The speech recognition system vector quantizes the acoustic space spoken by humans by dividing it into a predetermined number of acoustic features that are stored as codewords in a vector quantization (output probability) table or codebook. For each spoken word, the speech recognition system calculates an output probability value for each codeword, the output probability value representing an estimated probability that the word will be spoken using the acoustic feature associated with the codeword. The probability values are stored in an output probability table indexed by each codeword and by each word in a vocabulary. The output probability table is arranged to allow compression of the probability values associated with each codeword based on other probability values associated with the same codeword, thereby compressing the stored output probability table. By compressing the probability values associated with each codeword separately from the probability values associated with other codewords, the speech recognition system can recognize spoken words without having to decompress the entire output probability table. In a preferred embodiment, additional compression is achieved by quantizing the probability values into 16 buckets with an equal number of probability values in each bucket. By quantizing the probability values into buckets, additional redundancy is added to the output probability table, which allows the output probability table to be additionally compressed.
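The equal-count bucketing described above can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation: the function name `bucket_quantize`, the use of NumPy quantiles to find equal-count boundaries, and the choice of each bucket's mean as its representative value are all assumptions.

```python
import numpy as np

def bucket_quantize(probs, n_buckets=16):
    """Quantize probability values into equal-count buckets.

    Each value is replaced by a small bucket index; a lookup table of
    representative values maps indices back to approximate probabilities.
    Repeated indices add redundancy that a later compression pass can exploit.
    """
    probs = np.asarray(probs, dtype=float)
    # Boundaries at quantiles, so each bucket holds roughly the same count.
    edges = np.quantile(probs, np.linspace(0.0, 1.0, n_buckets + 1))
    # Assign each probability a bucket index in [0, n_buckets - 1].
    indices = np.clip(np.searchsorted(edges, probs, side="right") - 1,
                      0, n_buckets - 1).astype(np.uint8)
    # Representative value per bucket: the mean of its members.
    reps = np.array([probs[indices == b].mean() if np.any(indices == b)
                     else 0.0 for b in range(n_buckets)])
    return indices, reps
```

Storing a one-byte (or four-bit) index per entry plus a 16-entry lookup table already shrinks the table substantially compared with full-precision probabilities, before any entropy coding is applied.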
26 Claims
1. A data compression method for a computerized speech recognizer having a plurality of hidden Markov models representing acoustic utterances, comprising:
receiving a plurality of spoken training utterances; and
converting the spoken training utterances into a stored output probability table, the converting step including:
creating a predetermined number of codewords based on the spoken training utterances, each codeword representing an acoustic feature of the spoken training utterances;
creating the hidden Markov models based on the spoken training utterances, each hidden Markov model being created by steps that include associating a probability value with each codeword, each probability value reflecting a probability that the acoustic utterance represented by the hidden Markov model will be spoken using the acoustic feature represented by the associated codeword; and
compressing at least some of the probability values associated with a selected one of the codewords based on others of the probability values associated with the selected codeword, thereby compressing the stored output probability table.
View Dependent Claims (2, 3, 4, 5, 6, 7)
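The converting step of claim 1 can be sketched in miniature. This is a simplified assumption-laden illustration: it treats each word's output probabilities as relative frequencies of the codewords observed in its training utterances (real HMM training uses Baum-Welch re-estimation per model state), and the input is assumed to be already vector-quantized into codeword indices.

```python
from collections import Counter

def build_output_probability_table(training, n_codewords):
    """Estimate output probabilities P(codeword | word) from labeled frames.

    `training` maps each word to the list of codeword indices observed in
    its spoken training utterances (vector quantization assumed done).
    Returns a table indexed by word, then by codeword, as in the abstract.
    """
    table = {}
    for word, codewords in training.items():
        counts = Counter(codewords)
        total = len(codewords)
        # One probability per codeword, in codeword-index order.
        table[word] = [counts.get(c, 0) / total for c in range(n_codewords)]
    return table
```

Transposing this table (indexing by codeword first) groups together exactly the values that claim 1 compresses relative to one another: all models' probabilities for a single codeword.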
8. A data compression system for a computerized speech recognizer having a plurality of hidden Markov models representing acoustic utterances, comprising:
a receiver that receives a plurality of spoken training utterances;
means for converting the spoken training utterances into an output probability table having a predetermined number of codewords, each codeword representing an acoustic feature of the spoken training utterances and each codeword being associated with a probability value for each hidden Markov model, each probability value reflecting a probability that the acoustic utterance represented by the hidden Markov model will be spoken using the acoustic feature represented by the associated codeword;
a memory that stores the output probability table; and
a data processor that compresses at least some of the probability values associated with a selected one of the codewords based on others of the probability values associated with the selected codeword, thereby compressing the stored output probability table.
View Dependent Claims (9, 10, 11, 12, 13, 14, 15)
16. A computer storage medium encoded with a data structure for use by a computerized speech recognizer having a plurality of hidden Markov models representing acoustic utterances, the data structure comprising:
a predetermined number of codewords, each codeword representing an acoustic feature; and
a probability value associated with each codeword for each hidden Markov model, each probability value reflecting a probability that the acoustic utterance represented by the associated hidden Markov model will be spoken using the acoustic feature represented by the associated codeword, at least some of the probability values associated with a selected codeword being in a compressed form that is based on others of the probability values associated with the selected codeword, the compressed form being such that the probability values associated with the selected codeword can be decompressed by the computerized speech recognizer without decompressing the probability values associated with codewords other than the selected codeword;
computer instructions stored on the computer storage medium, the computer instructions recognizing a sound spoken by a user, the sound spoken being represented by an input speech signal that includes a plurality of input frames, each input frame being associated with one of the codewords, the instructions including:
instructions for decompressing only those probability values associated with the codewords that are identical to the codewords of the input frames without decompressing probability values associated with codewords not identical to the codewords of the input frames;
instructions for computing a recognition score for each of the hidden Markov models using the decompressed probability values; and
instructions for determining which hidden Markov model results in a highest recognition score and thereby recognizing the acoustic utterance represented by the hidden Markov model with the highest recognition score as being the spoken sound represented by the input speech signal.
17. A data compression method for a computerized speech recognizer having a plurality of hidden Markov models representing acoustic utterances, comprising:
receiving a plurality of spoken training utterances; and
converting the spoken training utterances into an output probability table, the converting step including:
creating a predetermined number of codewords based on the spoken training utterances, each codeword representing an acoustic feature of the spoken training utterances;
creating the hidden Markov models based on the spoken training utterances, each hidden Markov model being created by steps that include associating a probability value with each codeword, each probability value reflecting a probability that the acoustic utterance represented by the hidden Markov model will be spoken using the acoustic feature represented by the associated codeword; and
compressing the probability values for each codeword separately from the probability values for other codewords, such that a selected probability value associated with a selected codeword can be decompressed without decompressing the probability values associated with a different, non-selected codeword.
View Dependent Claims (18, 19, 20, 21, 22, 23)
24. A data compression method for a computerized speech recognizer having a plurality of utterance models, each utterance model representing a speech utterance, each utterance model being associated with a probability value for each of a plurality of acoustic features, each probability value reflecting a probability that the speech utterance represented by the associated utterance model will be spoken using the associated acoustic feature, each acoustic feature being represented by a codeword, the method comprising:
retrieving the probability values for all the utterance models for a selected codeword;
encoding the retrieved probability values independently from the probability values for codewords other than the selected codeword, such that the encoded probability values for the selected codeword can be decoded without decoding the probability values associated with a different, non-selected codeword;
storing the encoded probability values; and
recognizing an input speech utterance using the stored encoded probability values.
View Dependent Claims (25)
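The per-codeword independent encoding of claim 24 can be sketched with run-length encoding. The patent does not mandate run-length coding; it is used here as one plausible scheme that benefits from the redundancy bucket quantization introduces, and the function names are this sketch's own.

```python
def rle_encode(row):
    """Run-length encode one codeword's row of bucket indices.

    Each row is encoded on its own, so any single row can later be
    decoded without touching the rows for other codewords.
    """
    encoded, i = [], 0
    while i < len(row):
        j = i
        while j < len(row) and row[j] == row[i]:
            j += 1
        encoded.append((row[i], j - i))  # (value, run length)
        i = j
    return encoded

def rle_decode(encoded):
    """Invert rle_encode, restoring the original row of indices."""
    out = []
    for value, run in encoded:
        out.extend([value] * run)
    return out
```

After bucket quantization, many models share the same bucket index for a given codeword, so runs are common and each row compresses well while remaining independently decodable.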
26. A speech recognition method for recognizing an input speech utterance spoken by a user, the method employing a computerized speech recognizer having a plurality of utterance models representing speech utterances, the computerized speech recognizer including a compressed output probability table having, for each utterance model, a probability value for each codeword in a set of codewords representing acoustic features, each probability value reflecting a probability that the speech utterance represented by the utterance model for the probability value will be spoken using the acoustic feature represented by the codeword for the probability value, at least some of the probability values for each codeword being compressed, the method comprising:
inputting a speech signal representing the input speech utterance, the speech signal including a sequence of input frames, each input frame representing a sound of the input speech utterance;
for each input frame:
determining which of the codewords most closely matches the sound represented by the input frame;
decompressing only those probability values for the codeword that is determined to most closely match the sound represented by the input frame without decompressing the probability values for the codewords that do not most closely match the sound represented by the input frame; and
computing a recognition score for each of the utterance models using the decompressed probability values, the recognition score for each of the utterance models being updated with each input frame; and
determining which utterance model results in a highest recognition score based on the computing step;
the speech utterance represented by the utterance model with the highest recognition score being recognized as the input speech utterance.
Specification