Speech recognition training for small hardware devices

US 6,463,413 B1
Filed: 04/20/1999
Issued: 10/08/2002
Est. Priority Date: 04/20/1999
Status: Expired due to Term

Abstract

A distributed speech processing system for constructing speech recognition reference models that are to be used by a speech recognizer in a small hardware device, such as a personal digital assistant or cellular telephone. The speech processing system includes a speech recognizer residing on a first computing device and a speech model server residing on a second computing device. The speech recognizer receives speech training data and processes it into an intermediate representation of the speech training data. The intermediate representation is then communicated to the speech model server. The speech model server generates a speech reference model by using the intermediate representation of the speech training data and then communicates the speech reference model back to the first computing device for storage in a lexicon associated with the speech recognizer.

32 Claims

1. A speech processing system for constructing speech recognition reference models, comprising:
- a speech recognizer residing on a first computing device;
  
  said speech recognizer receiving speech training data and processing the speech training data into an intermediate representation of the speech training data, said speech recognizer further being operative to communicate the intermediate representation to a second computing device;
  
  a speech model server residing on said second computing device, said second computing device being interconnected via a network to said first computing device;
  
  said speech model server receiving the intermediate representation of the speech training data and generating a speech reference model using the intermediate representation, said speech model server further being operative to communicate the speech reference model to said first computing device; and
  
  a lexicon coupled to said speech recognizer for storing the speech reference model on said first computing device.
2. The speech processing system of claim 1 wherein said speech recognizer receives alphanumeric text that serves as the speech training data and said intermediate representation of the speech training data being a sequence of symbols from said alphanumeric text.
3. The speech processing system of claim 1 wherein said speech recognizer captures audio data that serves as the speech training data and digitizes the audio data into said intermediate representation of the speech training data.
4. The speech processing system of claim 1 wherein said speech recognizer captures audio data that serves as the speech training data and converts the audio data into a vector of parameters that serves as said intermediate representation of the speech data, where the parameters are indicative of the short term speech spectral shape of said audio data.
5. The speech processing system of claim 4 wherein said vector of parameters is further defined as either pulse code modulation (PCM), μ-law encoded PCM, filter bank energies, line spectral frequencies, or cepstral coefficients.
6. The speech processing system of claim 1 wherein said speech model server further comprises a speech model database for storing speaker-independent speech reference models, said speech model server being operative to retrieve a speech reference model from said speech model database that corresponds to the intermediate representation of said speech training data received from said speech recognizer.
7. The speech processing system of claim 1 wherein said speech model server further comprises:
8. The speech processing system of claim 4 wherein said speech model server further comprises:
- a Hidden Markov Model (HMM) database for storing phone model speech data corresponding to a plurality of phonemes; and
  
  a model trainer coupled to said HMM database for decoding the vector of parameters into a phonetic transcription of the audio data, whereby said phonetic transcription serves as said speech reference model.
9. The speech processing system of claim 1 wherein said speech recognizer captures at least two training repetitions of audio data that serves as the speech training data and converts the audio data into a sequence of vectors of parameters that serves as said intermediate representation of the speech training data, where each vector corresponds to a training repetition and the parameters are indicative of the short term speech spectral shape of said audio data.
10. The speech processing system of claim 9 wherein said speech model server being operative to determine a reference vector from the sequence of vectors, align each vector in the sequence of vectors to the reference vector, determine a mean and a variance of each parameter in the reference vector computed over the values in the aligned vectors, thereby constructing said speech reference model from the sequence of vectors.
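
Claims 9 and 10 describe building a reference model from several training repetitions by aligning each one to a reference and taking a per-parameter mean and variance. The claims do not name an alignment algorithm, so the following is only a minimal sketch under stated assumptions: each repetition is treated as a sequence of parameter frames, dynamic time warping (DTW) stands in for the alignment step, and the median-length repetition is picked as the reference. The names `dtw_path` and `build_reference_model` are hypothetical.

```python
from statistics import fmean, pvariance


def dtw_path(ref, rep):
    """Return DTW alignment pairs (i, j) between two frame sequences.

    DTW is an assumption here; the claim only says "align".
    """
    n, m = len(ref), len(rep)
    inf = float("inf")

    def dist(a, b):
        # Squared Euclidean distance between two parameter frames.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist(ref[i - 1], rep[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )

    # Backtrack from the end of both sequences to recover the path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    return path[::-1]


def build_reference_model(repetitions):
    """Per-frame mean and variance over all repetitions aligned to a reference."""
    # Assumption: use the median-length repetition as the reference.
    ref = sorted(repetitions, key=len)[len(repetitions) // 2]
    buckets = [[] for _ in ref]
    for rep in repetitions:
        for i, j in dtw_path(ref, rep):
            buckets[i].append(rep[j])

    model = []
    for frames in buckets:
        columns = list(zip(*frames))  # one column per parameter
        model.append(
            {
                "mean": [fmean(c) for c in columns],
                "var": [pvariance(c) for c in columns],
            }
        )
    return model
```

For two one-parameter repetitions `[[0.0], [2.0]]` and `[[2.0], [4.0]]`, the sketch yields means of 1.0 and 3.0 with a variance of 1.0 for each frame. A production trainer would likely use a different reference-selection rule and distance measure.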

11. A distributed speech processing system for supporting applications that reside on a personal digital assistant (PDA) device, comprising:
- an input means for capturing speech training data at the PDA;
  
  a speech recognizer coupled to said input means and receptive of speech training data from said input means;
  
  said speech recognizer being operative to process the speech training data into an intermediate representation of the speech training data and communicate the intermediate representation to a second computing device;
  
  a speech model server residing on said second computing device, said second computing device being interconnected via a network to the PDA;
  
  said speech model server receiving the intermediate representation of the speech training data and generating a speech reference model using the intermediate representation, said speech model server further being operative to communicate the speech reference model to said first computing device; and
  
  a lexicon coupled to said speech recognizer for storing the speech reference model on the PDA.
12. The distributed speech processing system of claim 11 wherein said input means is further defined as:
13. The distributed speech processing system of claim 12 wherein said speech recognizer segments the alphanumeric data into a sequence of symbols which serves as the intermediate representation of the speech training data.
14. The distributed speech processing system of claim 11 wherein said speech model server further comprises a speech model database for storing speaker-independent speech reference models, said speech model server being operative to retrieve a speech reference model from said speech model database that corresponds to the intermediate representation of said speech training data received from said speech recognizer.
15. The distributed speech processing system of claim 11 wherein said speech model server further comprises:
- a phoneticizer receptive of the intermediate representation for producing a plurality of phonetic transcriptions; and
  
  a model trainer coupled to said phoneticizer for building said speech reference model based on said plurality of phonetic transcriptions.
16. The distributed speech processing system of claim 11 wherein said input means is further defined as a microphone for capturing audio data that serves as speech training data.
17. The distributed speech processing system of claim 16 wherein said speech recognizer converts the audio data into a digital input signal and translates the digital input signal into a vector of parameters which serves as the intermediate representation of the speech training data, said parameters being indicative of the short term speech spectral shape of said audio data.
18. The distributed speech processing system of claim 17 wherein said vector of parameters is further defined as either pulse code modulation (PCM), μ-law encoded PCM, filter bank energies, line spectral frequencies, or cepstral coefficients.
19. The distributed speech processing system of claim 11 wherein said speech model server further comprises:
- a Hidden Markov Model (HMM) database for storing phone model speech data corresponding to a plurality of phonemes; and
  
  a model trainer coupled to said HMM database for decoding said vector of parameters into a phonetic transcription of the audio data, whereby said phonetic transcription serves as said speech reference model.
20. The speech processing system of claim 11 wherein said speech recognizer captures at least two training repetitions of audio data that serves as the speech training data and converts the audio data into a sequence of vectors of parameters that serves as said intermediate representation of the speech training data, where each vector corresponds to a training repetition and the parameters are indicative of the short term speech spectral shape of said audio data.
21. The speech processing system of claim 20 wherein said speech model server being operative to determine a reference vector from the sequence of vectors, align each vector in the sequence of vectors to the reference vector, determine a mean and a variance of each parameter in the reference vector computed over the values in the aligned vectors, thereby constructing said speech reference model from the sequence of vectors.
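
Claim 18 lists μ-law encoded PCM as one possible form of the intermediate representation. As an illustration only, the sketch below applies the standard continuous μ-law companding curve with μ = 255 and quantizes the result to 8 bits. It is not the bit-exact segmented encoder of ITU-T G.711, and the function names are hypothetical.

```python
import math

MU = 255  # conventional value for 8-bit mu-law companding


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto the mu-law curve, also in [-1, 1]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_samples(samples):
    """Quantize companded samples to one byte each (codes 0..255)."""
    return bytes(round((mu_law_compress(s) + 1.0) / 2.0 * 255) for s in samples)


def decode_samples(codes):
    """Recover approximate sample values from 8-bit codes."""
    return [mu_law_expand(c / 255 * 2.0 - 1.0) for c in codes]
```

Because the curve is logarithmic, quiet samples keep finer resolution than loud ones after 8-bit quantization, which is why this encoding is common for speech.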

22. A distributed speech processing system for supporting applications that reside on a cellular telephone handset device, comprising:
- an input means for capturing speech training data at the handset device;
  
  a speech recognizer coupled to said input means and receptive of speech training data from said input means;
  
  said speech recognizer being operative to process the speech training data into an intermediate representation of the speech training data and communicate the intermediate representation to a second computing device;
  
  a speech model server residing on said second computing device, said second computing device being interconnected via a network to the handset device;
  
  said speech model server receiving the intermediate representation of the speech training data and generating a speech reference model using the intermediate representation, said speech model server further being operative to communicate the speech reference model to said first computing device; and
  
  a lexicon coupled to said speech recognizer for storing the speech reference model on the handset device.
23. The distributed speech processing system of claim 22 wherein said input means is further defined as a keypad for capturing alphanumeric data that serves as speech training data, such that said speech recognizer segments the alphanumeric data into a sequence of symbols which serves as the intermediate representation of the speech training data.
24. The distributed speech processing system of claim 22 wherein said reference model server further comprises a speech model database for storing speaker-independent speech reference models, said reference model server being operative to retrieve a speech reference model from said speech model database that corresponds to the intermediate representation of said speech training data received from said speech recognizer.
25. The distributed speech processing system of claim 22 wherein said speech model server further comprises:
26. The distributed speech processing system of claim 22 wherein said input means is further defined as a microphone for capturing audio data that serves as speech training data.
27. The distributed speech processing system of claim 26 wherein said speech recognizer converts the audio data into a digital input signal and translates the digital input signal into a vector of parameters which serves as the intermediate representation of the speech training data, said parameters being indicative of the short term speech spectral shape of said audio data.
28. The distributed speech processing system of claim 27 wherein said vector of parameters is further defined as either pulse code modulation (PCM), μ-law encoded PCM, filter bank energies, line spectral frequencies, or cepstral coefficients.
29. The distributed speech processing system of claim 22 wherein said speech model server further comprises:
- a Hidden Markov Model (HMM) database for storing phone model speech data corresponding to a plurality of phonemes; and
  
  a model trainer coupled to said HMM database for decoding said vector of parameters into a phonetic transcription of the audio data, whereby said phonetic transcription serves as said speech reference model.
30. The distributed speech processing system of claim 22 wherein said speech recognizer captures at least two training repetitions of audio data that serves as the speech training data and converts the audio data into a sequence of vectors of parameters that serves as said intermediate representation of the speech training data, where each vector corresponds to a training repetition and the parameters are indicative of the short term speech spectral shape of said audio data.
31. The distributed speech processing system of claim 30 wherein said speech model server being operative to determine a reference vector from the sequence of vectors, align each vector in the sequence of vectors to the reference vector, determine a mean and a variance of each parameter in the reference vector computed over the values in the aligned vectors, thereby constructing said speech reference model from the sequence of vectors.
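
Claims 27 and 28 refer to a vector of parameters indicative of the short term speech spectral shape, with cepstral coefficients as one option. The claims do not say how such coefficients are computed, so the following is a hedged sketch of one textbook approach: the real cepstrum of a single Hamming-windowed frame, obtained as the inverse DFT of the log magnitude spectrum. The frame length, the number of coefficients, and the helper names are all assumptions, and the DFT is written out in plain Python for self-containment rather than speed.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft(x):
    """Direct O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, num_coeffs=12):
    """Return the first num_coeffs real-cepstrum coefficients of one frame."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    # Small floor avoids log(0) for empty spectral bins.
    log_mag = [math.log(abs(c) + 1e-10) for c in dft(windowed)]
    # The log magnitude spectrum of a real signal is symmetric, so its
    # inverse DFT reduces to a cosine sum with a real result.
    return [
        sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
        for q in range(num_coeffs)
    ]
```

One property that makes this representation convenient: scaling the input amplitude shifts only coefficient 0, so the remaining coefficients describe spectral shape independently of loudness.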

32. A method of building speech reference models for use in a speech recognition system, comprising the steps of:
- collecting speech training data at a first computing device;
  
  processing the speech training data into an intermediate representation of the speech training data on said first computing device;
  
  communicating said intermediate representation of the speech training data to a second computing device, said second computing device interconnected via a network to said first computing device;
  
  creating a speech reference model from said intermediate representation at said second computing device; and
  
  communicating said speech reference model to the first computing device for use in the speech recognition system.
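
The five steps of claim 32 can be sketched as a round trip between two objects. This is an illustrative sketch only: the network hop is simulated by a direct method call, the intermediate representation is the text-to-symbols case from the dependent claims, and the class names, the dictionary-based model format, and the spelled-out fallback for unknown words are assumptions rather than anything the patent specifies.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechModelServer:
    """Stands in for the second computing device."""

    # Hypothetical speaker-independent models, keyed by word.
    database: dict = field(default_factory=dict)

    def create_reference_model(self, symbols):
        """Step 4: create a reference model from the intermediate representation."""
        word = "".join(symbols)
        # Assumption: unknown words fall back to a spelled-out placeholder model.
        return self.database.get(word, {"word": word, "units": list(symbols)})


@dataclass
class Device:
    """Stands in for the first computing device (for example a PDA or handset)."""

    server: SpeechModelServer
    lexicon: dict = field(default_factory=dict)

    def train(self, text):
        # Steps 1-2: collect training data and reduce it to a symbol sequence.
        symbols = [ch for ch in text.upper() if ch.isalnum()]
        # Steps 3-5: send to the server and receive the model (simulated call).
        model = self.server.create_reference_model(symbols)
        # Store the returned model locally for later recognition.
        self.lexicon["".join(symbols)] = model
        return model
```

A usage example: after `device = Device(SpeechModelServer({"HOME": {"word": "HOME", "units": ["hh", "ow", "m"]}}))`, calling `device.train("home")` stores the server's model under `device.lexicon["HOME"]`. The design point the claim emphasizes is that only the compact intermediate representation and the finished model cross the network, while model construction stays on the server.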

Current Assignee: Intertrust Technologies Corporation (Fidelio Acquisition Co LLC)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Applebaum, Ted H.; Junqua, Jean-Claude
Primary Examiner(s): Chawan, Vijay B
Application Number: US09/295,276
Time in Patent Office: 1,267 Days
Field of Search: 704/256, 704/245, 704/254, 704/243, 704/244, 704/255, 704/252, 704/253, 704/257, 704/270, 704/275
US Class Current: 704/256.2
CPC Class Codes:
G10L 15/06   Creation of reference templ...
G10L 15/187   Phonemic context, e.g. pron...
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 2015/0638   Interactive procedures
