Spelling speech recognition apparatus and method for communications

US 6,304,844 B1
Filed: 03/30/2000
Issued: 10/16/2001
Est. Priority Date: 03/30/2000
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

The invention provides an accurate speech recognition system that can rapidly process a greater variety of words and operate in many different devices without the computational power and memory requirements, high power consumption, complex operating system, high cost, and weight of traditional systems. Using utterances of individual letters to transmit words allows voice information transfer for both person-to-person and person-to-machine communication on mobile phones, PDAs, and other communication devices. The apparatus and method comprise a microphone; a front-end signal processor for generating parametric representations of speech input signals; a pronunciation database; a letter similarity comparator for comparing the parametric representation of the input signals with the parametric representations of letter pronunciations and generating a sequence of associations between the input speech and the letters in the pronunciation database; a vocabulary database; a word similarity comparator for comparing an aggregated plurality of the letters with the words in the vocabulary database and generating a sequence of associations between them; and a display for displaying the selected letters and words for confirmation.

284 Citations

22 Claims

1. A speech recognition system comprising:
- microphone means for receiving acoustic waves and converting the acoustic waves into electronic signals;
  
  front-end signal processing means, coupled to said microphone means, for processing the electronic signals to generate parametric representations of the electronic signals, including preemphasizer means for spectrally flattening the electronic signals generated by said microphone means;
  
  frame-blocking means, coupled to said preemphasizer means, for blocking the electronic signals into frames of N samples with adjacent frames separated by M samples;
  
  windowing means, coupled to said frame-blocking means, for windowing each frame;
  
  autocorrelation means, coupled to said windowing means, for autocorrelating the frames;
  
  cepstral coefficient generating means, coupled to said autocorrelation means, for converting each frame into cepstral coefficients; and
  
  tapered windowing means, coupled to said cepstral coefficient generating means, for weighting the cepstral coefficients, thereby generating parametric representations of the sound waves;
  
  pronunciation database storage means for storing a plurality of parametric representations of letter pronunciations;
  
  letter similarity comparator means, coupled to said front-end signal processing means and to said pronunciation database storage means, for comparing the parametric representation of the electronic signals with said plurality of parametric representations of letter pronunciations, and generating a first sequence of associations between the parametric representation of the electronic signals and said plurality of parametric representations of letter pronunciations responsive to predetermined criteria;
  
  vocabulary database storage means for storing a plurality of parametric representations of word pronunciations;
  
  word similarity comparator means, coupled to said letter similarity comparator and to said vocabulary database storage means, for comparing an aggregated plurality of parametric representations of letter pronunciations with said plurality of parametric representations of word pronunciations, and generating a second sequence of associations between at least one of said aggregated plurality of parametric representations of the letter pronunciations with at least one of said plurality of parametric representations of word pronunciations responsive to predetermined criteria; and
  
  display means, coupled to said word similarity comparator means, for displaying said first and second sequences of associations.

2. The speech recognition system of claim 1 wherein said front-end signal processing means further comprises temporal differentiating means, coupled to said tapered windowing means, for generating a first time derivative of the cepstral coefficients.

3. The speech recognition system of claim 1 wherein said front-end signal processing means further comprises temporal differentiating means, coupled to said tapered windowing means, for generating a second time derivative of the cepstral coefficients.

4. The speech recognition system of claim 1 wherein said letter similarity comparator means comprises:

5. The speech recognition system of claim 4 wherein said dynamic time warper means comprises minimization means for determining the minimum cepstral distances between the parametric representation of the electronic signals and said plurality of parametric representations of the letter pronunciations stored in said pronunciation database storage means.

6. The speech recognition system of claim 1 wherein said plurality of parametric representations of letter pronunciations stored in said pronunciation database storage means include the pronunciation of individual characters of the Chinese language and said plurality of parametric representations of word pronunciations stored in said vocabulary database storage means include the pronunciation of aggregated word strings of the Chinese language.

7. The speech recognition system of claim 1 wherein said plurality of parametric representations of letter pronunciations stored in said pronunciation database storage means include the pronunciation of individual characters of the Korean language and said plurality of parametric representations of word pronunciations stored in said vocabulary database storage means include the pronunciation of aggregated word strings of the Korean language.

8. The speech recognition system of claim 1 wherein said plurality of parametric representations of letter pronunciations stored in said pronunciation database storage means include the pronunciation of individual characters of the Japanese language and said plurality of parametric representations of word pronunciations stored in said vocabulary database storage means include the pronunciation of aggregated word strings of the Japanese language.

9. The speech recognition system of claim 1 wherein said plurality of parametric representations of letter pronunciations stored in said pronunciation database storage means include the pronunciation of individual characters of the French language and said plurality of parametric representations of word pronunciations stored in said vocabulary database storage means include the pronunciation of aggregated word strings of the French language.
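Claim 1, together with claims 2 and 3, describes a classic LPC-cepstrum front end. The plain-Python sketch below shows one way such a chain might look; it is not the patentee's implementation. The Levinson-Durbin LPC step between autocorrelation and cepstral conversion, the Hamming window, the preemphasis factor 0.95, the frame sizes N = 240 and M = 80 (30 ms frames every 10 ms at an assumed 8 kHz sampling rate), the LPC order p = 10, the Q = 12 cepstral coefficients, the raised-sine lifter used as the "tapered window", and the regression-based time derivatives are all assumptions, and every function name is hypothetical.

```python
import math
from typing import List

def preemphasize(x: List[float], a: float = 0.95) -> List[float]:
    """Spectral flattening with a first-order FIR filter: y[n] = x[n] - a*x[n-1]."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]

def frame_block(x: List[float], N: int, M: int) -> List[List[float]]:
    """Frames of N samples whose start points are M samples apart."""
    return [x[s:s + N] for s in range(0, len(x) - N + 1, M)]

def hamming(frame: List[float]) -> List[float]:
    """Apply a Hamming window to one frame."""
    N = len(frame)
    return [v * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)))
            for n, v in enumerate(frame)]

def autocorr(frame: List[float], p: int) -> List[float]:
    """r[k] = sum_n w[n] * w[n + k] for lags k = 0..p."""
    N = len(frame)
    return [sum(frame[n] * frame[n + k] for n in range(N - k)) for k in range(p + 1)]

def levinson_durbin(r: List[float], p: int) -> List[float]:
    """Predictor coefficients a[1..p] (a[0] unused) with s[n] ~ sum_k a[k] s[n-k]."""
    a = [0.0] * (p + 1)
    if r[0] <= 0.0:                      # silent frame: nothing to predict
        return a
    e = r[0]                             # prediction error energy
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / e
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        e *= 1.0 - k * k
        if e <= 1e-12 * r[0]:            # numerically singular: stop early
            break
    return a

def lpc_to_cepstrum(a: List[float], p: int, Q: int) -> List[float]:
    """Cepstral coefficients c[1..Q] via the standard LPC-to-cepstrum recursion."""
    c = [0.0] * (Q + 1)
    for m in range(1, Q + 1):
        acc = a[m] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k]
        c[m] = acc
    return c[1:]

def lifter(c: List[float]) -> List[float]:
    """Tapered (raised-sine) cepstral window w[m] = 1 + (Q/2) sin(pi m / Q)."""
    Q = len(c)
    return [c[m - 1] * (1 + (Q / 2) * math.sin(math.pi * m / Q)) for m in range(1, Q + 1)]

def delta(feats: List[List[float]], K: int = 2) -> List[List[float]]:
    """Regression estimate of the time derivative over +/-K frames (edges clamped)."""
    T = len(feats)
    denom = 2 * sum(k * k for k in range(1, K + 1))
    return [[sum(k * (feats[min(T - 1, t + k)][d] - feats[max(0, t - k)][d])
                 for k in range(1, K + 1)) / denom
             for d in range(len(feats[t]))]
            for t in range(T)]

def front_end(x: List[float], N: int = 240, M: int = 80, p: int = 10, Q: int = 12):
    """Returns (cepstra, first derivatives, second derivatives), one row per frame."""
    ceps = []
    for frame in frame_block(preemphasize(x), N, M):
        a = levinson_durbin(autocorr(hamming(frame), p), p)
        ceps.append(lifter(lpc_to_cepstrum(a, p, Q)))
    d1 = delta(ceps)
    return ceps, d1, delta(d1)
```

For an 8,000-sample input, the assumed N = 240 and M = 80 give (8000 - 240) / 80 + 1 = 98 frames, each carrying 12 liftered cepstral coefficients plus their first and second time derivatives.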

10. A letter similarity comparator comprising:
- means for receiving electronic signals parametric representations;
  
  pronunciation database storage means for storing a plurality of letter pronunciation parametric representations;
  
  letter calibration means, coupled to said receiving means and to said pronunciation database storage means, for calibrating the electronic signals parametric representations with said plurality of letter pronunciation parametric representations stored in said pronunciation database storage means;
  
  dynamic time warper means for performing dynamic time warping on the electronic signals parametric representations and said plurality of letter pronunciation parametric representations stored in said pronunciation database storage means;
  
  distortion calculation means, coupled to said letter calibration means and to said dynamic time warper means, for calculating a distortion between the electronic signals parametric representations and said plurality of letter pronunciation parametric representations stored in said pronunciation database storage means;
  
  scoring means, coupled to said distortion calculation means, for assigning a score to said distortion responsive to predetermined criteria; and
  
  selection means, coupled to said scoring means, for selecting at least one of said plurality of letter pronunciation parametric representations having the lowest distortion.
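Claim 10 (like claims 4, 5, 15 and 16) compares the input against stored letter templates with dynamic time warping and keeps the candidates with the lowest distortion. A minimal sketch follows. It assumes Euclidean cepstral distance, a symmetric DTW step pattern without slope constraints, normalization by the sum of the two sequence lengths, and per-utterance cepstral mean subtraction as a stand-in for the "letter calibration means", whose actual method the claim does not specify. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frames = List[Sequence[float]]

def cepstral_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two cepstral vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_normalize(frames: Frames) -> List[List[float]]:
    """Hypothetical calibration: subtract the utterance's mean cepstral vector."""
    dims = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]
    return [[f[d] - mean[d] for d in range(dims)] for f in frames]

def dtw_distortion(test: Frames, ref: Frames) -> float:
    """Minimum accumulated cepstral distance over all monotonic warping paths."""
    n, m = len(test), len(ref)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cepstral_distance(test[i - 1], ref[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Divide by n + m so templates of different lengths are comparable.
    return D[n][m] / (n + m)

def rank_letters(test: Frames, templates: Dict[str, Frames],
                 top_n: int = 3, calibrate: bool = True) -> List[Tuple[str, float]]:
    """Score every letter template; return the top_n letters with lowest distortion."""
    if calibrate:
        test = mean_normalize(test)
    scored = []
    for letter, ref in templates.items():
        ref_frames = mean_normalize(ref) if calibrate else ref
        scored.append((letter, dtw_distortion(test, ref_frames)))
    scored.sort(key=lambda item: item[1])
    return scored[:top_n]
```

Here the "score" is simply the normalized distortion (lower is better); a real system might map it to a confidence value before display.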

11. An electronic communication device comprising:
- a microphone for receiving sound signals and generating electronic signals therefrom;
  
  a coder-decoder, coupled to said microphone, for coding and decoding the electronic signals;
  
  a signal processor, coupled to said coder-decoder, for processing the electronic signals thereby generating parametric representations of the electronic signals;
  
  a database storage unit, coupled to said signal processor, for storing data and having a first sector therein for storing a plurality of letter pronunciation parametric representations and a second sector therein for storing a plurality of word pronunciation parametric representations;
  
  a first comparator, coupled to said signal processor and to said database storage unit, for comparing parametric representations of the electronic signals with said plurality of letter pronunciation parametric representations in said first sector of said database storage unit;
  
  a first selector, coupled to said first comparator, for selecting at least one of said plurality of letter pronunciation parametric representations responsive to predetermined criteria;
  
  a second comparator, coupled to said signal processor and to said database storage unit, for comparing aggregated parametric representations of letter pronunciations with said plurality of word pronunciation parametric representations in said second sector of said database storage unit;
  
  a second selector, coupled to said second comparator, for selecting at least one of said plurality of word pronunciation parametric representations responsive to predetermined criteria; and
  
  a display, coupled to said first and second selectors, for displaying said at least one of said plurality of selected letter pronunciation parametric representations and for displaying said at least one of said plurality of word pronunciation parametric representations.

12. The electronic communication device of claim 11 wherein said plurality of letter pronunciation parametric representations stored in said first sector of said database storage unit are grouped responsive to similarity of parametric representation.

13. The electronic communication device of claim 11 wherein said first comparator calibrates the parametric representations of the electronic signals responsive to said plurality of letter pronunciation parametric representations in said first sector of said database storage unit.

14. The electronic communication device of claim 11 wherein said digital signal processor calculates cepstral coefficients to generate the parametric representations of the electronic signals, the plurality of letter pronunciation parametric representations, and said plurality of word pronunciation parametric representations.

15. The electronic communication device of claim 11 wherein said first comparator utilizes dynamic time warping to generate comparisons of the parametric representations of the electronic signals with said plurality of letter pronunciation parametric representations.

16. The electronic communication device of claim 15 wherein said first comparator utilizes cepstral distances to compare the parametric representations of the electronic signals with said plurality of letter pronunciation parametric representations.

17. The electronic communication device of claim 11 wherein said second comparator utilizes dynamic time warping to generate comparisons of said aggregated plurality of letter pronunciation parametric representations with said plurality of word pronunciation parametric representations.

18. The electronic communication device of claim 17 wherein said second comparator utilizes letter pronunciation sequences to compare the parametric representations of said aggregated plurality of letter pronunciations with said plurality of word pronunciation parametric representations stored in said database storage unit.

19. The electronic communication device of claim 17 wherein said second comparator utilizes cepstral distances to compare the parametric representations of said aggregated plurality of letter pronunciations with said plurality of word pronunciation parametric representations stored in said database storage unit.
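Claims 12 and 18 group letter templates by acoustic similarity and match words through letter pronunciation sequences. One way to combine the two ideas is a weighted edit distance in which substituting a letter from the same similarity group costs less than an unrelated substitution. The sketch below is illustrative only: the hard-coded groups (for example the English "E-set" B, C, D, E, G, P, T, V, Z), the 0.5 and 1.0 costs, and all names are assumptions, and it compares letter symbols rather than full word pronunciation parametric representations.

```python
from typing import List, Tuple

# Hypothetical similarity groups; a real system would derive them from template distances.
SIMILARITY_GROUPS = [set("BCDEGPTVZ"), set("AJK"), set("MN"), set("FS"), set("IY"), set("QU")]

def substitution_cost(a: str, b: str) -> float:
    """0 for a match, 0.5 within a similarity group, 1.0 otherwise."""
    if a == b:
        return 0.0
    if any(a in g and b in g for g in SIMILARITY_GROUPS):
        return 0.5
    return 1.0

def letter_sequence_distance(heard: str, word: str, indel: float = 1.0) -> float:
    """Weighted edit distance between the recognized letters and a vocabulary spelling."""
    n, m = len(heard), len(word)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * indel          # extra letters heard (e.g. a spurious detection)
    for j in range(1, m + 1):
        D[0][j] = j * indel          # letters of the word that were missed
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + indel,
                          D[i][j - 1] + indel,
                          D[i - 1][j - 1] + substitution_cost(heard[i - 1], word[j - 1]))
    return D[n][m]

def best_words(heard: str, vocabulary: List[str], top_n: int = 3) -> List[Tuple[str, float]]:
    """Vocabulary words ordered by distance (ties broken alphabetically)."""
    scored = [(w, letter_sequence_distance(heard.upper(), w.upper())) for w in vocabulary]
    scored.sort(key=lambda s: (s[1], s[0]))
    return scored[:top_n]
```

For example, `best_words("ten", ["DEN", "HEN", "TEN", "MEN"], 2)` ranks TEN first and DEN second, because T and D fall in the same assumed group while H and M do not.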

20. A method for recognizing speech sound signals, comprising the steps of:
- forming a stored database of letter and word sounds, including the steps of:
  
  (a) parameterizing a plurality of letter sounds;
  
  (b) storing said plurality of parameterized letter sounds;
  
  (c) parameterizing a plurality of word sounds;
  
  (d) storing said plurality of parameterized word sounds;
  
  performing speech recognition of input speech, including the steps of:
  
  (e) receiving sound waves;
  
  (f) converting the sound waves into electronic signals;
  
  (g) parameterizing the electronic signals;
  
  (h) comparing said parameterized electronic signals with said stored plurality of parameterized letter sounds responsive to calibrating said plurality of parameterized electronic signals with said plurality of parameterized letter sounds responsive to a predetermined calibration method;
  
  (i) selecting at least one of said stored plurality of parameterized letter sounds responsive to predetermined parameter similarity criteria;
  
  (j) displaying said selected at least one of said stored plurality of parameterized letter sounds;
  
  (k) aggregating said selected at least one of said stored plurality of parameterized letter sounds to form a parameterized word;
  
  (l) comparing said parameterized word with said stored plurality of parameterized word sounds;
  
  (m) selecting at least one of said stored plurality of parameterized word sounds responsive to predetermined parameter similarity criteria; and
  
  (n) displaying said selected at least one of said stored plurality of parameterized word sounds.
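Steps (i) through (m) of claim 20 keep one or more candidate letters per spoken letter, display them, and aggregate them into a word that is compared with the vocabulary. The sketch below shows one possible aggregation. It assumes each spoken letter yields a small dictionary of candidate letters with their distortions (lower is better), that only vocabulary words with the same number of letters are considered, and that a letter absent from a position's candidate list incurs a fixed penalty. The names and the penalty value are hypothetical.

```python
from typing import Dict, List, Tuple

NBest = List[Dict[str, float]]   # one {letter: distortion} dict per spoken letter

def best_letters(nbest: NBest) -> str:
    """First sequence of associations: the lowest-distortion letter at each position."""
    return "".join(min(cands, key=cands.get) for cands in nbest)

def word_score(word: str, nbest: NBest, miss_penalty: float) -> float:
    """Sum of per-position distortions; letters outside the candidate list are penalized."""
    return sum(cands.get(ch, miss_penalty) for ch, cands in zip(word.upper(), nbest))

def recognize_word(nbest: NBest, vocabulary: List[str], miss_penalty: float = 10.0,
                   top_n: int = 3) -> List[Tuple[str, float]]:
    """Second sequence of associations: best-scoring words of matching length."""
    scored = [(w, word_score(w, nbest, miss_penalty))
              for w in vocabulary if len(w) == len(nbest)]
    scored.sort(key=lambda s: (s[1], s[0]))
    return scored[:top_n]
```

With candidates `[{"D": 1.0, "B": 1.2}, {"O": 0.8}, {"Y": 1.1, "G": 1.6}]`, the letter-level result is DOY, which is not a word; the word-level comparison against `["BOY", "DOG", "TOY", "BOAT"]` ranks BOY first even though B was only the second-best candidate at the first position.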

21. A method for recognizing speech sound signals, comprising the steps of:
- forming a stored database of letter and word sounds, including the steps of:
  
  (a) speaking a plurality of letter sounds;
  
  (b) distinguishing whether the speaker is male or female;
  
  (c) parameterizing said plurality of letter sounds;
  
  (d) storing said plurality of parameterized letter sounds;
  
  (e) parameterizing a plurality of word sounds;
  
  (f) storing said plurality of parameterized word sounds;
  
  performing speech recognition of input speech, including the steps of:
  
  (g) receiving sound waves;
  
  (h) converting the sound waves into electronic signals;
  
  (i) parameterizing the electronic signals;
  
  (j) comparing said parameterized electronic signals with said stored plurality of parameterized letter sounds;
  
  (k) selecting at least one of said stored plurality of parameterized letter sounds responsive to predetermined parameter similarity criteria;
  
  (l) displaying said selected at least one of said stored plurality of parameterized letter sounds;
  
  (m) aggregating said selected at least one of said stored plurality of parameterized letter sounds to form a parameterized word;
  
  (n) comparing said parameterized word with said stored plurality of parameterized word sounds;
  
  (o) selecting at least one of said stored plurality of parameterized word sounds responsive to predetermined parameter similarity criteria; and
  
  (p) displaying said selected at least one of said stored plurality of parameterized word sounds.
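Step (b) of claim 21 distinguishes male from female speakers but does not say how. A common heuristic is average fundamental frequency (pitch). The sketch below estimates pitch from the strongest short-time autocorrelation peak and compares the median over voiced frames against a threshold. The 60 to 400 Hz search range, the 0.3 voicing ratio, and the 165 Hz threshold are assumptions chosen for illustration, the names are hypothetical, and practical systems usually rely on more robust pitch trackers.

```python
import math
from typing import List

def estimate_f0(frame: List[float], fs: int, fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Pitch estimate from the strongest autocorrelation peak; 0.0 for unvoiced frames."""
    r0 = sum(v * v for v in frame)
    if r0 == 0.0:
        return 0.0
    best_lag, best_r = 0, 0.0
    for lag in range(int(fs / fmax), min(int(fs / fmin), len(frame) - 1) + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < 0.3 * r0:   # weak periodicity: treat as unvoiced
        return 0.0
    return fs / best_lag

def classify_speaker(frames: List[List[float]], fs: int, threshold_hz: float = 165.0) -> str:
    """'male' or 'female' from the median pitch of voiced frames; 'unknown' if none."""
    f0s = sorted(f for f in (estimate_f0(fr, fs) for fr in frames) if f > 0.0)
    if not f0s:
        return "unknown"
    median = f0s[len(f0s) // 2]
    return "female" if median >= threshold_hz else "male"
```

Because the unnormalized autocorrelation shrinks as the lag grows, the first period peak normally beats its multiples, which keeps this simple search from reporting half the true pitch on clean voiced frames.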

22. A method for recognizing speech sound signals, comprising the steps of:
- forming a stored database of letter and word sounds, including the steps of:
  
  (a) speaking a plurality of letter sounds;
  
  (b) distinguishing the endpoints of each letter sound responsive to the spoken letter sounds, thereby distinguishing substantially clear spoken letter sounds;
  
  (c) parameterizing said plurality of letter sounds;
  
  (d) storing said plurality of parameterized letter sounds;
  
  (e) parameterizing a plurality of word sounds;
  
  (f) storing said plurality of parameterized word sounds;
  
  performing speech recognition of input speech, including the steps of:
  
  (g) receiving sound waves;
  
  (h) converting the sound waves into electronic signals;
  
  (i) parameterizing the electronic signals;
  
  (j) comparing said parameterized electronic signals with said stored plurality of parameterized letter sounds;
  
  (k) selecting at least one of said stored plurality of parameterized letter sounds responsive to predetermined parameter similarity criteria;
  
  (l) displaying said selected at least one of said stored plurality of parameterized letter sounds;
  
  (m) aggregating said selected at least one of said stored plurality of parameterized letter sounds to form a parameterized word;
  
  (n) comparing said parameterized word with said stored plurality of parameterized word sounds;
  
  (o) selecting at least one of said stored plurality of parameterized word sounds responsive to predetermined parameter similarity criteria; and
  
  (p) displaying said selected at least one of said stored plurality of parameterized word sounds.
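Step (b) of claim 22 distinguishes the endpoints of each spoken letter, but the claim does not specify a detector. The sketch below is a simple energy-based version. It assumes that the first few frames contain only background noise, that a frame is active when its energy exceeds four times that noise floor, and that at least three consecutive active frames are needed to reject clicks. These values and all names are hypothetical; classic detectors such as Rabiner and Sambur's also use the zero-crossing rate to catch weak fricatives such as "F" or "S".

```python
from typing import List, Optional, Tuple

def frame_energies(x: List[float], N: int, M: int) -> List[float]:
    """Mean-square energy of each N-sample frame, frames M samples apart."""
    return [sum(v * v for v in x[s:s + N]) / N for s in range(0, len(x) - N + 1, M)]

def detect_endpoints(x: List[float], N: int = 160, M: int = 80, noise_frames: int = 5,
                     ratio: float = 4.0, min_frames: int = 3) -> Optional[Tuple[int, int]]:
    """(start, end) sample indices of the utterance (end exclusive), or None if none found."""
    E = frame_energies(x, N, M)
    if len(E) <= noise_frames:
        return None
    noise = sum(E[:noise_frames]) / noise_frames      # assumes leading frames are silence
    threshold = max(noise * ratio, 1e-10)
    active = [e > threshold for e in E]
    # First frame that begins a run of min_frames active frames.
    start = next((i for i in range(len(E) - min_frames + 1)
                  if all(active[i:i + min_frames])), None)
    if start is None:
        return None
    # Last frame that ends such a run (at least the run found above exists).
    end = next(j for j in range(len(E) - 1, min_frames - 2, -1)
               if all(active[j - min_frames + 1:j + 1]))
    return start * M, end * M + N
```

The returned boundaries are quantized to the frame hop M, so they can fall up to one frame outside the true onset and offset; the trimmed segment would then be parameterized as in step (c).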

Current Assignee: Verbaltek Incorporated
Original Assignee: Verbaltek Incorporated
Inventors: Kim, Yoon; Pan, James; Chang, Josephine; Chen, Juinn-Yan
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Nolan, Daniel A.
Application Number: US09/538,657
Time in Patent Office: 565 Days
Field of Search: 704/257, 704/254, 704/255, 704/251, 704/252
US Class Current: 704/257
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 15/187   Phonemic context, e.g. pron...
G10L 2015/086   Recognition of spelled words
