Real time computer speech recognition system

US 4,852,170 A
Filed: 12/18/1986
Issued: 07/25/1989
Est. Priority Date: 12/18/1986
Status: Expired due to Fees


1 Assignment

0 Petitions

Abstract

Speech may be analyzed digitally and recognized in real time by a system which includes a spectrum analyzer which determines the frequency content of successive segments of speech. Each speech segment is logically analyzed to identify the class of phonemes of which it is a part, and then the frequency spectrum of the segment is analyzed to uniquely identify the specific phoneme within that class. Sequences of phonemes with transitions excluded can then be compactly stored, transmitted to remote locations, synthesized into voice and translated logically into English or other natural language.
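
The abstract's point that phoneme sequences "with transitions excluded" can be stored compactly can be illustrated with a small sketch. It assumes the classifier emits one label per analysis frame, and it uses hypothetical labels `TRANS` and `SIL` for transitions and silence and a hypothetical one-byte code per phoneme; none of these names or the encoding come from the patent.

```python
from itertools import groupby

# Hypothetical frame labels for the two non-phoneme classes.
TRANSITION = "TRANS"
SILENCE = "SIL"


def frames_to_phonemes(frame_labels, keep_silence=False):
    """Collapse per-frame labels into a phoneme string.

    Runs of identical labels become one symbol, transition frames are
    always dropped, and silent runs are dropped unless keep_silence is True.
    """
    phonemes = []
    for label, _run in groupby(frame_labels):
        if label == TRANSITION:
            continue
        if label == SILENCE and not keep_silence:
            continue
        phonemes.append(label)
    return phonemes


def encode(phonemes, inventory):
    """Pack each phoneme as its index in a hypothetical inventory (< 256 entries)."""
    return bytes(inventory.index(p) for p in phonemes)


frames = ["SIL", "SIL", "K", "K", "TRANS", "AE", "AE", "AE", "TRANS", "T", "SIL"]
print(frames_to_phonemes(frames))                     # ['K', 'AE', 'T']
print(frames_to_phonemes(frames, keep_silence=True))  # ['SIL', 'K', 'AE', 'T', 'SIL']
print(encode(frames_to_phonemes(frames), ["SIL", "K", "AE", "T"]))  # b'\x01\x02\x03'
```

Merging runs of identical frame labels is what makes the result compact: a phoneme that spans many analysis frames is kept as a single symbol.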

147 Citations

24 Claims

1. A real time speech recognition system comprising:
- means for receiving audio speech signals and for converting them into corresponding electrical signals having a predetermined maximum frequency of interest;
  
  analog-to-digital conversion means for sampling said signals at a rate at least twice as high as said maximum frequency;
  
  spectrum analyzer means for accepting sets of samples from said analog-to-digital converter extending over a time interval of between about two milliseconds and about sixteen milliseconds, and for providing a digital spectrum analysis of each of said sets of samples;
  
  means for logically analyzing said sets of samples, and for classifying the series of samples into silence, transitions, and phonemes of at least the following classes;
  
  (1) voiced stops, (2) unvoiced stops, (3) unvoiced fricatives, (4) vowels, semi-vowels, and voiced fricatives, and (5) transitions;
  
  means for mathematically analyzing the relationships between the formants of the classified phonemes to uniquely identify successive phonemes;
  
  said analyzing means including means for defining phoneme regions wherein, in the graphical analysis of a first one of said formants plotted against an other one of said formants, at least some selected boundaries of the defined regions extend over both a range of first formant frequencies and a range of said other formant frequencies, and for determining the coordinates defined by the formants of each phoneme, and the region in which such coordinates fall, thereby identifying each phoneme;
  
  means for forming sequences of continuous strings of phonemes, eliminating transitions and silences;
  
  means for translating the strings of phonemes into the words of a language;
  
  said means for translating strings of phonemes into a language including means for parsing the phoneme string, including (1) determining alternative correct possible words from the phoneme string, (2) eliminating those alternatives which yield subsequent non-words in the following phoneme string, and (3) selecting the remaining alternative word; and
  
  means for printing out text corresponding to the translated words.
- Dependent Claims (2, 3, 4)
  - 2. A system as defined in claim 1 further comprising low pass filter means for limiting the maximum frequency to about 3,000 to 6,000 Hz.
  - 3. A system as defined in claim 1 wherein said time interval is approximately from 2 milliseconds to 4 milliseconds.
  - 4. A system as defined in claim 1 wherein said sampling rate is between about 8,000 and 32,000 samples per second.
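
Claims 1 through 4 fix the sampling and windowing arithmetic. The sketch below, using hypothetical helper names, checks the claim 1 condition that the sampling rate be at least twice the maximum frequency of interest and counts the samples in one analysis interval. The frequency-resolution helper assumes a plain DFT taken over exactly one window with no zero padding, in which case the bin spacing is the reciprocal of the window length.

```python
def nyquist_ok(sample_rate_hz, max_freq_hz):
    """Claim 1 condition: sample at least twice the maximum frequency of interest."""
    return sample_rate_hz >= 2 * max_freq_hz


def samples_per_window(sample_rate_hz, window_ms):
    """Number of samples in one analysis interval."""
    return round(sample_rate_hz * window_ms / 1000)


def bin_spacing_hz(window_ms):
    """DFT bin spacing, assuming the transform length equals the window (no zero padding)."""
    return 1000 / window_ms


for rate in (8_000, 32_000):
    for window in (2, 4, 16):
        print(rate, window, samples_per_window(rate, window), bin_spacing_hz(window))
```

For example, the 6,000 Hz upper cutoff of claim 2 needs at least 12,000 samples per second, which lies inside the range of claim 4, while 8,000 samples per second supports a cutoff of at most 4,000 Hz. A 2 ms window at 8,000 samples per second holds only 16 samples and gives 500 Hz bins; a 16 ms window gives 62.5 Hz bins at any sampling rate.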

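The spectrum analyzer of claim 1 produces a digital spectrum for each set of samples, and later claim elements work from formant frequencies read off that spectrum. The claims do not say how formants are located, so the following is only a stand-in: a direct DFT (an FFT would be used in practice) followed by picking the strongest local maxima. In voiced speech, raw peak picking finds harmonics rather than true formants; a working system would smooth the spectral envelope first, for example with linear prediction. The function names and the `floor` parameter are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Direct DFT magnitudes for bins 0..N/2 (O(N^2); fine for a sketch)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_peaks(spectrum, sample_rate_hz, n, count=2, floor=1e-6):
    """Frequencies (Hz) of the `count` strongest local maxima, lowest first."""
    peaks = [
        k
        for k in range(1, len(spectrum) - 1)
        if spectrum[k] > floor
        and spectrum[k] > spectrum[k - 1]
        and spectrum[k] >= spectrum[k + 1]
    ]
    strongest = sorted(peaks, key=lambda k: spectrum[k], reverse=True)[:count]
    return sorted(k * sample_rate_hz / n for k in strongest)


# 8 ms frame at 8,000 samples/s: 64 samples, 125 Hz bins.
rate, n = 8_000, 64
frame = [
    math.sin(2 * math.pi * 500 * t / rate) + 0.5 * math.sin(2 * math.pi * 1500 * t / rate)
    for t in range(n)
]
print(spectral_peaks(magnitude_spectrum(frame), rate, n))  # [500.0, 1500.0]
```

The test tones were chosen to fall exactly on DFT bins, so no spectral leakage blurs the peaks; real speech frames would normally be windowed (for example with a Hamming window) before the transform.
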
5. A method for recognizing the phonemes in speech comprising steps of:
- converting speech into a corresponding non-audio signal usable as input to a frequency analyzer;
  
  producing amplitude versus frequency spectra of said signal during successive predetermined time intervals;
  
  analyzing said spectra and classifying the series of spectra into transitions, silence and groups of phonemes;
  
  determining the frequencies of the formants in the said spectra;
  
  uniquely identifying successive phonemes from the relationships between the formants for each of said groups of phonemes; and
  
  said analyzing means including means for defining phoneme regions wherein, in the graphical analysis of a first one of said formants plotted against an other one of said formants, at least some selected boundaries of the defined regions extend over both a range of first formant frequencies and a range of said other formant frequencies, and for determining the coordinates defined by the formants of each phoneme, and the region in which such coordinates fall, thereby identifying each phoneme.
- Dependent Claims (6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
  - 6. A method as defined in claim 5 wherein said speech conversion includes the reduction in amplitude of frequencies above a predetermined maximum frequency.
  - 7. A method as defined in claim 5 wherein said speech conversion includes processing by an analog-to-digital converter with a predetermined sampling rate.
  - 8. A method as defined in claim 7 wherein said sampling occurs at a rate between 4,000 and 64,000 samples per second.
  - 9. A method as defined in claim 5 including the step of reducing unwanted background which may be present in said signals.
  - 10. A method as defined in claim 5 wherein said predetermined time interval for the production of spectra is approximately 1 to 16 milliseconds in duration.
  - 11. A method as defined in claim 5 which also includes the step of producing a string of successive phonemes.
  - 12. A method as defined in claim 11 wherein said method also includes the step of producing indications of silence.
  - 13. A method as defined in claim 12 wherein said string of phonemes is encoded, for storage, transmission or further processing.
  - 14. A method as defined in claim 12 including the step of synthesizing a voice or speech output from said string of phonemes.
  - 15. A method as defined in claim 12 including forming said string of phonemes into a stream of words by a phonetic dictionary translator.
  - 16. A method as defined in claim 15 including the step of translating said stream of words into sentences in a natural language.
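
Claim 5, like claim 1, first sorts the spectra into silence, transitions and phoneme groups before any formant analysis. The claims give no decision rules, so the sketch below is a hypothetical rule set over invented per-frame features (normalized energy, a voicing flag, a burst-onset flag and a spectral-change score) with made-up thresholds. It only shows the shape of a logical analysis that maps each frame to silence, a transition, or one of the claim 1 phoneme classes.

```python
from dataclasses import dataclass


@dataclass
class FrameFeatures:
    """Hypothetical per-frame measurements; the patent does not list these."""
    energy: float           # normalized frame energy, 0..1
    voiced: bool            # periodic excitation detected
    onset: bool             # abrupt burst after a near-silent closure
    spectral_change: float  # distance from the previous frame's spectrum, 0..1


def classify(f, silence_thr=0.02, change_thr=0.5):
    """Map one frame to silence, transition, or a claim 1 phoneme class.

    Thresholds are made-up placeholders, not values from the patent.
    """
    if f.energy < silence_thr:
        return "silence"
    if f.onset:  # release burst of a stop
        return "voiced stop" if f.voiced else "unvoiced stop"
    if f.spectral_change > change_thr:  # spectrum still moving between targets
        return "transition"
    if not f.voiced:  # steady, aperiodic noise
        return "unvoiced fricative"
    return "vowel, semi-vowel or voiced fricative"


print(classify(FrameFeatures(energy=0.4, voiced=True, onset=False, spectral_change=0.1)))
# vowel, semi-vowel or voiced fricative
```

Checking the burst onset before the spectral-change test keeps a stop release, which also changes the spectrum abruptly, from being filed as a transition.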

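The distinctive element of claims 1 and 5 is that phoneme regions in the plane of one formant plotted against another have boundaries that run across ranges of both formants: slanted edges rather than the axis-aligned rectangles that independent thresholds on each formant would give. A polygon containment test captures this. The region coordinates below are illustrative values, not the regions defined in the patent, and the ARPAbet-style labels are an assumption.

```python
# Illustrative (F1, F2) polygons in Hz; NOT the regions defined in the patent.
REGIONS = {
    "IY": [(200, 2000), (400, 1900), (450, 2600), (250, 2800)],
    "AA": [(600, 900), (850, 1000), (900, 1400), (650, 1300)],
    "UW": [(200, 700), (400, 750), (420, 1100), (220, 1050)],
}


def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: count edge crossings of a ray toward +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def identify(f1, f2, regions=REGIONS):
    """Return the label of the first region containing (F1, F2), else None."""
    for label, polygon in regions.items():
        if point_in_polygon(f1, f2, polygon):
            return label
    return None


print(identify(270, 2290))  # IY
print(identify(730, 1090))  # AA
print(identify(500, 1600))  # None
```

The point (440, 2000) lies inside the bounding rectangle of the `IY` region (F1 200 to 450 Hz, F2 1900 to 2800 Hz) but outside the slanted right-hand edge, so `identify(440, 2000)` returns `None`; a pair of separate F1 and F2 thresholds could not draw that boundary.
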
17. A real time system for recognizing the phonemes in speech, comprising:
- means for converting speech into a corresponding non-audio signal usable as input to a frequency analyzer;
  
  frequency analyzer means for producing amplitude versus frequency spectra of said signal during successive predetermined time intervals;
  
  means for analyzing said spectra and classifying the series of spectra into transitions, silence and groups of phonemes;
  
  means for determining the frequencies of the formants in the said spectra;
  
  means for uniquely identifying successive phonemes from the relationships between the formants for each of said groups of phonemes;
  
  means for translating the resultant strings of phonemes into a language; and
  
  said means for translating strings of phonemes into a language including means for parsing the phoneme string, including (1) determining alternative possible words from the phoneme string, (2) eliminating those alternatives which yield subsequent non-words in the subsequent phoneme string, and (3) selecting the remaining alternative word.
- Dependent Claims (18, 19, 20, 21, 22, 23, 24)
  - 18. A system as defined in claim 17 including low pass filter means for reducing the amplitude of frequencies above a predetermined maximum frequency.
  - 19. A system as defined in claim 17 wherein said system includes an analog-to-digital converter with a predetermined sampling rate.
  - 20. A method as defined in claim 19 wherein said sampling rate is between 4,000 and 64,000 samples per second.
  - 21. A method as defined in claim 17 wherein said predetermined time interval is approximately 1 to 16 milliseconds in duration.
  - 22. A system as defined in claim 17 which also includes means for producing a string of successive phonemes.
  - 23. A system as defined in claim 22 wherein said system includes means for producing indications of silence.
  - 24. A system as defined in claim 23 wherein said system includes means for synthesizing a voice or speech output from said string of phonemes.
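
The parsing element of claims 1 and 17 lists the words that could start at the current position, discards any whose remaining phonemes cannot be turned into words, and keeps the survivor. A minimal recursive sketch of that lookahead follows. The lexicon format, the ARPAbet-style symbols and the longest-match tie-break used when several candidates survive are assumptions; the claims do not say what happens when more than one alternative remains.

```python
from functools import lru_cache


def parse(phonemes, lexicon):
    """Split a phoneme sequence into words using lookahead on the remainder.

    lexicon maps a word to its pronunciation (a sequence of phoneme symbols).
    Returns a list of words, or None if no complete parse exists.
    """
    phonemes = tuple(phonemes)
    entries = [(word, tuple(pron)) for word, pron in lexicon.items() if pron]

    @lru_cache(maxsize=None)
    def parse_from(i):
        if i == len(phonemes):
            return ()
        # (1) alternative words whose pronunciation matches at position i
        candidates = [(w, p) for w, p in entries if phonemes[i:i + len(p)] == p]
        # (2) discard alternatives that leave an unparsable remainder;
        # (3) keep a survivor, trying longer words first (assumed tie-break)
        for word, pron in sorted(candidates, key=lambda wp: len(wp[1]), reverse=True):
            rest = parse_from(i + len(pron))
            if rest is not None:
                return (word,) + rest
        return None

    result = parse_from(0)
    return None if result is None else list(result)


lexicon = {
    "a": ["AH"], "an": ["AH", "N"], "nice": ["N", "AY", "S"],
    "the": ["DH", "AH"], "cat": ["K", "AE", "T"], "sat": ["S", "AE", "T"], "at": ["AE", "T"],
}
print(parse(["AH", "N", "AY", "S"], lexicon))         # ['a', 'nice']
print(parse("DH AH K AE T S AE T".split(), lexicon))  # ['the', 'cat', 'sat']
```

In the first example the candidate `an` is eliminated because the phonemes after it (`AY S`) form no word in the lexicon, so `a` is selected. Memoizing on the string position keeps the lookahead from re-parsing the same suffix more than once.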

Specification

Current Assignee
R & D Associates
Original Assignee
R & D Associates
Inventors
Bordeaux, Theodore A.
Primary Examiner(s)
Harkcom, Gary V.
Assistant Examiner(s)
Knepper, David D.

Application Number

US06/944,468
Time in Patent Office

950 Days
Field of Search

381/39-50, 381/52
US Class Current

704/277
CPC Class Codes

G10L 15/00 Speech recognition G10L17/0...
