Real time computer speech recognition system
First Claim
1. A real time speech recognition system comprising:
means for receiving audio speech signals and for converting them into corresponding electrical signals having a predetermined maximum frequency of interest;
analog-to-digital conversion means for sampling said signals at a rate at least twice as high as said maximum frequency;
spectrum analyzer means for accepting sets of samples from said analog-to-digital converter extending over a time interval of between about two milliseconds and about sixteen milliseconds, and for providing a digital spectrum analysis of each of said sets of samples;
means for logically analyzing said sets of samples, and for classifying the series of samples into silence, transitions, and phonemes of at least the following classes:
(1) voiced stops, (2) unvoiced stops, (3) unvoiced fricatives, (4) vowels, semi-vowels, and voiced fricatives, and (5) transitions;
means for mathematically analyzing the relationships between the formants of the classified phonemes to uniquely identify successive phonemes;
said analyzing means including means for defining phoneme regions wherein, in the graphical analysis of a first one of said formants plotted against an other one of said formants, at least some selected boundaries of the defined regions extend over both a range of first formant frequencies and a range of said other formant frequencies, and for determining the coordinates defined by the formants of each phoneme, and the region in which such coordinates fall, thereby identifying each phoneme;
means for forming sequences of continuous strings of phonemes, eliminating transitions and silences;
means for translating the strings of phonemes into the words of a language;
said means for translating strings of phonemes into a language including means for parsing the phoneme string, including (1) determining alternative correct possible words from the phoneme string, (2) eliminating those alternatives which yield subsequent non-words in the following phoneme string, and (3) selecting the remaining alternative word; and
means for printing out text corresponding to the translated words.
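The sampling and windowing limits recited in the claim can be illustrated with a short sketch. It is not taken from the patent: the 4000 Hz maximum frequency, the 8 ms interval, and all function names are assumptions chosen for illustration, and the plain discrete Fourier transform merely stands in for whatever spectrum analyzer the system uses.

```python
import math

def min_sample_rate_hz(max_freq_hz):
    """Lowest rate meeting 'at least twice as high as said maximum frequency'."""
    return 2.0 * max_freq_hz

def samples_per_set(sample_rate_hz, interval_ms):
    """Number of samples in one analysis set of the given duration."""
    # The claim says "about" 2 ms to "about" 16 ms; the hard limits here are
    # a simplification made for this sketch.
    if not 2.0 <= interval_ms <= 16.0:
        raise ValueError("interval should lie between about 2 ms and 16 ms")
    return int(round(sample_rate_hz * interval_ms / 1000.0))

def magnitude_spectrum(samples):
    """Magnitude of a plain DFT for bins 0..N/2 (slow, but dependency-free)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(samples))
        im = -sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(samples))
        spectrum.append(math.hypot(re, im))
    return spectrum

# Assumed values: 4000 Hz maximum frequency of interest, 8 ms sample sets.
rate = min_sample_rate_hz(4000.0)
n = samples_per_set(rate, 8.0)
tone = [math.sin(2.0 * math.pi * 1000.0 * i / rate) for i in range(n)]
spectrum = magnitude_spectrum(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * rate / n  # bin spacing is rate / n
```

Under these assumed values each set holds 64 samples and the bins are 125 Hz apart, so a shorter interval gives finer time resolution at the cost of coarser frequency resolution.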
Abstract
Speech may be analyzed digitally and recognized in real time by a system which includes a spectrum analyzer which determines the frequency content of successive segments of speech. Each speech segment is logically analyzed to identify the class of phonemes of which it is a part, and then the frequency spectrum of the segment is analyzed to uniquely identify the specific phoneme within the type. Sequences of phonemes with transitions excluded can then be compactly stored, transmitted to remote locations, synthesized into voice and translated logically into English or other natural language.
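The step of storing phoneme sequences with transitions excluded might look like the sketch below. The label names and the per-segment input format are hypothetical, and collapsing consecutive identical labels into one phoneme is an assumption of this example rather than something the abstract states.

```python
from itertools import groupby

# Hypothetical labels for non-phoneme segments.
SILENCE = "sil"
TRANSITION = "trans"

def phoneme_string(segment_labels):
    """Collapse per-segment labels into a compact phoneme sequence.

    Runs of identical labels are merged first, then silences and
    transitions are dropped, so a phoneme repeated across a silence
    is still kept as two separate phonemes.
    """
    runs = [label for label, _ in groupby(segment_labels)]
    return [label for label in runs if label not in (SILENCE, TRANSITION)]

labels = ["sil", "s", "s", "trans", "i", "i", "i", "trans", "k", "sil"]
compact = phoneme_string(labels)
```

Ten labelled segments reduce to three phonemes here, which is the kind of compaction that makes storage or transmission of the sequence cheap.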
24 Claims
5. A method for recognizing the phonemes in speech comprising steps of:
converting speech into a corresponding non-audio signal usable as input to a frequency analyzer;
producing amplitude versus frequency spectra of said signal during successive predetermined time intervals;
analyzing said spectra and classifying the series of spectra into transitions, silence and groups of phonemes;
determining the frequencies of the formants in the said spectra;
uniquely identifying successive phonemes from the relationships between the formants for each of said groups of phonemes; and
said analyzing means including means for defining phoneme regions wherein, in the graphical analysis of a first one of said formants plotted against an other one of said formants, at least some selected boundaries of the defined regions extend over both a range of first formant frequencies and a range of said other formant frequencies, and for determining the coordinates defined by the formants of each phoneme, and the region in which such coordinates fall, thereby identifying each phoneme.
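The region-based identification in the last element can be sketched as a point-in-polygon test on a plot of one formant against another. Everything concrete below is an assumption for illustration: the region vertices are invented, not measured formant data, and ray casting is just one common way to test membership. The regions are drawn with slanted edges so that their boundaries span a range of both formant frequencies, as the claim describes.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (first formant in Hz, other formant in Hz)

# Hypothetical phoneme regions; the vertices are illustrative only.
REGIONS: Dict[str, List[Point]] = {
    "i": [(200, 2000), (400, 1900), (450, 2600), (200, 2800)],
    "a": [(600, 900), (900, 1000), (950, 1500), (650, 1400)],
    "u": [(200, 600), (420, 650), (400, 1100), (220, 1000)],
}

def inside(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray from the point."""
    x, y = point
    hit = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit

def identify(f1: float, f2: float) -> Optional[str]:
    """Return the phoneme whose region contains (f1, f2), or None."""
    for phoneme, polygon in REGIONS.items():
        if inside((f1, f2), polygon):
            return phoneme
    return None

first = identify(300.0, 2300.0)
second = identify(750.0, 1200.0)
outside = identify(1000.0, 3000.0)
```

Polygons are used instead of axis-aligned boxes because a box boundary covers a range of only one formant, whereas a slanted edge covers a range of both.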
17. A real time system for recognizing the phonemes in speech, comprising:
means for converting speech into a corresponding non-audio signal usable as input to a frequency analyzer;
frequency analyzer means for producing amplitude versus frequency spectra of said signal during successive predetermined time intervals;
means for analyzing said spectra and classifying the series of spectra into transitions, silence and groups of phonemes;
means for determining the frequencies of the formants in the said spectra;
means for uniquely identifying successive phonemes from the relationships between the formants for each of said groups of phonemes;
means for translating the resultant strings of phonemes into a language; and
said means for translating strings of phonemes into a language including means for parsing the phoneme string, including (1) determining alternative possible words from the phoneme string, (2) eliminating those alternatives which yield subsequent non-words in the subsequent phoneme string, and (3) selecting the remaining alternative word.
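The three parsing steps in the final element can be sketched as a small backtracking parser. The lexicon, the phoneme symbols, and the function name are hypothetical. Two further assumptions go beyond the claim text: the look-ahead runs to the end of the phoneme string, and the longest word is chosen if more than one alternative survives.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon mapping phoneme sequences to words.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("k", "ae", "t"): "cat",
    ("k", "ae", "t", "s"): "cats",
    ("s", "ae", "t"): "sat",
}

def parse(phonemes: Tuple[str, ...]) -> Optional[List[str]]:
    """Translate a phoneme string into words, or return None if impossible."""
    if not phonemes:
        return []
    # (1) Alternative possible words: every lexicon entry that is a prefix.
    lengths = [n for n in range(1, len(phonemes) + 1) if phonemes[:n] in LEXICON]
    # (2) Eliminate alternatives whose remainder cannot be parsed into words.
    survivors = []
    for n in lengths:
        rest = parse(phonemes[n:])
        if rest is not None:
            survivors.append((n, rest))
    if not survivors:
        return None
    # (3) Select the remaining alternative (longest first: an assumption here).
    n, rest = max(survivors, key=lambda item: item[0])
    return [LEXICON[phonemes[:n]]] + rest

words = parse(("k", "ae", "t", "s", "ae", "t"))
```

In this example both "cat" and "cats" match the start of the string, but "cats" leaves the remainder "ae t", which is not a word in the lexicon, so only "cat" followed by "sat" survives. A real-time system would presumably bound the look-ahead instead of recursing to the end of the string.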
Specification