Speech recognition device for controlling a machine
Abstract
In an apparatus for speech recognition, intended for controlling machines, spoken words are synthesized from two to twelve phoneme classes. Phonemes are recognized by analysis of speech sounds, measuring and comparing energy and time-rate of change of energy in certain frequency bands. Words are recognized by further logic means analyzing phoneme classes.
One particular feature determines the plosive class "T" versus the fricative class "S" by measuring energy rise-time.
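The rise-time test mentioned in the abstract can be illustrated with a minimal sketch. This is not the patented circuit: the frame length, the 10%/90% thresholds, the two-frame limit, and every function name below are assumptions chosen only for illustration.

```python
def short_time_energy(samples, frame_len=20):
    """Mean squared amplitude over consecutive, non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def rise_time_frames(energy, low_frac=0.1, high_frac=0.9):
    """Frames the energy needs to climb from low_frac to high_frac of its peak."""
    peak = max(energy)
    first_low = next(i for i, e in enumerate(energy) if e >= low_frac * peak)
    first_high = next(i for i, e in enumerate(energy) if e >= high_frac * peak)
    return first_high - first_low


def classify_onset(energy, max_plosive_rise=2):
    """Assumed rule: a fast energy rise means plosive 'T', a slow rise fricative 'S'."""
    return "T" if rise_time_frames(energy) <= max_plosive_rise else "S"


def burst(onset, length, ramp):
    """Synthetic signal: silence until `onset`, then a gain ramp of `ramp` samples."""
    out = []
    for n in range(length):
        if n < onset:
            gain = 0.0
        else:
            gain = 1.0 if ramp == 0 else min(1.0, (n - onset) / ramp)
        carrier = 1.0 if n % 2 == 0 else -1.0  # simple alternating carrier
        out.append(gain * carrier)
    return out


abrupt = short_time_energy(burst(onset=40, length=240, ramp=0))
gradual = short_time_energy(burst(onset=0, length=300, ramp=200))
print(classify_onset(abrupt), classify_onset(gradual))
```

The abrupt burst reaches its peak within one frame, while the ramped signal needs several frames, so the two synthetic signals fall on opposite sides of the assumed limit.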
60 Citations
16 Claims
1. A speech recognition apparatus for controlling a machine, said apparatus receiving words, said words being classified in a reduced plurality of three through twelve phoneme classes, said apparatus including:
   a. input means for receiving said words;
   b. first means responsive to said words indicating short-time energy values of said words, said first means producing a first output signal representing rising temporal energy flanks (slopes), nearly constant energy levels and zero levels of said words;
   c. second means responsive to said words and producing a second output signal representing a steepness of said rising temporal energy flanks exceeding a determined limit value;
   d. first logic means
      i. responsive to said first output signal segmenting said words into phonetic elements, indicated by both the rising temporal energy flanks and the ends of the nearly constant energy levels and indicating an end of each word only by a zero level which exceeds a determined value of time duration producing a third output signal,
      ii. responsive to said second output signal separating a class of the plosive phonetic elements from the class of the fricative phonetic elements and producing a fourth output signal,
   e. second logic means detecting a sequence of the occurrence of said phonetic elements within one word; and
   f. output means for controlling a machine as a result of the detected words.
   Dependent claims: 2, 3
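The segmentation recited in element d.i of claim 1 can be sketched in software as follows. This is only one possible reading, not the claimed hardware: the function name, the per-frame energy input, and all four thresholds (`rise`, `drop`, `zero`, `max_gap`) are assumptions. An element opens on a rising energy flank, closes when the nearly constant level ends, and a word ends only after a zero level lasting longer than `max_gap` frames.

```python
def segment(energy, rise=0.2, drop=0.2, zero=0.05, max_gap=3):
    """Split per-frame energy into words, each a list of (start, end) elements."""
    words, elements = [], []
    start = None               # start frame of the element currently open
    silent = 0                 # length of the current run of zero-level frames
    prev, was_rising = 0.0, False
    for i, e in enumerate(energy):
        rising = e - prev > rise
        falling = prev - e > drop
        if start is not None and (falling or (rising and not was_rising)):
            elements.append((start, i))   # plateau ended or a new flank began
            start = None
        if start is None and rising:
            start = i                     # a rising flank opens an element
        silent = silent + 1 if e <= zero else 0
        if silent > max_gap and elements:
            words.append(elements)        # long zero level: end of the word
            elements = []
        prev, was_rising = e, rising
    if start is not None:
        elements.append((start, len(energy)))
    if elements:
        words.append(elements)
    return words


energy = [0, 0, 1, 1, 1, 0, 0.6, 0.6, 0.6, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
print(segment(energy))
```

In this example the single zero frame between the first two elements is too short to end the word, whereas the later five-frame pause does. A drop to a lower but non-zero plateau closes an element without opening a new one, which is a simplification of this sketch.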

4. A speech recognition system for controlling a machine, said system receiving words, said words being classified in a reduced plurality of three through twelve phoneme classes, said system including:
   a. input means for receiving said words;
   b. first means responsive to said words indicating short-time energy values of said words, said first means producing a first output signal representing rising temporal energy flanks, nearly constant energy levels and zero levels of said words;
   c. second means responsive to said words and producing a second output signal representing a steepness of said rising temporal energy flanks exceeding a determined limit value;
   d. third means responsive to said words producing at least three output signals representing said short-time energy values distributed in at least three frequency ranges;
   e. first logic means
      i. responsive to said first output signal segmenting said words into phonetic elements, indicated by both the rising temporal energy flanks and the ends of the nearly constant energy levels, and indicating an end of each word only by a zero level which exceeds a determined value of time duration producing a third output signal,
      ii. responsive to said second output signal separating a class of the plosive phonetic elements from the class of the fricative phonetic elements and producing a fourth output signal,
      iii. responsive to said three output signals for detecting the plosive class, the fricative class and at least one vowel class and producing a fifth, sixth and seventh output signal;
   f. second logic means detecting a sequence of the occurrence of said phonetic elements within one word; and
   g. output means for controlling a machine as result of the detected words.
   Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
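Elements d and e.iii of claim 4 (energy in at least three frequency ranges, then a decision among plosive, fricative and vowel classes) can be illustrated with the sketch below. The band edges, the 8 kHz sample rate, the naive DFT, and the decision rule are all assumptions made for illustration; the claim does not specify them.

```python
import cmath
import math

# Hypothetical band edges in Hz; the claim only requires at least three ranges.
BANDS = [(100, 800), (800, 2500), (2500, 4000)]


def band_energies(frame, sample_rate, bands=BANDS):
    """Energy of one frame in each (low_hz, high_hz) band, using a naive DFT."""
    n = len(frame)
    energies = [0.0] * len(bands)
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power = abs(coeff) ** 2 / n ** 2
        for b, (lo, hi) in enumerate(bands):
            if lo <= freq < hi:
                energies[b] += power
    return energies


def classify_frame(energies, steep_onset):
    """Assumed rule: high-band dominance means a consonant; steepness picks which."""
    low, mid, high = energies
    if high > low + mid:
        return "plosive" if steep_onset else "fricative"
    return "vowel"


SAMPLE_RATE = 8000
low_tone = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(64)]
high_tone = [math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE) for t in range(64)]

print(classify_frame(band_energies(low_tone, SAMPLE_RATE), steep_onset=False))
print(classify_frame(band_energies(high_tone, SAMPLE_RATE), steep_onset=True))
print(classify_frame(band_energies(high_tone, SAMPLE_RATE), steep_onset=False))
```

The `steep_onset` flag stands in for the second output signal of element c. A practical implementation would more likely use a filter bank or an FFT routine than the quadratic-time DFT shown here.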

15. A speech recognition system for controlling a machine, said system receiving elementary words, each of which are constituted by a voiced or whispered consonant followed by a vowel, said consonant being chosen among two phoneme classes, a first class of which has an energy evolution of sudden initial transition, a second class of which has an energy evolution of soft transition, said system including:
   a. input means for receiving said words and shaping the frequency spectrums of said words;
   b. first means responsive to said shaped frequency spectrums of said words and producing a first output signal representing a temporal energy evolution of said words within a general frequency range;
   c. second means responsive to said shaped frequency of said vowels and producing a second output signal representing the rising steepness of the temporal energy variations within a high frequency range;
   d. first logic means
      i. responsive to said first output signal to produce a third output signal which indicates segmenting said elementary words;
      ii. responsive to said second and third output signals producing a fourth output signal indicating the sudden or soft initial transition of said elementary words; and
   e. second logic means responsive to both the third and the fourth output signals for counting combinations of at least two classes of said elementary words.

16. A speech recognition system for controlling a machine, said system receiving elementary words each of which are constituted by one vowel preceded by one consonant, said vowel being chosen among two phoneme classes, one of which class has high vowels and the other class has low vowels, said consonant being chosen among two phoneme classes, a first class of which has an energy evolution of sudden initial transition, a second class of which has an energy evolution of soft initial transition, said system including:
   a. input means receiving said words and shaping the frequency spectrums of said words;
   b. first means responsive to said shaped frequency spectrums of said words and producing a first output signal representing a temporal energy evolution of said words within a general frequency range;
   c. second means responsive to said shaped frequency of said vowels and producing a second output signal representing the rising steepness of the temporal energy variations within a high frequency range;
   d. first logic means
      i. responsive to said first output signal to produce a third output signal which indicates segmenting said elementary words;
      ii. responsive to said second and third output signals producing a fourth output signal indicating the sudden or soft initial transition of said elementary words; and
   e. second logic means responsive to both the third and the fourth output signals for counting combinations of at least two classes of said elementary words.
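The "counting combinations" step in element e of claims 15 and 16 is open to interpretation. One possible reading, sketched below, tallies how many elementary words of each onset class occur in an utterance and looks the tally up in a command table. The class labels, the function names, and the `COMMANDS` table are hypothetical and do not come from the claims.

```python
from collections import Counter

# Hypothetical mapping from (sudden_count, soft_count) to a machine command.
COMMANDS = {
    (2, 0): "start",
    (0, 2): "stop",
    (1, 1): "reverse",
}


def count_classes(onsets):
    """Tally elementary words by onset class: 'sudden' or 'soft'."""
    counts = Counter(onsets)
    return counts["sudden"], counts["soft"]


def decode(onsets):
    """Return the command for this combination of classes, or None if unknown."""
    return COMMANDS.get(count_classes(onsets))


print(decode(["sudden", "sudden"]))
print(decode(["soft", "sudden"]))
print(decode(["soft"]))
```

Each entry in `onsets` stands for one segmented elementary word (third output signal) labelled by its initial transition (fourth output signal). Under this reading the order of the words does not matter, only how many of each class occur.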
Specification