Speech recognition device for controlling a machine

US 3,946,157 A
Filed: 08/09/1974
Issued: 03/23/1976
Est. Priority Date: 08/18/1971
Status: Expired due to Term

Abstract

In an apparatus for speech recognition, intended for controlling machines, spoken words are synthesized from two to twelve phoneme classes. Phonemes are recognized by analysis of speech sounds, measuring and comparing energy and time-rate of change of energy in certain frequency bands. Words are recognized by further logic means analyzing phoneme classes.

One particular feature determines the plosive class "T" versus the fricative class "S" by measuring energy rise-time.
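
The rise-time feature can be illustrated with a minimal sketch. This is not the patented circuit: it assumes a sampled short-time energy envelope, and the function names, the 10%-to-90% definition of rise time, the 5 ms frame spacing and the 20 ms limit are all hypothetical values chosen for illustration.

```python
def rise_time(envelope, frame_s, lo=0.1, hi=0.9):
    """Seconds taken by the envelope to climb from lo*peak to hi*peak.

    The 10%/90% points are an assumed convention, not taken from the patent.
    """
    peak = max(envelope)
    if peak <= 0:
        raise ValueError("envelope has no energy")
    # First frame at or above each fraction of the peak.
    i_lo = next(i for i, e in enumerate(envelope) if e >= lo * peak)
    i_hi = next(i for i, e in enumerate(envelope) if e >= hi * peak)
    return (i_hi - i_lo) * frame_s


def classify_onset(envelope, frame_s=0.005, limit_s=0.020):
    """Label a fast energy rise 'plosive' (class "T"), a slow one 'fricative' (class "S").

    frame_s and limit_s are hypothetical illustration values.
    """
    return "plosive" if rise_time(envelope, frame_s) < limit_s else "fricative"


# A burst-like onset and a gradual onset, as synthetic envelopes.
t_like = [0.0, 0.0, 1.0, 1.0, 1.0]
s_like = [i / 10 for i in range(11)]
```

With these synthetic envelopes, `classify_onset(t_like)` yields the plosive label and `classify_onset(s_like)` the fricative label.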

60 Citations

16 Claims

1. A speech recognition apparatus for controlling a machine, said apparatus receiving words, said words being classified in a reduced plurality of three through twelve phoneme classes, said apparatus including:
- a. input means for receiving said words;
  
  b. first means responsive to said words indicating short-time energy values of said words, said first means producing a first output signal representing rising temporal energy flanks (slopes), nearly constant energy levels and zero levels of said words;
  
  c. second means responsive to said words and producing a second output signal representing a steepness of said rising temporal energy flanks exceeding a determined limit value;
  
  d. first logic means
  
     i. responsive to said first output signal segmenting said words into phonetic elements, indicated by both the rising temporal energy flanks and the ends of the nearly constant energy levels and indicating an end of each word only by a zero level which exceeds a determined value of time duration producing a third output signal,
  
     ii. responsive to said second output signal separating a class of the plosive phonetic elements from the class of the fricative phonetic elements and producing a fourth output signal,
  
  e. second logic means detecting a sequence of the occurrence of said phonetic elements within one word; and
  
  f. output means for controlling a machine as a result of the detected words.
- - 2. A speech recognition apparatus as defined in claim 1, wherein said first and second logic means includes circuit means responsive to said zero levels of said first output signal to determine an end of a word by exceeding the time duration value in an area between 0.2 sec. and 0.5 sec., and to determine an end of a phonetic element by dropping below the time duration value in an area between 0.2 sec. and 0.5 sec.
  - 3. A speech recognition apparatus as defined in claim 2, wherein said circuits responding to said zero levels of said first output signal determine an end of a group of a plurality of words by exceeding the time duration value in an area between 2 sec. and 5 sec. and to determine an end of a word by falling down below the time duration value in an area between 2 sec. and 5 sec.
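
The silence-duration rule of claims 2 and 3 can be sketched as follows. This is a software illustration under stated assumptions, not the claimed circuit: the function names, the input format (a label paired with the silence that follows it) and the specific limits of 0.3 s and 3.0 s are hypothetical; the claims only place the limits in the ranges 0.2 to 0.5 s and 2 to 5 s.

```python
def classify_pause(duration_s, word_limit_s=0.3, group_limit_s=3.0):
    """Decide what a silence of the given length terminates.

    The default limits are hypothetical picks from the claimed ranges.
    """
    if not 0.2 <= word_limit_s <= 0.5:
        raise ValueError("word limit outside the range named in claim 2")
    if not 2.0 <= group_limit_s <= 5.0:
        raise ValueError("group limit outside the range named in claim 3")
    if duration_s > group_limit_s:
        return "end_of_group"
    if duration_s > word_limit_s:
        return "end_of_word"
    return "end_of_element"


def split_utterance(elements):
    """Group (label, pause_after_s) pairs into groups of words of labels."""
    groups, words, word = [], [], []
    for label, pause_s in elements:
        word.append(label)
        kind = classify_pause(pause_s)
        if kind != "end_of_element":
            words.append(word)  # a long enough pause closes the word
            word = []
        if kind == "end_of_group":
            groups.append(words)  # a still longer pause closes the group
            words = []
    # Flush whatever is still open at the end of the input.
    if word:
        words.append(word)
    if words:
        groups.append(words)
    return groups


demo = [("S", 0.05), ("A", 0.4), ("T", 0.05), ("O", 3.5), ("S", 0.0)]
```

Here `split_utterance(demo)` separates the first four elements into two words of one group and places the trailing element in a second group.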

4. A speech recognition system for controlling a machine, said system receiving words, said words being classified in a reduced plurality of three through twelve phoneme classes, said system including:
- a. input means for receiving said words;
  
  b. first means responsive to said words indicating short-time energy values of said words, said first means producing a first output signal representing rising temporal energy flanks, nearly constant energy levels and zero levels of said words;
  
  c. second means responsive to said words and producing a second output signal representing a steepness of said rising temporal energy flanks exceeding a determined limit value;
  
  d. third means responsive to said words producing at least three output signals representing said short-time energy values distributed in at least three frequency ranges;
  
  e. first logic means
  
     i. responsive to said first output signal segmenting said words into phonetic elements, indicated by both the rising temporal energy flanks and the ends of the nearly constant energy levels, and indicating an end of each word only by a zero level which exceeds a determined value of time duration producing a third output signal,
  
     ii. responsive to said second output signal separating a class of the plosive phonetic elements from the class of the fricative phonetic elements and producing a fourth output signal,
  
     iii. responsive to said three output signals for detecting the plosive class, the fricative class and at least one vowel class and producing a fifth, sixth and seventh output signal;
  
  f. second logic means detecting a sequence of the occurrence of said phonetic elements within one word; and
  
  g. output means for controlling a machine as result of the detected words.
- - 5. A speech recognition apparatus related to claim 4, wherein said third means embody additional circuit means to determine a class of nasal phonetic elements.
  - 6. A speech recognition apparatus as defined in claim 4, wherein said first logic means include a circuit means responsive to zero levels of said first output signal indicating the time duration of the silences between all phonetic elements, said circuit means delivering a signal to eliminate a noise from said phonetic elements when said silence before a non plosive phonetic element exceeds a determined time duration value or when said silence before a plosive phonetic element drops below said determined time duration value.
  - 7. A speech recognition apparatus as defined in claim 4, wherein said second logic means includes circuit means for segmenting a composed word in its elementary words, each of them composed of a single vowel or a single consonant followed by a single vowel, said segmentation being operated by said end of each vowel indicated by said first logic means.
  - 8. A speech recognition apparatus as defined in claim 4, wherein said second logic means includes circuit means for recognizing an instruction word having an end with a consonant followed by a silence or including two adjacent consonants.
  - 9. A speech recognition apparatus as defined in claim 4, wherein said second logic means includes circuit means responsive to two following and adjacent vowels, the second vowel belonging to a selected vowel class recognizing a special elementary word for avoiding a repetition of the same phonetic element.
  - 10. A speech recognition apparatus as defined in claim 4, wherein said second logic means includes circuit means responsive to an initial vowel of a selected vowel class recognizing a special elementary word for avoiding a repetition of the same phonetic element.
  - 11. A speech recognition apparatus as defined in claim 4, wherein said first, second and third means includes a plurality of amplitude compressors consisting of one backward loop for controlling both the backward amplifier and a forward amplifier.
  - 12. A speech recognition apparatus as defined in claim 4, wherein said first logic means includes a first matrix for recognizing both three classes of consonants and three classes of vowels.
  - 13. A speech recognition apparatus as defined in claim 4, wherein said first logic means includes a second matrix for recognizing both four classes of consonants and four classes of vowels.
  - 14. A speech recognition apparatus as defined in claim 4, wherein said first logic means includes a third matrix for recognizing subclasses of consonants and vowels.
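
The noise-elimination condition of claim 6 can be written out as a small sketch. It follows the claim wording literally and makes no statement about how the circuit realizes it; the function names, the tuple layout and the 50 ms limit are hypothetical, since the claim only speaks of "a determined time duration value".

```python
def is_noise(is_plosive, silence_before_s, limit_s=0.05):
    """Apply the two conditions of claim 6, read literally.

    limit_s is a hypothetical value; the claim does not state one.
    """
    if is_plosive:
        # Plosive preceded by a silence below the limit: flagged as noise.
        return silence_before_s < limit_s
    # Non-plosive preceded by a silence above the limit: flagged as noise.
    return silence_before_s > limit_s


def drop_noise(elements, limit_s=0.05):
    """Keep only the (label, is_plosive, silence_before_s) items not flagged."""
    return [e for e in elements if not is_noise(e[1], e[2], limit_s)]


demo = [
    ("T", True, 0.10),   # plosive after a clear silence: kept
    ("T", True, 0.01),   # plosive with almost no silence: dropped
    ("S", False, 0.01),  # fricative with a short silence: kept
    ("S", False, 0.20),  # fricative after a long silence: dropped
]
```

Applied to `demo`, `drop_noise` keeps the first and third entries.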

15. A speech recognition system for controlling a machine, said system receiving elementary words, each of which are constituted by a voiced or whispered consonant followed by a vowel, said consonant being chosen among two phoneme classes, a first class of which has an energy evolution of sudden initial transition, a second class of which has an energy evolution of soft transition, said system including:
- a. input means for receiving said words and shaping the frequency spectrums of said words;
  
  b. first means responsive to said shaped frequency spectrums of said words and producing a first output signal representing a temporal energy evolution of said words within a general frequency range;
  
  c. second means responsive to said shaped frequency of said vowels and producing a second output signal representing the rising steepness of the temporal energy variations within a high frequency range;
  
  d. first logic means
  
     i. responsive to said first output signal to produce a third output signal which indicates segmenting said elementary words;
  
     ii. responsive to said second and third output signals producing a fourth output signal indicating the sudden or soft initial transition of said elementary words; and
  
  e. second logic means responsive to both the third and the fourth output signals for counting combinations of at least two classes of said elementary words.

16. A speech recognition system for controlling a machine, said system receiving elementary words each of which are constituted by one vowel preceded by one consonant, said vowel being chosen among two phoneme classes, one of which class has high vowels and the other class has low vowels, said consonant being chosen among two phoneme classes, a first class of which has an energy evolution of sudden initial transition, a second class of which has an energy evolution of soft initial transition, said system including:
- a. input means receiving said words and shaping the frequency spectrums of said words;
  
  b. first means responsive to said shaped frequency spectrums of said words and producing a first output signal representing a temporal energy evolution of said words within a general frequency range;
  
  c. second means responsive to said shaped frequency of said vowels and producing a second output signal representing the rising steepness of the temporal energy variations within a high frequency range;
  
  d. first logic means
  
     i. responsive to said first output signal to produce a third output signal which indicates segmenting said elementary words;
  
     ii. responsive to said second and third output signals producing a fourth output signal indicating the sudden or soft initial transition of said elementary words; and
  
  e. second logic means responsive to both the third and the fourth output signals for counting combinations of at least two classes of said elementary words.
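
The class combinations described in claim 16 (a sudden or soft consonant followed by a high or low vowel) can be illustrated with a short sketch. It assumes the two binary decisions have already been made upstream; the function names, the boolean inputs and the string labels are hypothetical, and tallying with a `Counter` is only one possible reading of "counting combinations".

```python
from collections import Counter


def word_class(sudden_onset, high_vowel):
    """Name the class of one elementary word from its two binary features."""
    consonant = "sudden" if sudden_onset else "soft"
    vowel = "high" if high_vowel else "low"
    return f"{consonant}+{vowel}"


def count_classes(words):
    """Tally class combinations over (sudden_onset, high_vowel) pairs."""
    return Counter(word_class(sudden, high) for sudden, high in words)


# Three elementary words: two sudden+high, one soft+low.
demo = [(True, True), (True, True), (False, False)]
tally = count_classes(demo)
```

Two binary features give at most four combinations, which is consistent with the small phoneme-class vocabulary the claims describe.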

Current Assignee: Jean Albert Dreyfus
Original Assignee: Jean Albert Dreyfus
Inventors: Dreyfus, Jean Albert
Primary Examiner(s): Claffy, Kathleen H.
Assistant Examiner(s): Kemeny, E. S.
Application Number: US05/496,326
Time in Patent Office: 592 Days
Field of Search: 179/1 SA, 179/1 SM
US Class Current: 704/254
CPC Class Codes:
- G10L 15/10 using distance or distortio...
- G10L 2015/025 Phonemes, fenemes or fenone...
