Method and apparatus for speech recognition based on subsyllable spellings

US 5,208,897 A
Filed: 09/28/1990
Issued: 05/04/1993
Est. Priority Date: 08/21/1990
Status: Expired due to Fees

Abstract

In a digital computer, a method for speech recognition includes steps of sampling a speaker's speech and providing speech data sample segments of predetermined length at predetermined sampling intervals based on changes in energy in the speech. Cohesive speech segments, which correspond to intervals of stable vocoids, changing vocoids, frication, and silence, are identified from the speech data sample segments, and are assigned frames of subsyllables. Each cohesive segment corresponds to at least one respective frame, and each frame includes at least one of a plurality of subsyllables that characterizes predetermined gross and fine phonetic attributes of the respective cohesive segment. The subsyllables are located in a first lookup table mapping sequences of subsyllables into syllables, and the syllables are combined into words by locating words in another lookup table. The conformance of sequences of the words to a set of predetermined checking rules is checked, and a recognition result is reported. Apparatus implementing the method are also disclosed.
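
The two table lookups and the rule check described in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the subsyllable symbols, the table entries, the follower rule, the function names, and the greedy longest-match strategy are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical first lookup table: sequences of subsyllables -> syllable.
SUBSYLLABLE_TO_SYLLABLE: Dict[Tuple[str, ...], str] = {
    ("fric_strong", "vowel_e"): "se",
    ("fric_weak", "vowel_schwa", "nasal"): "ven",
    ("fric_strong", "vowel_i", "silence_short", "fric_strong"): "six",
}

# Hypothetical second lookup table: sequences of syllables -> word.
SYLLABLE_TO_WORD: Dict[Tuple[str, ...], str] = {
    ("se", "ven"): "seven",
    ("six",): "six",
}

# Hypothetical checking rule: which words may directly follow a given word.
ALLOWED_FOLLOWERS: Dict[str, Set[str]] = {
    "six": {"seven"},
    "seven": {"six", "seven"},
}


def map_sequence(
    items: Sequence[str], table: Dict[Tuple[str, ...], str]
) -> Optional[List[str]]:
    """Greedy longest-match mapping of a flat sequence through a lookup table.

    Returns None when some part of the sequence has no table entry.
    """
    max_len = max(len(key) for key in table)
    out: List[str] = []
    i = 0
    while i < len(items):
        for n in range(min(max_len, len(items) - i), 0, -1):
            key = tuple(items[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            return None  # no entry matches at position i
    return out


def conforms(words: Sequence[str]) -> bool:
    """Check every adjacent word pair against the follower rule."""
    return all(
        nxt in ALLOWED_FOLLOWERS.get(cur, set())
        for cur, nxt in zip(words, words[1:])
    )


def recognize(subsyllables: Sequence[str]) -> Optional[List[str]]:
    """Subsyllables -> syllables -> words -> rule check; None on any failure."""
    syllables = map_sequence(subsyllables, SUBSYLLABLE_TO_SYLLABLE)
    if syllables is None:
        return None
    words = map_sequence(syllables, SYLLABLE_TO_WORD)
    if words is None or not conforms(words):
        return None
    return words
```

Under these assumed tables, the subsyllable spelling of "six" followed by the spellings of "se" and "ven" is reported as the two words "six" and "seven", while a sequence that violates the follower rule or contains an unlisted spelling yields no result.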

73 Citations

19 Claims

1. An apparatus for speech recognition comprising:
- means for sampling a speaker's speech and for providing speech data sample segments of predetermined length at predetermined sampling intervals based on changes in energy in the speech;
  
  means, coupled to the sampling means, for identifying cohesive speech segments from the speech data sample segments and for assigning frames of subsyllables to the cohesive segments, wherein the cohesive segments correspond to intervals of stable vocoids, changing vocoids, frication, and silence in the speech data sample segments, each cohesive segment corresponds to at least one respective frame, and each frame comprises at least one of a plurality of subsyllables that characterizes predetermined gross and fine phonetic attributes of the respective cohesive segment;
  
  means, coupled to the identifying and assigning means, for locating the subsyllables in a first lookup table mapping sequences of subsyllables into syllables;
  
  means for combining syllables located by the locating means into words by locating words in a lookup table mapping sequences of syllables into words; and
  
  means, coupled to the combining means, for checking the conformance of sequences of the words to a set of predetermined checking rules relating the words to one another and for reporting a recognition result based on the checked conformance of the sequences of the words.
- View Dependent Claims (2, 3, 4, 5)
- - 2. The apparatus of claim 1, wherein the identifying and assigning means comprises means for generating an amplitude change signal, a pitch dispersion signal, and raw phonetic estimates from the speech data sample segments for identifying the cohesive segments.
  - 3. The apparatus of claim 1, wherein the gross phonetic attributes include silence, frication, stable vowel, and change, and the fine phonetic attributes include duration of silence, strength or weakness of frication, phonetic identity of a vowel, and rising or falling amplitude of a change interval, and the gross and fine phonetic attributes are based on articulatory features of high/low, front/back, frication, nasality, and retroflexion in the speaker's speech.
  - 4. The apparatus of claim 1, wherein at least some of the frames each comprise at least two subsyllables, and each such subsyllable includes a respective estimate of the likelihood that the subsyllable accurately represents the respective speech data sample segment.
  - 5. The apparatus of claim 1, wherein the identifying and assigning means comprises first means for generating speech features and a pitch value signal from the speech data sample segments;
    - means for determining amplitude changes of the speech data sample segments and for generating an amplitude change signal;
      
      means for determining pitch dispersion of the speech data sample segments and for generating a pitch dispersion signal;
      
      second means, responsive to the first means, for generating weighted phonetic estimates of the speech data sample segments; and
      
      means for determining the speaker's sex in response to the pitch value signal and for assigning frames of subsyllables in response to the amplitude change signal, the pitch dispersion signal, and the speaker's sex.
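
Claim 4 describes frames that carry two or more subsyllables, each with its own likelihood estimate. A minimal sketch of such a data structure follows. The class names, the symbols, and the choice to score a candidate spelling as the product of per-frame likelihoods (which treats frames as independent) are assumptions made for illustration, not details from the patent.

```python
from dataclasses import dataclass
from itertools import product
from math import prod
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Subsyllable:
    symbol: str        # hypothetical label, e.g. "vowel_i"
    likelihood: float  # estimate that the symbol fits the segment


@dataclass(frozen=True)
class Frame:
    alternatives: Tuple[Subsyllable, ...]  # one or more competing subsyllables


def ranked_spellings(
    frames: Sequence[Frame],
) -> List[Tuple[Tuple[str, ...], float]]:
    """Enumerate every subsyllable spelling across the frames, best first.

    Score = product of the chosen likelihoods (an assumed scoring rule).
    """
    candidates = []
    for combo in product(*(frame.alternatives for frame in frames)):
        symbols = tuple(s.symbol for s in combo)
        score = prod(s.likelihood for s in combo)
        candidates.append((symbols, score))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates
```

With one frame offering two vowel alternatives and a second frame offering a single nasal, the function returns two candidate spellings, ordered so that the more likely vowel comes first.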

6. An apparatus for speech recognition comprising:
- means for sampling a speaker's speech and for providing a digitized speech signal comprising speech data sample segments of predetermined length at predetermined sampling intervals based on changes in energy in the speech;
  
  first means, coupled to the sampling means, for generating from the speech data sample segments speech features;
  
  second means, coupled to the sampling means, for generating an amplitude change signal based on changes in the speech data sample segments;
  
  third means, coupled to the sampling means, for generating a pitch dispersion signal based on the speech data sample segments;
  
  fourth means, coupled to the first means and responsive to the speech features, for generating weighted phonetic estimates of the speech data sample segments;
  
  fifth means, coupled to the second, third, and fourth means, for producing sequences of frames of subsyllables in response to the amplitude change signal, the pitch dispersion signal, and the weighted phonetic estimates;
  
  means, coupled to the fifth means, for locating the subsyllables in a first lookup table mapping sequences of subsyllables into syllables;
  
  means for combining syllables located by the locating means into words by locating words in a lookup table mapping sequences of syllables into words;
  
  means for checking the conformance of sequences of the words produced by the combining means to a set of predetermined checking rules relating the words to one another and for reporting a recognition result that depends on the checked conformance of the sequences of the words; and
  
  control means for coordinating the foregoing means, wherein the control means includes a path data area and coordinates the foregoing means by tracking a plurality of weighted parallel paths representing states of the fifth means, the combining means, and the checking means by storing the paths in a working path and a path log in the path data area, and the weighted parallel paths include a plurality of subpaths representing states of the fifth means and the combining means.
- View Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14)
- - 7. The apparatus of claim 6, wherein the speech features include high/low, front/back, frication, nasality, and retroflexion in the speaker's speech.
  - 8. The apparatus of claim 6, wherein the first means also generates a pitch signal, the fifth means is coupled to the first means and determines the speaker's sex in response to the pitch signal, and the fourth means generates the weighted phonetic estimates based also on the speaker's sex determined by the fifth means.
  - 9. The apparatus of claim 6, wherein the fifth means determines whether a speech sound is voiced or unvoiced based on at least the pitch dispersion signal.
  - 10. The apparatus of claim 8, wherein the fifth means divides the digitized speech signal into cohesive segments of stable vocoids, changing vocoids, frication, and silence based on the amplitude change signal, the pitch dispersion signal, and the weighted phonetic estimates, and each cohesive segment corresponds to at least one respective frame, and each frame comprises at least one of a plurality of subsyllables that characterizes predetermined gross and fine phonetic attributes of the respective cohesive segment.
  - 11. The apparatus of claim 10, wherein the fifth means comprises sixth means, responsive to the speech features and speaker's sex, for determining average phonetic estimates of speech data sample segments;
    - seventh means, responsive to the sixth means, for providing primary and secondary phonetic estimates for stable vocoids;
      
      eighth means, responsive to the amplitude change signal, for producing frames based on subtle amplitude changes; and
      
      ninth means, responsive to the amplitude change signal, for providing frames each having at least one of a plurality of modifiers that describe rising amplitudes.
  - 12. The apparatus of claim 11, wherein the ninth means inserts frames before frames representing change intervals that include rising amplitudes.
  - 13. The apparatus of claim 6, wherein the fifth means includes means for repeating frames that correspond to at least some sounds of the speaker's speech and that represent phonetic entities that are sharable between conjoined words in continuous speech.
  - 14. The apparatus of claim 13, wherein the frames are repeated for silence, frication, and nasals.
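
The control means of claim 6 tracks several weighted parallel paths in a path data area. One loose way to picture this is the sketch below. The claim does not say how paths are extended, weighted, or retired, so the multiplication of weights, the fixed cap on working paths, and the use of a log for paths dropped from the working set are all assumptions, as are the class and function names.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Path:
    symbols: Tuple[str, ...]  # hypothetical sequence of recognized units
    weight: float             # combined weight of the path so far


@dataclass
class PathDataArea:
    # Start from a single empty path with neutral weight.
    working: List[Path] = field(default_factory=lambda: [Path((), 1.0)])
    log: List[Path] = field(default_factory=list)


def advance(
    area: PathDataArea,
    alternatives: Sequence[Tuple[str, float]],
    max_paths: int = 3,
) -> None:
    """Extend each working path with each weighted alternative.

    The best `max_paths` extensions stay in the working set; the rest are
    appended to the log (an assumed policy, not one stated in the claim).
    """
    extended = [
        Path(p.symbols + (symbol,), p.weight * weight)
        for p in area.working
        for symbol, weight in alternatives
    ]
    extended.sort(key=lambda p: p.weight, reverse=True)
    area.working = extended[:max_paths]
    area.log.extend(extended[max_paths:])
```

Calling `advance` once per incoming frame keeps a bounded number of competing hypotheses alive, which is the general idea behind tracking parallel weighted paths.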

15. In a digital computer, a method for speech recognition comprising the steps of:
- sampling a speaker's speech and providing speech data sample segments of predetermined length at predetermined sampling intervals based on changes in energy in the speech;
  
  identifying cohesive speech segments from the speech data sample segments;
  
  assigning frames of subsyllables to the cohesive segments, wherein the cohesive segments correspond to intervals of stable vocoids, changing vocoids, frication, and silence in the speech data sample segments, each cohesive segment corresponds to at least one respective frame, and each frame comprises at least one of a plurality of subsyllables that characterizes predetermined gross and fine phonetic attributes of the respective cohesive segment;
  
  locating the subsyllables of assigned frames in a first lookup table mapping sequences of subsyllables into syllables;
  
  combining syllables located by the locating step into words by locating words in a lookup table mapping sequences of syllables into words; and
  
  checking the conformance of sequences of the words produced by the combining step to a set of predetermined checking rules relating the words to one another; and
  
  reporting a recognition result based on the checked conformance of the sequences of the words.
- View Dependent Claims (16, 17, 18, 19)
- - 16. The method of claim 15, wherein the identifying step comprises the step of generating an amplitude change signal, a pitch dispersion signal, and raw phonetic estimates from the speech data sample segments for identifying the cohesive segments.
  - 17. The method of claim 15, wherein the gross phonetic attributes include silence, frication, stable vowel, and change, and the fine phonetic attributes include duration of silence, strength or weakness of frication, phonetic identity of a vowel, and rising or falling amplitude of a change interval, and the gross and fine phonetic attributes are based on articulatory features of high/low, front/back, frication, nasality, and retroflexion in the speaker's speech.
  - 18. The method of claim 15, wherein at least some of the frames each comprise at least two subsyllables, and each such subsyllable includes a respective estimate of the likelihood that the subsyllable accurately represents the respective speech data sample segment.
  - 19. The method of claim 15, wherein the identifying and assigning steps comprise the steps of:
    - generating speech features and a pitch value signal from the speech data sample segments;
      
      determining amplitude changes of the speech data sample segments and generating an amplitude change signal;
      
      determining pitch dispersion of the speech data sample segments and generating a pitch dispersion signal;
      
      generating weighted phonetic estimates of the speech data sample segments; and
      
      determining the speaker's sex in response to the pitch value signal; and
      
      frames of subsyllables are assigned in response to the amplitude change signal, the pitch dispersion signal, and the speaker's sex.

Current Assignee: Emerson & Stern Associates, Inc.
Original Assignee: Emerson & Stern Associates, Inc.
Inventors: Hutchins, Sandra E.
Primary Examiner(s): Fleming, Michael R.
Assistant Examiner(s): Doerrler, Michelle
Application Number: US07/589,646
Time in Patent Office: 949 Days
Field of Search: 381/41-46, 395/2
US Class Current: 704/200
CPC Class Codes: G10L 15/187 Phonemic context, e.g. pron...
