Method and apparatus for speech recognition based on subsyllable spellings
First Claim
1. An apparatus for speech recognition comprising:
means for sampling a speaker's speech and for providing speech data sample segments of predetermined length at predetermined sampling intervals based on changes in energy in the speech;
means, coupled to the sampling means, for identifying cohesive speech segments from the speech data sample segments and for assigning frames of subsyllables to the cohesive segments, wherein the cohesive segments correspond to intervals of stable vocoids, changing vocoids, frication, and silence in the speech data sample segments, each cohesive segment corresponds to at least one respective frame, and each frame comprises at least one of a plurality of subsyllables that characterizes predetermined gross and fine phonetic attributes of the respective cohesive segment;
means, coupled to the identifying and assigning means, for locating the subsyllables in a first lookup table mapping sequences of subsyllables into syllables;
means for combining syllables located by the locating means into words by locating words in a lookup table mapping sequences of syllables into words; and
means, coupled to the combining means, for checking the conformance of sequences of the words to a set of predetermined checking rules relating the words to one another and for reporting a recognition result based on the checked conformance of the sequences of the words.
Abstract
In a digital computer, a method for speech recognition includes steps of sampling a speaker's speech and providing speech data sample segments of predetermined length at predetermined sampling intervals based on changes in energy in the speech. Cohesive speech segments, which correspond to intervals of stable vocoids, changing vocoids, frication, and silence, are identified from the speech data sample segments, and are assigned frames of subsyllables. Each cohesive segment corresponds to at least one respective frame, and each frame includes at least one of a plurality of subsyllables that characterizes predetermined gross and fine phonetic attributes of the respective cohesive segment. The subsyllables are located in a first lookup table mapping sequences of subsyllables into syllables, and the syllables are combined into words by locating words in another lookup table. The conformance of sequences of the words to a set of predetermined checking rules is checked, and a recognition result is reported. Apparatus implementing the method are also disclosed.
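The two table lookups and the rule check described in the abstract can be pictured with a minimal Python sketch. It is illustrative only and is not the patented implementation: the subsyllable labels, the table entries, the greedy longest-match strategy, and the single word-order rule are all hypothetical assumptions chosen to keep the example small.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical lookup tables; labels and entries are invented for illustration.
SUBSYLLABLES_TO_SYLLABLE: Dict[Tuple[str, ...], str] = {
    ("S", "EH"): "se",
    ("V", "AX", "N"): "ven",
    ("T", "UW"): "tu",
}
SYLLABLES_TO_WORD: Dict[Tuple[str, ...], str] = {
    ("se", "ven"): "seven",
    ("tu",): "two",
}
# Hypothetical checking rule: which words may directly follow a given word.
ALLOWED_FOLLOWERS: Dict[str, Set[str]] = {
    "seven": {"two"},
    "two": {"seven", "two"},
}


def map_sequences(items: Sequence[str],
                  table: Dict[Tuple[str, ...], str]) -> Optional[List[str]]:
    """Greedy longest-match lookup (an assumed strategy, with no backtracking).

    Returns None when some stretch of the input has no table entry.
    """
    longest = max(len(key) for key in table)
    out: List[str] = []
    i = 0
    while i < len(items):
        for n in range(min(longest, len(items) - i), 0, -1):
            unit = table.get(tuple(items[i:i + n]))
            if unit is not None:
                out.append(unit)
                i += n
                break
        else:
            return None  # no entry matches at position i
    return out


def conforms(words: Sequence[str]) -> bool:
    """Check every adjacent word pair against the hypothetical rule table."""
    return all(b in ALLOWED_FOLLOWERS.get(a, set())
               for a, b in zip(words, words[1:]))


def recognize(subsyllables: Sequence[str]) -> Optional[List[str]]:
    """Subsyllables -> syllables -> words -> rule check; None means rejection."""
    syllables = map_sequences(subsyllables, SUBSYLLABLES_TO_SYLLABLE)
    if syllables is None:
        return None
    words = map_sequences(syllables, SYLLABLES_TO_WORD)
    if words is None or not conforms(words):
        return None
    return words
```

Calling recognize(["S", "EH", "V", "AX", "N", "T", "UW"]) walks both tables and passes the rule check, while an input that spells "seven seven" is rejected by the hypothetical follower rule. Greedy matching is used here only for brevity; it can miss segmentations that a search over alternatives would find.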
73 Citations
19 Claims
1. An apparatus for speech recognition comprising:
means for sampling a speaker's speech and for providing speech data sample segments of predetermined length at predetermined sampling intervals based on changes in energy in the speech;
means, coupled to the sampling means, for identifying cohesive speech segments from the speech data sample segments and for assigning frames of subsyllables to the cohesive segments, wherein the cohesive segments correspond to intervals of stable vocoids, changing vocoids, frication, and silence in the speech data sample segments, each cohesive segment corresponds to at least one respective frame, and each frame comprises at least one of a plurality of subsyllables that characterizes predetermined gross and fine phonetic attributes of the respective cohesive segment;
means, coupled to the identifying and assigning means, for locating the subsyllables in a first lookup table mapping sequences of subsyllables into syllables;
means for combining syllables located by the locating means into words by locating words in a lookup table mapping sequences of syllables into words; and
means, coupled to the combining means, for checking the conformance of sequences of the words to a set of predetermined checking rules relating the words to one another and for reporting a recognition result based on the checked conformance of the sequences of the words.
Dependent claims: 2, 3, 4, 5
6. An apparatus for speech recognition comprising:
means for sampling a speaker's speech and for providing a digitized speech signal comprising speech data sample segments of predetermined length at predetermined sampling intervals based on changes in energy in the speech;
first means, coupled to the sampling means, for generating from the speech data sample segments speech features;
second means, coupled to the sampling means, for generating an amplitude change signal based on changes in the speech data sample segments;
third means, coupled to the sampling means, for generating a pitch dispersion signal based on the speech data sample segments;
fourth means, coupled to the first means and responsive to the speech features, for generating weighted phonetic estimates of the speech data sample segments;
fifth means, coupled to the second, third, and fourth means, for producing sequences of frames of subsyllables in response to the amplitude change signal, the pitch dispersion signal, and the weighted phonetic estimates;
means, coupled to the fifth means, for locating the subsyllables in a first lookup table mapping sequences of subsyllables into syllables;
means for combining syllables located by the locating means into words by locating words in a lookup table mapping sequences of syllables into words;
means for checking the conformance of sequences of the words produced by the combining means to a set of predetermined checking rules relating the words to one another and for reporting a recognition result that depends on the checked conformance of the sequences of the words; and
control means for coordinating the foregoing means, wherein the control means includes a path data area and coordinates the foregoing means by tracking a plurality of weighted parallel paths representing states of the fifth means, the combining means, and the checking means by storing the paths in a working path and a path log in the path data area, and the weighted parallel paths include a plurality of subpaths representing states of the fifth means and the combining means.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14
15. In a digital computer, a method for speech recognition comprising the steps of:
sampling a speaker's speech and providing speech data sample segments of predetermined length at predetermined sampling intervals based on changes in energy in the speech;
identifying cohesive speech segments from the speech data sample segments;
assigning frames of subsyllables to the cohesive segments, wherein the cohesive segments correspond to intervals of stable vocoids, changing vocoids, frication, and silence in the speech data sample segments, each cohesive segment corresponds to at least one respective frame, and each frame comprises at least one of a plurality of subsyllables that characterizes predetermined gross and fine phonetic attributes of the respective cohesive segment;
locating the subsyllables of assigned frames in a first lookup table mapping sequences of subsyllables into syllables;
combining syllables located by the locating step into words by locating words in a lookup table mapping sequences of syllables into words; and
checking the conformance of sequences of the words produced by the combining step to a set of predetermined checking rules relating the words to one another; and
reporting a recognition result based on the checked conformance of the sequences of the words.
Dependent claims: 16, 17, 18, 19
Specification