Speech recognition system
Abstract
The present invention provides a system for recognizing speech in which a profile is constructed of a characteristic of each significant phoneme in a language. A difference profile is generated for each pair of significant phonemes by subtracting the profile of each phoneme from the profile of each other phoneme. Adjacent sections of each difference profile which exceed positive and negative thresholds are identified. The likelihood that an unknown phoneme will be one or the other of a phoneme pair is computed from the relative areas in the identified sections of the difference profile. An equivalent profile is constructed of a phoneme of an unknown utterance. The more likely phoneme of each phoneme pair is chosen based on the relative areas in the identified sections of the profile of the unknown phoneme.
55 Claims
1. A method for recognizing speech containing a plurality of significant phonemes, said method comprising the steps of:
constructing a profile of a characteristic of each significant phoneme;
generating a difference profile for substantially each pair of significant phonemes by subtracting the profile of each significant phoneme from the profile of each other significant phoneme;
identifying adjacent sections of each difference profile which exceed positive and negative thresholds respectively;
computing the likelihood that an unknown phoneme will be one or the other of a phoneme pair based on the relative areas in the identified sections of the difference profile;
constructing an equivalent profile of a phoneme of an unknown utterance; and
choosing the more likely phoneme of each phoneme pair based on the relative areas in the identified sections of the profile of the unknown phoneme.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
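The steps of claim 1 can be sketched in Python as follows. This is an illustrative sketch only, not the patented implementation. It assumes that profiles are equal-length lists of numbers and that the threshold is non-negative. The likelihood step is reduced here to a simple comparison of areas, which is an assumption, and all function names are hypothetical.

```python
def difference_profile(profile_a, profile_b):
    """Subtract profile B from profile A, point by point."""
    return [a - b for a, b in zip(profile_a, profile_b)]


def identify_sections(diff, threshold):
    """Return (positive, negative) lists of (start, end) index ranges.

    A positive section is a run of adjacent points above +threshold;
    a negative section is a run of adjacent points below -threshold.
    """
    positive, negative = [], []
    start, sign = None, 0
    for i, value in enumerate(diff + [0.0]):  # trailing 0.0 closes the last run
        current = 1 if value > threshold else -1 if value < -threshold else 0
        if current != sign:
            if sign == 1:
                positive.append((start, i))
            elif sign == -1:
                negative.append((start, i))
            start, sign = i, current
    return positive, negative


def area(profile, sections):
    """Sum the profile values that fall inside the given sections."""
    return sum(sum(profile[s:e]) for s, e in sections)


def choose(unknown, profile_a, profile_b, threshold=0.5):
    """Pick "A" or "B" by comparing the unknown profile's area in each kind of section.

    Assumed decision rule: area in the positive sections (where A exceeds B)
    minus area in the negative sections (where B exceeds A).
    """
    diff = difference_profile(profile_a, profile_b)
    positive, negative = identify_sections(diff, threshold)
    score = area(unknown, positive) - area(unknown, negative)
    return "A" if score > 0 else "B"
```

For example, with `profile_a = [0, 4, 4, 0, 0, 0]` and `profile_b = [0, 0, 0, 0, 3, 3]`, an unknown profile whose energy is concentrated at indices 1 and 2 is assigned to "A", and one concentrated at indices 4 and 5 is assigned to "B".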
9. A method for recognizing speech containing a plurality of significant phonemes, said method comprising the steps of:
uttering each of a plurality of significant phonemes;
detecting the frequency distribution of the energy of the utterance for a plurality of time intervals within the duration of a single phoneme for each significant phoneme;
deriving a plurality of energy spectra from the detected frequency distributions representing the energy present in each phoneme as a function of frequency;
deriving at least a pair of sweep spectra from the detected frequency distributions for each phoneme representing the change of energy present in the phoneme from one frequency to another;
generating a set of difference profiles for each pair of significant phonemes by subtracting the corresponding profile of each significant phoneme from the corresponding profile of the other significant phoneme;
identifying adjacent sections in the difference profiles of each set which exceed positive and negative thresholds respectively;
computing the likelihood that an unknown phoneme will be one or the other of a phoneme pair based on the relative areas in the identified sections of the difference profiles;
constructing equivalent spectra for a phoneme of an unknown utterance; and
choosing the most likely phoneme of each phoneme pair based on the relative area in the identified sections of the spectra of the unknown phoneme.
Dependent claims: 10, 11, 12
13. A method for processing a spoken utterance consisting of a sequence of phonemes, said method comprising the steps of:
detecting the frequency distribution of the energy of the utterance for a plurality of time intervals within the duration of a single phoneme;
separating each frequency distribution into a plurality of frequency bandwidths;
deriving at least one energy spectrum representing the energy present in the phoneme as a function of frequency for a plurality of frequency distributions covering a sequence of time intervals; and
deriving at least a pair of sweep spectra representing the change in energy present in the phoneme from one frequency to another as a function of time for a plurality of frequency distributions,
whereby the spoken phoneme can be represented by the energy spectrum and the sweep spectra to facilitate speech recognition.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 35, 36, 37
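One way to picture the energy spectrum and the pair of sweep spectra of claim 13 is the Python sketch below. It is an assumption-laden illustration, not the claimed method: `frames` is a hypothetical list of per-interval band energies (`frames[t][k]` is the energy in band `k` during interval `t`), and a sweep is approximated as energy that leaves one band while arriving in the adjacent band between two successive intervals.

```python
def energy_spectrum(frames):
    """Sum the energy in each band over all frames (time intervals)."""
    return [sum(frame[k] for frame in frames) for k in range(len(frames[0]))]


def sweep_spectra(frames):
    """Return (upward, downward) sweep spectra, one value per adjacent band pair."""
    pairs = len(frames[0]) - 1
    up, down = [0.0] * pairs, [0.0] * pairs
    for prev, curr in zip(frames, frames[1:]):
        change = [c - p for p, c in zip(prev, curr)]
        for k in range(pairs):
            # Energy leaving band k while arriving in band k+1: upward sweep.
            up[k] += max(0.0, min(-change[k], change[k + 1]))
            # Energy leaving band k+1 while arriving in band k: downward sweep.
            down[k] += max(0.0, min(change[k], -change[k + 1]))
    return up, down
```

With a tone that rises one band per interval, such as `[[1, 0, 0], [0, 1, 0], [0, 0, 1]]`, only the upward sweep spectrum is non-zero; reversing the frame order moves the response to the downward sweep spectrum.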
29. A method for processing a spoken utterance consisting of a sequence of phonemes, said method comprising the steps of:
detecting the frequency distribution of the energy of the utterance for a plurality of time intervals within the duration of the phoneme;
dividing the detected energy into a plurality of frequency ranges;
applying an automatic gain control independently to the energy of each frequency range;
separating the output of each automatic gain control into a plurality of frequency bandwidths;
deriving a short time constant energy spectrum representing the energy present in the phoneme as a function of frequency for a plurality of frequency distributions covering a first sequence of time intervals;
deriving a long time constant energy spectrum representing the energy present in the phoneme as a function of frequency for a plurality of frequency distributions covering a sequence of time intervals which is longer than the first time intervals;
deriving short time constant upward and downward sweep spectra representing the change in energy present in the phoneme from a lower to a higher and a higher to a lower frequency respectively as a function of time for a plurality of frequency distributions covering a second time interval; and
deriving long time constant upward and downward sweep spectra representing the change in energy present in the phoneme from a lower to a higher and a higher to a lower frequency respectively as a function of time for a plurality of frequency distributions covering a time interval longer than the second time interval,
whereby the spoken phoneme can be represented by the energy spectrum and the sweep spectra to facilitate speech recognition.
Dependent claims: 30, 31, 32, 33, 34
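The per-range gain control and the short and long time constants of claim 29 can be illustrated with the Python sketch below. It is a simplified digital analogy under stated assumptions: the gain control is modeled as a static peak normalization rather than a time-varying circuit, the time constants are modeled as first-order exponential averages measured in samples, and the function names are hypothetical.

```python
def agc(samples, target=1.0, floor=1e-9):
    """Scale one frequency range so that its peak level equals `target`.

    Assumption: a static normalization stands in for a real, time-varying
    automatic gain control.
    """
    peak = max(abs(s) for s in samples)
    gain = target / max(peak, floor)
    return [s * gain for s in samples]


def smooth(values, time_constant):
    """First-order (exponential) average; larger constants respond more slowly."""
    alpha = 1.0 / time_constant
    out, state = [], values[0]
    for v in values:
        state += alpha * (v - state)
        out.append(state)
    return out
```

Applying `smooth` to the same band energy with a small and a large `time_constant` gives a fast-responding and a slow-responding trace, which is the sense in which short and long time constant spectra are distinguished in this sketch.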
38. A method for processing a spoken utterance consisting of a sequence of phonemes, said method comprising the steps of:
detecting the frequency distribution of the energy of the utterance for a plurality of time intervals within the duration of a single phoneme;
passing the detected energy through a plurality of lowpass filters;
amplifying the output of adjacent lowpass filters with difference amplifiers to separate the detected energy into a plurality of frequency bandwidths;
passing the output of the difference amplifiers to respective upward and downward edge detectors to generate step functions of fixed amplitude and duration;
summing the step functions of adjacent, overlapping sets of upward and downward edge detectors respectively to generate upward and downward sweep spectra;
passing the output of adjacent sets of difference amplifiers through a plurality of summing amplifiers; and
summing the outputs of the summing amplifiers to generate at least one energy spectrum,
whereby the spoken phoneme can be represented by the energy spectrum and the sweep spectra to facilitate speech recognition.
Dependent claims: 39, 40, 41, 42, 43, 44
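Claim 38 describes analog hardware; the Python sketch below is only a discrete-time analogy of three of its stages, with every name and parameter assumed for illustration. One-pole filters stand in for the lowpass filters, subtracting adjacent filter outputs stands in for the difference amplifiers, and `edge_pulses` stands in for an edge detector that emits a step of fixed amplitude and duration.

```python
def one_pole_lowpass(signal, alpha):
    """Discrete one-pole lowpass; alpha in (0, 1], smaller means lower cutoff."""
    out, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def band_outputs(signal, alphas):
    """Difference adjacent lowpass outputs to isolate bands (alphas ascending)."""
    lows = [one_pole_lowpass(signal, a) for a in alphas]
    return [[hi - lo for lo, hi in zip(lows[i], lows[i + 1])]
            for i in range(len(lows) - 1)]


def edge_pulses(envelope, threshold, amplitude=1.0, duration=3, rising=True):
    """Emit a fixed-amplitude, fixed-duration pulse at each detected edge."""
    out = [0.0] * len(envelope)
    for i in range(1, len(envelope)):
        step = envelope[i] - envelope[i - 1]
        if (step > threshold) if rising else (step < -threshold):
            for j in range(i, min(i + duration, len(out))):
                out[j] = amplitude
    return out
```

Summing the pulse trains from neighbouring bands, separately for rising and falling edges, would then give values playing the role of the upward and downward sweep spectra in this analogy.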
45. A method for deciphering speech comprising:
training a memory device by uttering each of several phonemes repeatedly, detecting the frequency distribution of each utterance of each phoneme for a plurality of time intervals, deriving at least one energy spectrum representing the energy present in the spoken phoneme as a function of frequency for a plurality of frequency distributions covering a sequence of time intervals, deriving at least a pair of sweep spectra representing the change in energy present in the phoneme from one frequency to another as a function of time for a plurality of frequency distributions, generating profiles representative of the energy and sweep spectra for each phoneme, and storing the profile in the memory device;
computing the likelihood that an unknown phoneme will be one or the other of a phoneme pair by generating a difference profile for substantially each pair of phonemes by subtracting the profile of each phoneme from the profile of each other phoneme, identifying adjacent sections of each difference profile which exceed positive or negative thresholds respectively, and generating a probability distribution based on the relative areas in the identified sections of the difference profiles; and
identifying the most likely phoneme in an unknown utterance by comparing the relative areas in the identified sections of the profile of the unknown phoneme with the probability distributions of each phoneme pair.
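The train-then-identify flow of claim 45 can be sketched in Python as below. This is a hedged illustration, not the claimed procedure: stored profiles are assumed to be simple averages of repeated utterances, the probability distributions of the claim are replaced by a plain pairwise vote, and the names `train`, `pair_score` and `identify` are hypothetical.

```python
from itertools import combinations


def train(utterances):
    """Average repeated utterance profiles of each phoneme into one stored profile."""
    memory = {}
    for phoneme, profiles in utterances.items():
        count = len(profiles)
        memory[phoneme] = [sum(column) / count for column in zip(*profiles)]
    return memory


def pair_score(unknown, profile_a, profile_b, threshold):
    """Positive when the unknown has more area where A exceeds B than where B exceeds A."""
    score = 0.0
    for u, a, b in zip(unknown, profile_a, profile_b):
        diff = a - b
        if diff > threshold:
            score += u
        elif diff < -threshold:
            score -= u
    return score


def identify(unknown, memory, threshold=0.5):
    """Pick the phoneme that wins the most pairwise comparisons (assumed rule)."""
    wins = {phoneme: 0 for phoneme in memory}
    for a, b in combinations(memory, 2):
        winner = a if pair_score(unknown, memory[a], memory[b], threshold) > 0 else b
        wins[winner] += 1
    return max(wins, key=wins.get)
```

A pairwise vote is used here only because it is the simplest rule that consumes one decision per phoneme pair; a fuller treatment would weight each pair by its estimated likelihood.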
46. Apparatus for processing a spoken utterance consisting of a sequence of phonemes, said apparatus comprising:
a microphone for converting the sonic energy of the utterance into an electric signal;
a plurality of filters which separate the energy in the electric signal into a plurality of frequency bandwidths;
a plurality of energy summing amplifiers which sum the energy of adjacent bandwidths for a plurality of electric signals covering a sequence of time intervals to generate an energy spectrum representing the energy present in the phoneme;
a plurality of rise and fall edge detectors respectively which detect a movement of energy from the output of one filter to the output of an adjacent filter; and
down and up sweep summing amplifiers respectively which generate downsweep and upsweep spectra representing the change in energy present in the phoneme from one frequency to another as a function of time for a sequence of time intervals.
Dependent claims: 47, 48, 49
50. Apparatus for processing a spoken utterance consisting of a sequence of phonemes, said apparatus comprising:
a microphone for converting the sonic energy of the utterance into an electric signal;
means for detecting the frequency distribution of the electric signal for a plurality of time intervals within the duration of a single phoneme;
means for separating the frequency distribution into a plurality of frequency bandwidths;
means for deriving at least one energy spectrum representing the energy present in the electric signal as a function of frequency for a plurality of frequency distributions covering a sequence of time intervals; and
means for deriving at least a pair of sweep spectra representing a change in energy in the electric signal from one frequency to another as a function of time for a plurality of frequency distributions.
Dependent claims: 51, 52, 53, 54, 55
Specification