Speech recognition system

US 4,829,572 A
Filed: 11/05/1987
Issued: 05/09/1989
Est. Priority Date: 11/05/1987
Status: Expired due to Fees

Abstract

The present invention provides a system for recognizing speech in which a profile is constructed of a related characteristic of each significant phoneme in a language. A difference profile is generated for each pair of significant phonemes by subtracting the profile of each phoneme from the profile of each other phoneme. Adjacent sections of each difference profile which exceed positive and negative thresholds are identified. The likelihood that an unknown phoneme will be one or the other of a phoneme pair based on the relative areas in the identified sections of the difference profile is computed. An equivalent profile is constructed of a phoneme of an unknown utterance. The most likely phoneme of each phoneme pair is chosen based on the relative areas in the identified sections of the profile of the unknown phonemes.
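
As an illustration of the pairwise comparison summarized in the abstract, the following is a minimal sketch, not the patented implementation. It assumes each profile is a list of non-negative values over the same bins, and it simplifies the "adjacent sections" to all bins above the positive threshold or below the negative threshold. The function names, phoneme labels, and sample values are hypothetical.

```python
from typing import List, Tuple

Profile = List[float]


def difference_profile(a: Profile, b: Profile) -> Profile:
    """Subtract profile b from profile a, bin by bin."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of bins")
    return [x - y for x, y in zip(a, b)]


def identify_sections(diff: Profile, pos_threshold: float,
                      neg_threshold: float) -> Tuple[List[int], List[int]]:
    """Return the bins above the positive threshold and the bins below the
    negative threshold (a simplification of the claimed adjacent sections)."""
    pos = [i for i, d in enumerate(diff) if d > pos_threshold]
    neg = [i for i, d in enumerate(diff) if d < neg_threshold]
    return pos, neg


def area(profile: Profile, bins: List[int]) -> float:
    """Sum of the profile values over the given bins."""
    return sum(profile[i] for i in bins)


def choose(unknown: Profile, pos: List[int], neg: List[int],
           label_a: str, label_b: str) -> str:
    """Pick the phoneme whose section holds more of the unknown profile."""
    return label_a if area(unknown, pos) >= area(unknown, neg) else label_b


# Hypothetical stored profiles for two phonemes, and one unknown profile.
profile_ah = [1.0, 4.0, 5.0, 1.0, 0.5, 0.2]
profile_ee = [1.0, 1.0, 1.5, 4.0, 5.0, 0.3]
unknown = [0.8, 3.5, 4.0, 1.5, 1.0, 0.1]

diff = difference_profile(profile_ah, profile_ee)
pos_bins, neg_bins = identify_sections(diff, 1.0, -1.0)
print(choose(unknown, pos_bins, neg_bins, "ah", "ee"))
```

In this sketch the sections are found once from the stored profiles, and the unknown profile is only measured inside those sections, which mirrors the order of the steps in the abstract.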

55 Claims

1. A method for recognizing speech containing a plurality of significant phonemes, said method comprising the steps of:
- constructing a profile of a characteristic of each significant phoneme;
  
  generating a difference profile for substantially each pair of significant phonemes by subtracting the profile of each significant phoneme from the profile of each other significant phoneme;
  
  identifying adjacent sections of each difference profile which exceed positive and negative thresholds respectively;
  
  computing the likelihood that an unknown phoneme will be one or the other of a phoneme pair based on the relative areas in the identified sections of the difference profile;
  
  constructing an equivalent profile of a phoneme of an unknown utterance; and
  
  choosing the more likely phoneme of each phoneme pair based on the relative areas in the identified sections of the profile of the unknown phoneme.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method of claim 1 wherein the profile constructing step comprises detecting the frequency distribution of the energy of a known utterance for a plurality of time intervals within the duration of a single phoneme, and deriving at least one energy spectrum representing the peak energy present in the phoneme as a function of frequency for a plurality of frequency distributions covering a sequence of time intervals, and wherein the difference profile generating step includes generating a difference profile for substantially each pair of significant phonemes by subtracting the energy spectrum of each significant phoneme from the energy spectrum of each other significant phoneme.
  - 3. The method of claim 1 wherein the profile constructing step includes the steps of detecting the frequency distribution of the energy of the utterance for a plurality of time intervals within the duration of a single phoneme, and deriving at least a pair of sweep spectra representing the change in energy present in the phoneme from one frequency to another as a function of time for a plurality of frequency distributions, and wherein the difference profile generating step includes generating a pair of difference profiles for substantially each pair of significant phonemes by subtracting the respective sweep spectra of each significant phoneme from the respective sweep spectrum of each other significant phoneme.
  - 4. The method of claim 3 wherein the sweep spectra deriving step includes deriving an upward sweep spectrum representing the energy change from lower to higher frequencies as a function of time, and deriving a downward sweep spectrum representing the energy change from higher to lower frequencies as a function of time.
  - 5. The method of claim 3 wherein the sweep spectra deriving step includes deriving a short time constant sweep spectrum having a first time constant, and deriving a long time constant sweep spectrum having a time constant longer than the first time constant.
  - 6. The method of claim 1 wherein the profile constructing step comprises constructing a plurality of profiles based on different characteristics of each significant phoneme, and wherein the difference profile generating step includes generating a difference profile for each characteristic for substantially each pair of significant phonemes.
  - 7. The method of claim 6 wherein the identifying step includes identifying more than one set of adjacent sections of the difference profiles which exceed positive and negative thresholds respectively for each phoneme pair.
  - 8. The method of claim 7 wherein the computing step includes computing the likelihood that an unknown phoneme will be one or the other of a phoneme pair based on the relative area in the identified sets of sections of difference profile, and normalizing the likelihood computation for each set of sections, and wherein the choosing step includes choosing the more likely phoneme of each phoneme pair based on the cumulative relative area in the identified sets of sections of the profile of the unknown phoneme.
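
Claims 6 to 8 combine several characteristics per phoneme pair, normalize each likelihood computation, and choose on the cumulative result. The sketch below shows one possible reading. The normalization formula (area difference divided by area sum) is an assumption, since the claims do not state one, and the characteristic names, labels, and values are hypothetical. Profile values are assumed non-negative.

```python
from typing import Dict, List, Tuple

# (bins favouring phoneme A, bins favouring phoneme B) for one characteristic
Sections = Tuple[List[int], List[int]]


def normalized_score(unknown: List[float], sections: Sections) -> float:
    """Score in [-1, 1]: positive favours phoneme A, negative favours B.

    Assumed normalization: (area_a - area_b) / (area_a + area_b).
    """
    pos_bins, neg_bins = sections
    area_a = sum(unknown[i] for i in pos_bins)
    area_b = sum(unknown[i] for i in neg_bins)
    total = area_a + area_b
    return 0.0 if total == 0 else (area_a - area_b) / total


def choose_from_characteristics(unknown_profiles: Dict[str, List[float]],
                                sections_by_char: Dict[str, Sections],
                                label_a: str, label_b: str) -> str:
    """Add the normalized scores of every characteristic and pick a phoneme."""
    cumulative = sum(
        normalized_score(unknown_profiles[name], sections)
        for name, sections in sections_by_char.items()
    )
    return label_a if cumulative >= 0 else label_b


# Hypothetical sections for one phoneme pair and two characteristics.
sections_by_char = {
    "energy": ([1, 2], [3, 4]),
    "upsweep": ([0], [2]),
}
unknown_profiles = {
    "energy": [0.8, 3.5, 4.0, 1.5, 1.0, 0.1],
    "upsweep": [2.0, 0.0, 3.0],
}
print(choose_from_characteristics(unknown_profiles, sections_by_char, "ah", "ee"))
```

Normalizing each characteristic before summing keeps a profile with large raw values, such as an energy spectrum, from dominating one with small values, such as a sweep spectrum.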

9. A method for recognizing speech containing a plurality of significant phonemes, said method comprising the steps of:
- uttering each of a plurality of significant phonemes;
  
  detecting the frequency distribution of the energy of the utterance for a plurality of time intervals within the duration of a single phoneme for each significant phoneme;
  
  deriving a plurality of energy spectra from the detected frequency distributions representing the energy present in each phoneme as a function of frequency;
  
  deriving at least a pair of sweep spectra from the detected frequency distributions for each phoneme representing the change of energy present in the phoneme from one frequency to another;
  
  generating a set of difference profiles for each pair of significant phonemes by subtracting the corresponding profile of each significant phoneme from the corresponding profile of the other significant phoneme;
  
  identifying adjacent sections in the difference profiles of each set which exceed positive and negative thresholds respectively;
  
  computing the likelihood that an unknown phoneme will be one or the other of a phoneme pair based on the relative areas in the identified sections of the difference profiles;
  
  constructing equivalent spectra for a phoneme of an unknown utterance; and
  
  choosing the most likely phoneme of each phoneme pair based on the relative area in the identified sections of the spectra of the unknown phoneme.
- View Dependent Claims (10, 11, 12)
- - 10. The method of claim 9 wherein the energy spectra deriving step includes deriving short and long time constant energy spectra from the detected frequency distributions for each phoneme.
  - 11. The method of claim 9 wherein the sweep spectra deriving step includes deriving a pair of short and long time constant sweep spectra respectively from the detected frequency distributions for each phoneme.
  - 12. The method of claim 9 wherein the sweep spectra deriving step includes deriving upward and downward sweep spectra respectively from the detected frequency distributions for each phoneme.

13. A method for processing a spoken utterance consisting of a sequence of phonemes said method comprising the steps of:
- detecting the frequency distribution of the energy of the utterance for a plurality of time intervals within the duration of a single phoneme;
  
  separating each frequency distribution into a plurality of frequency bandwidths;
  
  deriving at least one energy spectrum representing the energy present in the phoneme as a function of frequency for a plurality of frequency distributions covering a sequence of time intervals; and
  
  deriving at least a pair of sweep spectra representing the change in energy present in the phoneme from one frequency to another as a function of time for a plurality of frequency distributions, whereby the spoken phoneme can be represented by the energy spectrum and the sweep spectra to facilitate speech recognition.
- View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 35, 36, 37)
- - 14. The method of claim 13 and additionally comprising the step of separating each frequency distribution into a plurality of frequency bandwidths prior to said deriving steps.
  - 15. The method of claim 13 wherein the sweep spectra deriving step includes deriving an upward sweep spectrum representing the energy change from lower to higher frequencies as a function of time, and deriving a downward sweep spectrum representing the energy change from higher to lower frequencies as a function of time.
  - 16. The method of claim 13 wherein the sweep spectra deriving step includes deriving a short time constant sweep spectrum having a first time constant, and deriving a long time constant sweep spectrum having a time constant longer than the first time constant.
  - 17. The method of claim 16 wherein the short time constant sweep spectrum deriving step includes deriving short time constant upward and downward sweep spectra representing the energy change from lower to higher and higher to lower frequencies respectively for a second time constant, and the long time constant sweep spectrum deriving step includes deriving long time constant upward and downward sweep spectra representing the energy change from lower to higher and higher to lower frequencies respectively for a time constant larger than the second time constant.
  - 18. The method of claim 13 wherein the energy spectrum deriving step includes deriving a short time constant energy spectrum based on a first time constant, and deriving a long time constant energy spectrum based on a second time constant longer than the first time constant.
  - 19. The method of claim 13 wherein the separating step includes dividing the detected energy into a plurality of frequency ranges, applying an automatic gain control to each frequency range independently, and separating the output of each automatic gain controller to separate pluralities of frequency bandwidths.
  - 20. The method of claim 13 wherein the separating step includes passing the detected energy through a plurality of lowpass filters, and amplifying the output of adjacent lowpass filters with difference amplifiers to separate the detected energy into a plurality of frequency bandwidths.
  - 21. The method of claim 20 wherein the sweep spectra deriving step includes passing the output of the difference amplifiers through respective upward and downward edge detectors which generate step functions of fixed amplitude and duration, and summing the step functions of adjacent sets of upward and downward edge detectors to generate upward and downward sweep spectra.
  - 22. The method of claim 21 wherein the summing step includes summing the step functions of adjacent, overlapping sets of edge detectors.
  - 23. The method of claim 20 wherein the energy spectrum deriving step includes passing the output of adjacent sets of difference amplifiers through a plurality of summing amplifiers, and summing the outputs of the summing amplifiers.
  - 24. The method of claim 23 wherein the separating step includes isolating the energy at wavelengths higher than the highest said frequency bandwidth and separating that energy into a plurality of high frequency bandwidths, and the energy spectrum deriving step includes incorporating the energy from the high frequency bandwidths into the energy spectrum without passing through the summing amplifiers.
  - 25. The method of claim 13 and additionally comprising the step of averaging the energy and sweep spectra of separate utterances of the same phoneme to derive profiles indicative of the phoneme.
  - 26. The method of claim 25 and additionally comprising the steps of generating a difference profile for substantially each pair of phonemes by subtracting each profile of one phoneme from the equivalent profile of each other phoneme, identifying adjacent sections of each difference profile which exceed positive and negative thresholds respectively, and computing the likelihood that an unknown phoneme will be one or the other of a phoneme pair based on the relative areas in the identified sections of the difference profiles.
  - 27. The method of claim 26 and additionally comprising the steps of constructing equivalent profiles of a phoneme of an unknown utterance, and choosing the more likely phoneme of each phoneme pair based on the relative areas in the identified sections of the profiles of the unknown phoneme.
  - 28. The method of claim 27 wherein the computing step includes computing the likelihood that an unknown phoneme will be one or the other of a phoneme pair based on the relative areas in the identified sets of sections of the difference profiles, normalizing the likelihood computation for each set of sections, and combining the normalized computations for each phoneme, and wherein the choosing step includes choosing the more likely phoneme of each phoneme pair based on the relative area in the identified sets of sections of the profiles of the unknown phoneme.
  - 35. The method of claim 25 and additionally comprising the steps of generating a difference profile for substantially each pair of phonemes by subtracting each profile of one phoneme from the equivalent profile of each other phoneme, identifying adjacent sections of each difference profile which exceed positive and negative thresholds respectively, and computing the likelihood that an unknown phoneme will be one or the other of a phoneme pair based on the relative areas in the identified sections of the difference profiles.
  - 36. The method of claim 35 and additionally comprising the steps of constructing equivalent profiles of a phoneme of an unknown utterance, and choosing the more likely phoneme of each phoneme pair based on the relative areas in the identified sections of the profiles of the unknown phoneme.
  - 37. The method of claim 36 wherein the computing step includes computing the likelihood that an unknown phoneme will be one or the other of a phoneme pair based on the relative areas in the identified sets of sections of the difference profiles, normalizing the likelihood computation for each set of sections, and combining the normalized computations for each phoneme, and wherein the choosing step includes choosing the more likely phoneme of each phoneme pair based on the relative areas in the identified sets of sections of the profiles of the unknown phoneme.
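
The energy and sweep spectra of claims 13 to 18 can be illustrated with a small digital sketch. It assumes the utterance has already been reduced to a list of frames, one per time interval, each holding the energy in a fixed set of frequency bands. The rule used to detect a sweep (one band falls while its neighbour rises between consecutive frames) is an assumption made for illustration, and the names `frames` and `window` are hypothetical.

```python
from typing import List, Tuple

Frame = List[float]  # energy per frequency band for one time interval


def energy_spectrum(frames: List[Frame], window: int) -> List[float]:
    """Sum the energy of each band over the last `window` frames.

    A small window stands in for a short time constant spectrum and a
    large window for a long time constant spectrum.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = frames[-window:]
    n_bands = len(frames[0])
    return [sum(f[b] for f in recent) for b in range(n_bands)]


def sweep_spectra(frames: List[Frame]) -> Tuple[List[int], List[int]]:
    """Count upward and downward energy movements between adjacent bands.

    Entry k of each result refers to the boundary between band k and band
    k + 1. An upward sweep is counted when band k loses energy while band
    k + 1 gains energy between consecutive frames; a downward sweep is the
    mirror case.
    """
    n_edges = len(frames[0]) - 1
    up = [0] * n_edges
    down = [0] * n_edges
    for prev, cur in zip(frames, frames[1:]):
        for k in range(n_edges):
            low_delta = cur[k] - prev[k]
            high_delta = cur[k + 1] - prev[k + 1]
            if low_delta < 0 < high_delta:
                up[k] += 1
            elif high_delta < 0 < low_delta:
                down[k] += 1
    return up, down


# Hypothetical frames in which the energy moves from the lowest band upward.
frames = [[5, 1, 0], [1, 5, 0], [0, 1, 5]]
print(energy_spectrum(frames, 2))
print(energy_spectrum(frames, 3))
print(sweep_spectra(frames))
```

Reversing the order of the frames turns every upward movement into a downward one, which is a quick way to check that the two sweep spectra are mirror images of each other.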

29. A method for processing a spoken utterance consisting of a sequence of phonemes, said method comprising the steps of:
- detecting the frequency distribution of the energy of the utterance for a plurality of time intervals within the duration of the phoneme;
  
  dividing the detected energy into a plurality of frequency ranges;
  
  applying an automatic gain control independently to the energy of each frequency range;
  
  separating the output of each automatic gain control into a plurality of frequency bandwidths;
  
  deriving a short time constant energy spectrum representing the energy present in the phoneme as a function of frequency for a plurality of frequency distributions covering a first sequence of time intervals;
  
  deriving a long time constant energy spectrum representing the energy present in the phoneme as a function of frequency for a plurality of frequency distributions covering a sequence of time intervals which is longer than the first time intervals;
  
  deriving short time constant upward and downward sweep spectra representing the change in energy present in the phoneme from a lower to a higher and a higher to a lower frequency respectively as a function of time for a plurality of frequency distributions covering a second time interval; and
  
  deriving long time constant upward and downward sweep spectra representing the change in energy present in the phoneme from a lower to a higher and a higher to a lower frequency respectively as a function of time for a plurality of frequency distributions covering a time interval longer than the second time interval, whereby the spoken phoneme can be represented by the energy spectrum and the sweep spectra to facilitate speech recognition.
- View Dependent Claims (30, 31, 32, 33, 34)
- - 30. The method of claim 29 wherein the separating step includes passing the detected energy through a plurality of low pass filters, and amplifying the output of adjacent low pass filters with difference amplifiers to separate the detected energy into a plurality of frequency bandwidths.
  - 31. The method of claim 29 wherein the sweep spectra deriving steps each include passing the output of the difference amplifiers through respective upward and downward edge detectors which generate step functions of fixed amplitude and duration, and summing the step functions of adjacent sets of long time constant and short time constant upward and downward edge detectors to generate long time constant and short time constant upward and downward sweep spectra.
  - 32. The method of claim 30 wherein the energy spectra deriving step includes passing the output of adjacent sets of difference amplifiers through a plurality of summing amplifiers, and summing the outputs of the summing amplifiers.
  - 33. The method of claim 32 wherein the separating step includes isolating the energy at wavelengths higher than the highest frequency bandwidth of the low pass filters and separating that energy into a plurality of high frequency regions, and the energy spectrum deriving step includes incorporating the energy from the high frequency regions into the energy spectra without passing through the summing amplifiers.
  - 34. The method of claim 29 and additionally comprising the step of averaging the energy and sweep spectra of separate utterances of the same phoneme to derive profiles indicative of the phoneme.

38. A method for processing a spoken utterance consisting of a sequence of phonemes, said method comprising the steps of:
- detecting the frequency distribution of the energy of the utterance for a plurality of time intervals within the duration of a single phoneme;
  
  passing the detected energy through a plurality of lowpass filters;
  
  amplifying the output of adjacent lowpass filters with difference amplifiers to separate the detected energy into a plurality of frequency bandwidths;
  
  passing the output of the difference amplifiers to respective upward and downward edge detectors to generate step functions of fixed amplitude and duration;
  
  summing the step functions of adjacent, overlapping sets of upward and downward edge detectors respectively to generate upward and downward sweep spectra;
  
  passing the output of adjacent sets of difference amplifiers through a plurality of summing amplifiers; and
  
  summing the outputs of the summing amplifiers to generate at least one energy spectrum, whereby the spoken phoneme can be represented by the energy spectrum and the sweep spectra to facilitate speech recognition.
- View Dependent Claims (39, 40, 41, 42, 43, 44)
- - 39. The method of claim 38 wherein the edge detector summing step includes summing separately the step functions of short time constant edge detectors and long time constant edge detectors respectively to generate short time constant and long time constant upward and downward sweep spectra.
  - 40. The method of claim 38 wherein the summing amplifier summing step includes summing the outputs of the summing amplifiers for a first time interval and a second time interval longer than the first time interval to generate a short time constant energy spectrum and a long time constant energy spectrum.
  - 41. The method of claim 38 and additionally comprising the steps of passing the energy and sweep spectra through an analog to digital converter, adding the spectra from a plurality of repetitions of the same phoneme in a digital computer, and storing the added spectra as profiles in the digital computer.
  - 42. The method of claim 41 and additionally comprising the steps of generating a difference profile for substantially each pair of phonemes by subtracting each profile of one phoneme from the equivalent profiles of each other phoneme, identifying adjacent sections of each difference profile which exceed positive and negative thresholds respectively, and computing the likelihood that an unknown phoneme will be one or the other of a phoneme pair based on the relative areas in the identified sections of the difference profiles.
  - 43. The method of claim 42 and additionally comprising the steps of constructing equivalent profiles of a phoneme of an unknown utterance, and choosing the more likely phoneme of each phoneme pair based on the relative areas in the identified sections of the profiles of the unknown phoneme.
  - 44. The method of claim 43 wherein the computing step includes computing the likelihood that an unknown phoneme will be one or the other of a phoneme pair based on the relative areas in the identified sets of sections of the difference profiles, normalizing the likelihood computation for each set of sections, and combining the normalized computations for each phoneme, and wherein the choosing step includes choosing the more likely phoneme of each phoneme pair based on the relative areas in the identified sets of sections of the profiles of the unknown phoneme.
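
Claim 38 describes analog hardware: lowpass filters, difference amplifiers, and edge detectors. The sketch below is a digital stand-in meant only to show the signal flow, not the claimed circuit. It assumes one-pole lowpass filters, takes the difference of adjacent filter outputs as the band signals, and models an edge detector as a pulse of fixed amplitude and duration emitted when a signal rises by more than a threshold. The coefficients, thresholds, and function names are hypothetical.

```python
from typing import List


def one_pole_lowpass(signal: List[float], alpha: float) -> List[float]:
    """Simple recursive lowpass filter; a larger alpha passes more high
    frequency content."""
    out = []
    y = 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def split_into_bands(signal: List[float], alphas: List[float]) -> List[List[float]]:
    """Subtract adjacent lowpass outputs to obtain band signals.

    `alphas` must be in ascending order, so each band holds the content
    passed by one filter but rejected by the next narrower one.
    """
    lows = [one_pole_lowpass(signal, a) for a in alphas]
    return [
        [hi - lo for hi, lo in zip(lows[i + 1], lows[i])]
        for i in range(len(lows) - 1)
    ]


def rising_edge_pulses(signal: List[float], threshold: float,
                       duration: int) -> List[int]:
    """Emit a pulse of amplitude 1 lasting `duration` samples whenever the
    signal rises by more than `threshold` between consecutive samples."""
    pulses = [0] * len(signal)
    for i in range(1, len(signal)):
        if signal[i] - signal[i - 1] > threshold:
            for j in range(i, min(i + duration, len(signal))):
                pulses[j] = 1
    return pulses


# Hypothetical input: a step in the signal level.
signal = [0.0, 0.0, 3.0, 3.0, 3.0, 0.0, 0.0, 0.0]
bands = split_into_bands(signal, [0.1, 0.3, 0.9])
print(len(bands))
print(rising_edge_pulses(signal, 1.0, 2))
```

A downward edge detector would follow the same pattern with the sign of the comparison reversed, and summing the pulses from neighbouring detectors would then give the upward and downward sweep spectra.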

45. A method for deciphering speech comprising:
- training a memory device by uttering each of several phonemes repeatedly, detecting the frequency distribution of each utterance of each phoneme for a plurality of time intervals, deriving at least one energy spectrum representing the energy present in the spoken phoneme as a function of frequency for a plurality of frequency distributions covering a sequence of time intervals, deriving at least a pair of sweep spectra representing the change in energy present in the phoneme from one frequency to another as a function of time for a plurality of frequency distributions, generating profiles representative of the energy and sweep spectra for each phoneme, and storing the profile in the memory device;
  
  computing the likelihood that an unknown phoneme will be one or the other of a phoneme pair by generating a difference profile for substantially each pair of phonemes by subtracting the profile of each phoneme from the profile of each other phoneme, identifying adjacent sections of each difference profile which exceed positive or negative thresholds respectively, and generating a probability distribution based on the relative areas in the identified sections of the difference profiles; and
  
  identifying the most likely phoneme in an unknown utterance by comparing the relative areas in the identified sections of the profile of the unknown phoneme with the probability distributions of each phoneme pair.
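
The training and identification flow of claim 45 can be sketched as follows. Averaging repeated utterances follows the wording of the related claims, but the way the pairwise results are combined here (a simple count of pairwise wins) is an assumption, since the claim only says the unknown profile is compared with the probability distributions of each phoneme pair. The labels, threshold, and sample values are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

Profile = List[float]


def average_profile(utterances: List[Profile]) -> Profile:
    """Average several utterances of one phoneme, bin by bin."""
    n = len(utterances)
    return [sum(values) / n for values in zip(*utterances)]


def pair_winner(unknown: Profile, prof_a: Profile, prof_b: Profile,
                threshold: float) -> int:
    """Return 0 if the unknown profile favours phoneme A, otherwise 1."""
    diff = [x - y for x, y in zip(prof_a, prof_b)]
    pos = [i for i, d in enumerate(diff) if d > threshold]
    neg = [i for i, d in enumerate(diff) if d < -threshold]
    area_a = sum(unknown[i] for i in pos)
    area_b = sum(unknown[i] for i in neg)
    return 0 if area_a >= area_b else 1


def most_likely(unknown: Profile, profiles: Dict[str, Profile],
                threshold: float) -> str:
    """Compare every phoneme pair and return the phoneme with most wins."""
    wins = {name: 0 for name in profiles}
    for a, b in combinations(profiles, 2):
        winner = a if pair_winner(unknown, profiles[a], profiles[b], threshold) == 0 else b
        wins[winner] += 1
    return max(wins, key=wins.get)


# Hypothetical training data: two utterances for each of three phonemes.
training = {
    "ah": [[4, 1, 0], [6, 1, 0]],
    "ee": [[0, 5, 1], [0, 5, 1]],
    "oo": [[0, 1, 5], [1, 0, 5]],
}
profiles = {name: average_profile(utts) for name, utts in training.items()}
print(most_likely([0.5, 4, 1], profiles, 1.0))
```

With N stored phonemes this sketch performs N(N-1)/2 pairwise comparisons for each unknown phoneme, which is the cost implied by working on every phoneme pair.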

46. Apparatus for processing a spoken utterance consisting of a sequence of phonemes, said apparatus comprising:
- a microphone for converting the sonic energy of the utterance into an electric signal;
  
  a plurality of filters which separate the energy in the electric signal into a plurality of frequency bandwidths;
  
  a plurality of energy summing amplifiers which sum the energy of adjacent bandwidths for a plurality of electric signals covering a sequence of time intervals to generate an energy spectrum representing the energy present in the phoneme;
  
  a plurality of rise and fall edge detectors respectively which detect a movement of energy from the output of one filter to the output of an adjacent filter; and
  
  down and up sweep summing amplifiers respectively which generate downsweep and upsweep spectra representing the change in energy present in the phoneme from one frequency to another as a function of time for a sequence of time intervals.
- View Dependent Claims (47, 48, 49)
- - 47. The apparatus of claim 46 wherein the filters are low pass filters, and additionally comprising a plurality of difference amplifiers which output an electrical signal representing the amount of energy present in each bandwidth to the energy summing amplifiers and the edge detectors.
  - 48. The apparatus of claim 46 and additionally comprising an analog to digital converter for digitizing the energy spectrum and the sweep spectra, and digital processing means for adding a plurality of spectra for a given phoneme and developing profiles representative of that phoneme.
  - 49. The apparatus of claim 46 and additionally comprising means for isolating the energy in the electrical signal above the wavelength of the highest of the filters, and means for adding the high frequency energy to the energy spectra without passing through the energy summing amplifiers.

50. Apparatus for processing a spoken utterance consisting of a sequence of phonemes, said apparatus comprising:
- a microphone for converting the sonic energy of the utterance into an electric signal;
  
  means for detecting the frequency distribution of the electric signal for a plurality of time intervals within the duration of a single phoneme;
  
  means for separating the frequency distribution into a plurality of frequency bandwidths;
  
  means for deriving at least one energy spectrum representing the energy present in the electric signal as a function of frequency for a plurality of frequency distributions covering a sequence of time intervals; and
  
  means for deriving at least a pair of sweep spectra representing a change in energy in the electric signal from one frequency to another as a function of time for a plurality of frequency distributions.
- View Dependent Claims (51, 52, 53, 54, 55)
- - 51. The apparatus of claim 50 wherein the energy spectrum deriving means includes means for deriving a short time constant energy spectrum based on a first time constant, and means for deriving a long time constant energy spectrum based on a second time constant longer than the first time constant.
  - 52. The apparatus of claim 50 wherein the sweep spectra deriving means includes means for deriving an upward sweep spectrum representing the energy change from lower to higher frequencies as a function of time, and means for deriving a downward sweep spectrum representing the energy change from higher to lower frequencies as a function of time.
  - 53. The apparatus of claim 50 wherein the sweep spectra deriving means includes means for deriving a short time constant sweep spectrum having a first time constant, and means for deriving a long time constant spectrum having a time constant longer than the first time constant.
  - 54. The apparatus of claim 50 wherein the sweep spectra deriving means includes means for deriving short time constant and long time constant upward and downward sweep spectra representing the energy change from lower to higher and higher to lower frequencies for first and relatively long second time constants.
  - 55. The apparatus of claim 50 and additionally comprising means for digitizing the energy and sweep spectra, and means for adding the digitized energy and sweep spectra of separate utterances of the same phoneme to develop profiles indicative of the phoneme.

Current Assignee: Dr. Andrew Ho Chung Yin
Original Assignee: Andrew H.O. Chung
Inventors: Kong, King-Leung
Primary Examiner(s): Salce, Patrick R.
Assistant Examiner(s): VOELTZ, EMANUEL T
Application Number: US07/117,485
Time in Patent Office: 551 Days
Field of Search: 381/41-50, 364/513.5
US Class Current: 704/249
CPC Class Codes: G10L 15/00 Speech recognition G10L17/0...