Apparatus for speech recognition

US 4,736,429 A
Filed: 06/07/1984
Issued: 04/05/1988
Est. Priority Date: 06/07/1983
Status: Expired due to Term

First Claim


1. Apparatus for speech recognition, comprising:

(a) spectrum analyzer means for obtaining parameters indicative of the spectrum of an input speech signal, the spectrum analyzer means performing a linear prediction analysis of the input speech signal for obtaining a set of LPC cepstrum coefficients for the input speech signal;

(b) a standard pattern storing means for storing phoneme standard patterns of phonemes or phoneme groups;

(c) a similarity calculating means for calculating the degree of similarity between the LPC cepstrum coefficients derived from said spectrum analyzer means and standard patterns stored in said standard pattern storing means, said calculating means determining a measure of the statistical distance between the LPC cepstrum coefficients and the standard patterns;

(d) a segmentation means for segmenting the input speech signal in response to the statistical distance measure derived by said similarity calculating portion and time-dependent power variations in low- and high-frequency ranges of the input speech signal; and

(e) a phoneme discriminating means for recognizing phonemes in response to a signal derived by said similarity calculating means.
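Element (a) derives LPC cepstrum coefficients by linear prediction analysis. The claim does not state the window, the analysis order or the number of coefficients, so the following is only a minimal Python sketch of one common textbook route, assuming a Hamming window, the autocorrelation method with Levinson-Durbin recursion, and the standard recursion from predictor coefficients to cepstrum. All function names are illustrative and are not taken from the patent.

```python
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one windowed frame."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a_1..a_p with x[n] ~ sum_k a_k x[n-k].

    Returns (a, err): a[0] holds a_1, err is the final prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] unused so that a[j] means a_j
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate frame: stop, remaining coefficients stay zero
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Recursion from predictor coefficients a_1..a_p to cepstrum c_1..c_n_ceps."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpc_cepstrum(frame, order, n_ceps):
    """Hamming-window one frame, run LPC analysis, return its LPC cepstrum."""
    n = len(frame)
    w = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * wi for s, wi in zip(frame, w)]
    r = autocorrelation(x, order)
    if r[0] == 0.0:  # silent frame: no spectral shape to model
        return [0.0] * n_ceps
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```

For example, `lpc_cepstrum(frame, 12, 12)` returns 12 coefficients for one frame; the order and coefficient count there are placeholders, not values disclosed in this record.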


1 Assignment

0 Petitions

Abstract

An apparatus for speech recognition that uses the phoneme as its fundamental recognition unit recognizes input speech by discriminating the phonemes it contains. The apparatus comprises a memory for storing standard patterns of phonemes or phoneme groups; a spectrum analyzer for obtaining parameters indicative of the spectrum of the input speech signal; a similarity calculator that uses a statistical distance measure to calculate the degree of similarity between the output of the spectrum analyzer and the standard patterns stored in the memory; a segmentation portion that segments the input speech by using time-dependent low- and high-frequency power variations of the input speech signal together with the results from the similarity calculator; and a phoneme discriminator for recognizing phonemes by using the results from the similarity calculator.
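The segmentation described in the abstract relies on frame-by-frame power in a low- and a high-frequency range. This record gives neither the band edges, the frame length nor the filtering method, so the sketch below simply assumes a plain DFT per frame with band edges supplied by the caller; a practical system would more likely use an FFT or a filter bank. `band_power_db`, `band_power_contours` and `deltas` are hypothetical helpers.

```python
import cmath
import math


def band_power_db(frame, fs, band):
    """Power (dB) of one frame in the band [f_lo, f_hi) Hz, via a plain DFT."""
    n = len(frame)
    f_lo, f_hi = band
    total = 0.0
    for k in range(n // 2 + 1):
        f = k * fs / n
        if f_lo <= f < f_hi:
            X = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            total += abs(X) ** 2
    return 10.0 * math.log10(total + 1e-12)  # small floor avoids log(0) on silence


def band_power_contours(signal, fs, frame_len, hop, low_band, high_band):
    """Split the signal into frames and return (low_db, high_db) power contours."""
    low, high = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        low.append(band_power_db(frame, fs, low_band))
        high.append(band_power_db(frame, fs, high_band))
    return low, high


def deltas(contour):
    """Frame-to-frame change: a simple stand-in for 'time-dependent variation'."""
    return [b - a for a, b in zip(contour, contour[1:])]
```

With a 500 Hz tone sampled at 8 kHz and bands of 0 to 1000 Hz and 2000 to 4000 Hz, the low-band contour sits far above the high-band one and its deltas are essentially zero, which is the kind of contrast the segmentation step looks for.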

26 Claims

1. Apparatus for speech recognition, comprising:
- (a) spectrum analyzer means for obtaining parameters indicative of the spectrum of an input speech signal, the spectrum analyzer means performing a linear prediction analysis of the input speech signal for obtaining a set of LPC cepstrum coefficients for the input speech signal;
  
  (b) a standard pattern storing means for storing phoneme standard patterns of phonemes or phoneme groups;
  
  (c) a similarity calculating means for calculating the degree of similarity between the LPC cepstrum coefficients derived from said spectrum analyzer means and standard patterns stored in said standard pattern storing means, said calculating means determining a measure of the statistical distance between the LPC cepstrum coefficients and the standard patterns;
  
  (d) a segmentation means for segmenting the input speech signal in response to the statistical distance measure derived by said similarity calculating portion and time-dependent power variations in low- and high-frequency ranges of the input speech signal; and
  
  (e) a phoneme discriminating means for recognizing phonemes in response to a signal derived by said similarity calculating means.
- Dependent Claims (2, 17)
- - 2. Apparatus as claimed in claim 1, wherein said statistical distance measure is selected from one of Bayes' discriminant function, Mahalanobis' distance function and the linear discriminant functions.
  - 17. Apparatus as claimed in claim 1, wherein the phoneme discriminating means is responsive to a signal derived by said segmentation means.
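Claim 2 names Bayes' discriminant function and Mahalanobis' distance function as candidate statistical distance measures. Assuming each standard pattern is stored as a mean vector and covariance matrix of LPC cepstrum coefficients (an assumption; the claims only say "standard patterns"), the two measures can be sketched in pure Python as below, using a Cholesky factor so that no explicit matrix inverse is needed. The Bayes score assumes Gaussian classes with equal priors; the linear discriminant option is not shown. `StandardPattern` and `discriminate` are hypothetical names.

```python
import math


def cholesky(cov):
    """Lower-triangular L with cov = L L^T (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


class StandardPattern:
    """Mean vector and covariance of one phoneme or phoneme group."""

    def __init__(self, label, mean, cov):
        self.label = label
        self.mean = mean
        self.L = cholesky(cov)
        # log|Sigma| = 2 * sum(log L_ii)
        self.log_det = 2.0 * sum(math.log(self.L[i][i]) for i in range(len(mean)))

    def mahalanobis2(self, x):
        """Squared Mahalanobis distance (x - mu)^T Sigma^-1 (x - mu)."""
        d = [xi - mi for xi, mi in zip(x, self.mean)]
        y = []
        for i, row in enumerate(self.L):  # forward substitution: L y = d
            y.append((d[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
        return sum(v * v for v in y)

    def bayes_score(self, x):
        """Gaussian log-likelihood up to a constant, equal priors assumed."""
        return -0.5 * (self.mahalanobis2(x) + self.log_det)


def discriminate(x, patterns, measure="mahalanobis"):
    """Return the label of the standard pattern most similar to feature vector x."""
    if measure == "mahalanobis":
        return min(patterns, key=lambda p: p.mahalanobis2(x)).label
    return max(patterns, key=lambda p: p.bayes_score(x)).label
```

The same routine covers a two-pattern voiced/unvoiced decision like the one in claim 14: build one `StandardPattern` for voiced and one for unvoiced frames and call `discriminate` on each frame's coefficients.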

3. Apparatus for speech recognition, comprising:
- (a) a spectrum analyzer means for deriving spectrum information of an input speech signal, said spectrum information being a set of LPC cepstrum coefficients obtained by way of linear predictive analysis;
  
  (b) a first similarity calculating means for obtaining, by using a statistical distance measure, the degree of similarity of said input speech to phonemes of vowel features, voiced sounds and unvoiced sounds, said calculating means calculating the degree of similarity between the LPC cepstrum coefficients derived from said spectrum analyzing means and standard patterns stored in a standard pattern storing means;
  
  (c) a first recognition means for segmenting and recognizing the input speech signal in response to a continuity of the statistical distance derived by said first similarity calculation means;
  
  (d) a segmentation parameter extracting means for deriving power information of low- and high-frequency ranges of said input speech signal;
  
  (e) a consonant segmentation means for segmenting consonant phonemes in response to signals representing the results of time-dependent variations of low- and high-frequency ranges of said power information in the input speech signal;
  
  (f) a second similarity calculation means for calculating, by using a statistical distance measure, the degree of similarity between coefficients derived from said spectrum analyzing means and standard phoneme patterns from said standard pattern storing portion of respective periods determined by said consonant segmentation portion;
  
  (g) a second recognition means for recognizing consonant phonemes in response to the degree of similarity determined by said second similarity calculation means;
  
  (h) a phoneme string producing means for deriving phoneme strings in response to the degree of similarity determined by said first recognition means and the results from said second recognition portion; and
  
  (i) a matching means for comparing/matching the phoneme strings derived from said phoneme string producing means with dictionary items included in a word dictionary so as to derive a dictionary item having the highest similarity to said phoneme strings.
- Dependent Claims (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
- - 4. Apparatus as claimed in claim 3, wherein said spectrum information is derived by a linear predictive analyzer means or a bandpass filter bank.
  - 5. Apparatus as claimed in claim 3, wherein said statistical distance measure is selected from one of Bayes' discriminant function, Mahalanobis' distance function and the linear discriminant functions.
  - 6. Apparatus as claimed in claim 3, wherein said first similarity calculation means includes means for performing a vowel similarity calculation and determining a voiced/unvoiced sound feature;
    - said first recognizing means including means for recognizing vowels;
      
      said second similarity calculator including means for calculating the similarity of consonants; and
      
      said second recognition portion comprising means for recognizing consonants.
  - 7. Apparatus as claimed in claim 3, wherein said standard pattern storing means stores predetermined standard patterns representing respective phonemes or phoneme groups in response to speech from plural speakers;
    - said first similarity calculation means deriving a time series of phonemes having the highest similarity with feature parameters derived from said input speech signal being compared with said standard patterns;
      
      said consonant segmenting means detecting vowels at the beginning of words and consonants, a vowel period being detected by the segmenting means as a stable occurrence at the beginning of the input speech signal, a consonant period being detected by the segmenting means as a period while a vowel does not last or as a period while a nasal sound or unvoiced sound occurs.
  - 8. Apparatus as claimed in claim 3, wherein said segmentation parameter extracting means derives power information for the low- and high-frequency ranges of the speech signal and for effecting speech segmentation, said consonant segmentation means detecting a proposed consonant period in response to dips in said power information and detecting a consonant period from said proposed consonant period.
  - 9. Apparatus as claimed in claim 8, wherein said consonant segmentation means detects the proposed consonant period in response to maximal and minimal values in the rate of said time-dependent variation of said power information in the low- and high-frequency ranges, and the time between the occurrence points of said maximal and minimal values, said consonant segmentation means detecting a consonant period from said proposed consonant period in response to the difference between said maximal and minimal values of said power information of said low- and high-frequency ranges.
  - 10. Apparatus as claimed in claim 9, wherein said consonant period is detected from said proposed consonant period by measuring the statistical distance between predetermined parameters of the speech signal and predetermined standard patterns, the predetermined parameters being indicative of the magnitude of dips of the power information of said low- and high-frequency ranges.
  - 11. Apparatus as claimed in claim 9, wherein said consonant segmentation means detects said consonant period from said proposed consonant period in response to the magnitude of dips of the power information of said low- and high-frequency ranges being applied to a discriminant diagram.
  - 12. Apparatus as claimed in claim 3, wherein said consonant segmentation means detects an in-word consonant period in response to one or more of:
    - (a) the magnitude of power dips in time-dependent variations of said power information in said low- and high-frequency ranges of said input speech signal;
      
      (b) said first recognition means recognizing all frames included in an overall sound period of the speech signal as vowels or nasal sounds, followed by a period in which at least a predetermined plural number of frames recognized as a nasal sound continue; and
      
      (c) said first similarity calculation means performing voiced/unvoiced frame determination for all frames included in an overall sound period of the speech signal while more than a predetermined number of unvoiced sound frames continue.
  - 13. Apparatus as claimed in claim 3, wherein segmentation of a beginning consonant of a word performed by said consonant segmentation portion is in an arbitrary order of the following first to third methods such that when a consonant is detected by one or two of said first to third methods, the remaining method or methods is/are not applied, where:
    - the first method includes capturing time-dependent power variations in the low- and high-frequency ranges at the beginning of a word of said input speech signal;
      
      the second method includes responding to the voiced/unvoiced frame determination detected by said first similarity calculation portion for respective frames of a sound period; and
      
      the third method includes responding to five vowel and nasal sound frames detected by said first recognition means.
  - 14. Apparatus as claimed in claim 3, wherein said first similarity calculation portion determines whether a sound has a voiced or unvoiced sound feature in response to a measure of the statistical distance between LPC cepstrum coefficients and two standard patterns, said LPC cepstrum coefficients being used as parameters indicative of the spectral shape of said input speech signal, said statistical distance measure being a measure of the similarity, with said two standard patterns indicative of the shape of an average spectrum of voiced sounds and unvoiced sounds being stored in advance in said standard pattern storing means.
  - 15. Apparatus as claimed in claim 14, wherein said statistical distance measure is selected from one of Bayes' discriminant function, Mahalanobis' distance function and a linear discriminant function.
  - 16. Apparatus as claimed in claim 3, wherein portions of said first and second similarity calculation means are common to each other.
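Claims 8 and 9 locate a proposed consonant period from dips in the low- and high-band power, using the maximal and minimal values of the rate of power change and the time between them, and then confirm the period from the difference between maximal and minimal power. One possible reading, sketched below, treats a steep fall followed within a few frames by a steep rise as a candidate and accepts it when the dip depth in either band reaches a threshold. `max_gap`, `min_slope` and `min_depth` are hypothetical thresholds, and the simple depth test stands in for the statistical-distance or discriminant-diagram checks of claims 10 and 11.

```python
def find_dips(power_db, max_gap, min_slope):
    """Proposed consonant periods: a steep fall in power followed by a steep rise.

    power_db  -- per-frame power contour (low- or high-band, in dB)
    max_gap   -- hypothetical max number of frames between the fall and the rise
    min_slope -- hypothetical min |delta| (dB per frame) for a fall or a rise
    Returns (start, end) frame indices into power_db.
    """
    d = [b - a for a, b in zip(power_db, power_db[1:])]  # rate of change
    periods = []
    i = 0
    while i < len(d):
        # falling edge: a local minimum of the rate of change that is steep enough
        is_min = (i == 0 or d[i] <= d[i - 1]) and (i + 1 == len(d) or d[i] <= d[i + 1])
        if d[i] <= -min_slope and is_min:
            # strongest rise (local maximum of the rate) within max_gap frames
            window = range(i + 1, min(len(d), i + 1 + max_gap))
            j = max(window, key=lambda t: d[t], default=None)
            if j is not None and d[j] >= min_slope:
                periods.append((i, j + 1))
                i = j + 1
                continue
        i += 1
    return periods


def consonant_periods(low_db, high_db, max_gap=8, min_slope=3.0, min_depth=10.0):
    """Keep candidates whose max-minus-min power in either band reaches min_depth.

    Candidates from the two bands are pooled but not merged.
    """
    candidates = find_dips(low_db, max_gap, min_slope) + find_dips(high_db, max_gap, min_slope)
    accepted = []
    for s, e in sorted(set(candidates)):
        depth = max(max(c[s:e + 1]) - min(c[s:e + 1]) for c in (low_db, high_db))
        if depth >= min_depth:
            accepted.append((s, e))
    return accepted
```

A 20 dB dip in the low band, for example, yields one accepted period spanning the fall and the rise, while a 5 dB dip is proposed by `find_dips` but rejected by the depth test.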

18. A method of recognizing speech, comprising the steps of analyzing the spectral content of the speech by determining a set of linear predictive cepstrum coefficients obtained by performing a linear predictive analysis on the speech;
- performing a statistical distance measure between the speech and phonemes of vowel features, voiced sounds and unvoiced sounds by calculating the degree of similarity between the LPC cepstrum coefficients and stored standard patterns;
  
  segmenting and recognizing the speech in response to a continuity of the statistical distance determined during the immediately preceding step;
  
  extracting parameter segments of the input speech by deriving power information of low- and high-frequency ranges of the speech;
  
  segmenting consonant phonemes of the speech in response to the statistical distance similarity calculation and time-dependent variations of low- and high-frequency ranges of the power information in the speech;
  
  calculating the degree of similarity between coefficients derived from the spectral analysis and stored standard consonant phoneme patterns;
  
  recognizing consonant phonemes in response to the degree of similarity determined during the immediately preceding step;
  
  deriving phoneme strings in response to the degree of similarity determined by both of the similarity calculations; and
  
  comparing/matching the derived phoneme strings with dictionary items to derive a dictionary item having the greatest similarity with the phoneme strings.
- Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26)
- - 19. The method of claim 18, wherein data are derived from the speech of plural speakers to derive the predetermined standard patterns representing phonemes or phoneme groups, the consonants being segmented by detecting vowels at the beginning of words and consonants, detecting a vowel period as a stable occurrence at the beginning of the input speech signal, detecting a consonant period as a period while a vowel does not last or as a period while a nasal sound or unvoiced sound occurs.
  - 20. The method of claim 18, wherein the extracted parameter is segmented by:
    - responding to power information for the low- and high-frequency ranges of the speech signal and for segmenting the speech;
      
      detecting the consonant segments by detecting a proposed consonant period in response to dips in the power information; and
      
      detecting a consonant period from the proposed consonant period.
  - 21. The method of claim 20, wherein the consonant is segmented by:
    - detecting the proposed consonant period in response to maximal and minimal values in the rate of the time-dependent variation of the power information in the low- and high-frequency ranges and the time between occurrence points of the maximum and minimum values; and
      
      detecting a consonant period from the proposed consonant period by determining the difference between the maximum and minimum values of the power information of the low- and high-frequency ranges.
  - 22. The method of claim 21, wherein the consonant period is detected from the proposed consonant period by measuring the statistical distance between predetermined parameters of the speech and predetermined standard patterns, the predetermined parameters being indicative of the number of dips of the power information of the low- and high-frequency ranges.
  - 23. The method of claim 21, wherein the consonants are segmented by detecting the consonant periods from the proposed consonant periods in response to the magnitude of dips of the power information of the low- and high-frequency ranges, as applied to a discriminant diagram.
  - 24. The method of claim 18, wherein the consonant is segmented by detecting an in-word consonant period in response to one or more of:
    - the magnitude of power dips in time-dependent variations of power information in the low- and high-frequency ranges of the speech;
      
      recognizing all frames included in an overall sound period of the speech signal as vowels or nasal sounds, followed by a period during which at least a predetermined plural number of frames are continuously recognized as nasal sounds by the segmenting and recognizing step responsive to the continuity of the statistical distance between the input speech and phonemes of vowel features, voiced sounds and unvoiced sounds;
      
      detection of voiced/unvoiced frames during an overall sound period of the speech signal while more than a predetermined number of unvoiced frames continue.
  - 25. The method of claim 18, wherein beginning consonants of a word are segmented in an arbitrary order of the following first to third steps such that when a consonant is detected by one or two of the first to third steps, the remaining step or steps are not performed, where the steps are:
    - the first step includes capturing time-dependent power variations in the low- and high-frequency ranges at the beginning of a word of the speech;
      
      the second step includes responding to the voiced/unvoiced frame determination resulting from the degree of similarity of the speech to phonemes of vowel features, voiced sounds and unvoiced sounds; and
      
      the third step includes responding to five vowel and nasal sound frames detected by determining the statistical difference between the speech and phonemes of vowel features, voiced sounds and unvoiced sounds.
  - 26. The method of claim 18, wherein the degree of similarity of the speech to phonemes of vowel features, voiced sounds and unvoiced sounds determines whether a sound has voiced or unvoiced sound features in response to a measure of the statistical difference between LPC cepstrum coefficients and two standard patterns, the LPC cepstrum coefficients being used as parameters indicative of the spectral shape of the speech, the statistical distance measure being a measure of the similarity, two standard patterns indicative of the shape of an average spectrum of voiced sounds and unvoiced sounds being stored to enable the statistical distance measurement to be performed.
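The final step of claim 18 (like element (i) of claim 3) matches the derived phoneme string against a word dictionary and keeps the most similar item. The claims do not specify the similarity measure, so the sketch below substitutes a plain Levenshtein edit distance normalized by string length; this is a swapped-in technique, not necessarily the patentees' method. The dictionary entries in the usage check are hypothetical romanized words chosen only for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (lists or strings)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (pa != pb)))  # substitution or match
        prev = cur
    return prev[-1]


def best_word(phonemes, dictionary):
    """Return (word, similarity) for the dictionary item closest to phonemes.

    Similarity = 1 - distance / length of the longer sequence, so 1.0 is exact.
    """
    def similarity(entry):
        ref = entry[1]
        longest = max(len(phonemes), len(ref)) or 1
        return 1.0 - edit_distance(phonemes, ref) / longest

    word, ref = max(dictionary.items(), key=similarity)
    return word, similarity((word, ref))
```

A recognized string with one misrecognized consonant, such as `s a k u n a`, still selects the six-phoneme entry `sakura` with similarity 5/6. A weighted distance that charges less for confusable phoneme pairs would be a natural refinement, but that too would be an assumption.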

Current Assignee
Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Original Assignee
Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors
Fujii, Satoru; Morii, Shuji; Inoue, Ikuo; Niyada, Katsuyuki
Primary Examiner(s)
KEMENY, EMANUEL

Application Number

US06/618,368
Time in Patent Office

1,398 Days
Field of Search

381/41-43, 364/513.5
US Class Current

704/254
CPC Class Codes

G10L 15/04 Segmentation; Word boundary...