Method and apparatus for word speech recognition by pattern matching

US 5,732,394 A
Filed: 04/10/1996
Issued: 03/24/1998
Est. Priority Date: 06/19/1995
Status: Expired due to Fees

Abstract

In a word speech recognition method which performs pattern matching between an unknown speech pattern and multiple reference templates and detects that one of the reference templates which provides the smallest one of distance measures detected between the unknown speech pattern and the reference templates, when the difference d between the speech period length of the unknown speech pattern and the speech period length of a selected reference template exceeds a fixed threshold value ε₁, partial patterns are extracted from the unknown speech pattern, each starting at a different position, and the minimum one of the distances obtained by pattern matching between these extracted partial patterns and the selected reference template is determined to be the distance between the selected reference template and the unknown speech pattern. When the difference d is in the range of -ε₂ ≤ d ≤ ε₁, pattern matching is performed between speech periods of the unknown speech pattern and the reference templates with their variation periods eliminated therefrom at both of their ends.
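
The length-based dispatch described above can be sketched in Python. This is an illustration under stated assumptions, not the patented implementation: the source only says "pattern matching", so the `dtw_distance` helper (a plain dynamic-time-warping distance) is a stand-in, and the function names, the three evenly spaced start positions, the default threshold values, and the use of infinity as the "maximum value" for overly short inputs (the d < -ε₂ case from claim 7) are all hypothetical. The elimination of variation periods in the in-range case is omitted; full-length matching stands in for it.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two spectral-parameter frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(pattern, template):
    """Length-normalized DTW distance (assumed stand-in for 'pattern matching')."""
    n, m = len(pattern), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = frame_dist(pattern[i - 1], template[j - 1]) + step
    return cost[n][m] / (n + m)

def partial_patterns(unknown, length, positions=3):
    """Cut windows of `length` frames at evenly spaced starts (start, middle, end)."""
    last_start = len(unknown) - length
    starts = sorted({round(k * last_start / (positions - 1)) for k in range(positions)})
    return [unknown[s:s + length] for s in starts]

def template_distance(unknown, template, eps1=5, eps2=5):
    """Distance between an unknown pattern and one reference template."""
    d = len(unknown) - len(template)          # period-length difference in frames
    if d > eps1:                              # unknown much longer: match partial patterns
        windows = partial_patterns(unknown, len(template))
        return min(dtw_distance(w, template) for w in windows)
    if d < -eps2:                             # unknown much shorter: assumed "maximum value"
        return float("inf")
    return dtw_distance(unknown, template)    # in range: match over the entire lengths
```

With a word embedded between stretches of noise frames, the middle window lines up with the template and yields the smallest distance, which is the behaviour the abstract describes for d > ε₁.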

16 Claims

1. A word speech recognition method which performs pattern matching between an unknown speech pattern and multiple reference templates and detects that one of said multiple reference templates which corresponds to the smallest one of distance measures between said unknown speech pattern and said multiple reference templates, said method comprising the steps of:
- (a) analyzing an unknown input digital speech signal for each frame and extracting therefrom a sequence of spectral parameters;
  
  (b) detecting start and end points of the speech period of said input digital speech signal and obtaining said sequence of spectral parameters of said input digital speech signal for said speech period as said unknown speech pattern;
  
  (c) selecting one of said multiple reference templates;
  
  (d) calculating a difference d between the period length of said unknown speech pattern and the period length of said selected reference template;
  
  (e) comparing said difference d with a predetermined threshold length ε₁, where said ε₁ is a positive value;
  
  (e-1) when said difference d exceeds the threshold length ε₁, extracting from said unknown speech pattern its partial patterns of about the same length as the period length of said selected reference template, each starting at a different position in said unknown speech pattern; and
  
  (e-2) performing pattern matching between said partial patterns and said selected reference template to detect the distances between them;
  
  (f) determining the smallest one of said detected distances to be the distance between said unknown speech pattern and said selected reference template; and
  
  (g) repeating said steps (c) to (f) for each of said multiple reference templates and outputting, as the result of recognition of said input digital speech signal, the label name of said reference template which provides the smallest one of the distances between said unknown speech pattern and all of said reference templates.
- - 2. The word speech recognition method of claim 1, wherein said step (e-1) includes a step of extracting said partial patterns of about the same length as that of said selected reference template from said unknown speech pattern at said speech start and end points thereof, respectively.
  - 3. The word speech recognition method of claim 2, wherein said step (e-1) includes a step of extracting another partial pattern of about the same length as that of said selected reference template from said unknown speech pattern at substantially the middle thereof.
  - 4. The word speech recognition method of claim 2, wherein said step (e-2) is a step of performing end edge free pattern matching for said partial pattern extracted from said unknown speech pattern at said start point thereof and start edge free pattern matching for said partial pattern extracted from said unknown speech pattern at the end point thereof.
  - 5. The word speech recognition method of claim 3, wherein said step (e-2) is a step of performing edge free pattern matching for said partial pattern extracted from said unknown speech pattern at substantially the middle thereof.
  - 6. The word speech recognition method of claim 1, 2, or 3, which further comprises a step of measuring the noise power of a non-speech period of said digital speech signal prior to the inputting thereof, and wherein said step (a) includes a step of obtaining the power of said input digital speech signal for each frame by said speech analysis and said step (b) includes a step of detecting, as said start point of said speech period of said input digital speech signal, a first rise position of the power of said input digital speech signal where it exceeds a threshold level which is the sum of said noise power and a predetermined value and, as said end point of said speech period of the input digital speech signal, any one of fall positions of the power of said input digital speech signal where it decreases below said threshold level.
  - 7. The word speech recognition method of claim 1, 2, or 3, wherein said step (e) further comprises the steps of:
    - (e-3) setting the distance between said unknown speech pattern and said selected reference template to a maximum value when said difference d is smaller than a predetermined second threshold length -ε₂, where ε₂ is a positive value; and
      
      (e-4) performing pattern matching between said unknown speech pattern and said selected reference template over their entire lengths when said difference d is in the range between said threshold lengths -ε₂ and ε₁.
  - 8. The word speech recognition method of claim 1, which further comprises a step of measuring the noise power of a non-speech period of said digital speech signal prior to the inputting thereof, and wherein:
    - said step (a) includes a step of obtaining the power of said input digital speech signal for each frame by said speech analysis;
      
      said step (b) includes a step of detecting, as said start point of said speech period of said input digital speech signal, a first rise position of the power of said input digital speech signal where it exceeds a threshold level which is the sum of said noise power and a predetermined value and, as said end point of said speech period of said input digital speech signal, any one of fall positions of the power of said input digital speech signal where it decreases below said threshold level; and
      
      said step (e) further comprises a step (e-3) of extracting said partial patterns of about the same length as that of said selected reference template from said unknown speech pattern within the range from each rise position of said unknown speech pattern to said end point of its speech period when said difference d is larger than said threshold length ε₁.
  - 9. The word speech recognition method of claim 1, 2, or 3, wherein said step (e) further comprises the steps of:
    - (e-3) setting the distance between said unknown speech pattern and said selected reference template to a maximum value when said difference d is smaller than a predetermined second threshold length -ε₂, where ε₂ is a positive value; and
      
      (e-4) when said difference d is in the range between said threshold lengths -ε₂ and ε₁;
      
      (e-4-1) performing pattern matching between said unknown speech pattern and said selected reference template over their entire periods thereof to obtain a first distance between them;
      
      (e-4-2) extracting a reference template partial period from said selected reference template, except its start and end segments;
      
      (e-4-3) extracting a speech pattern partial period from said unknown speech pattern, except its start and end segments;
      
      (e-4-4) performing pattern matching between said reference template partial period and said speech pattern partial period to obtain a second distance between said unknown speech pattern and said selected reference template; and
      
      (e-4-5) comparing said first and second distances and deciding the smaller one of them to be the distance between said unknown speech pattern and said selected reference template.
  - 10. The word speech recognition method of claim 9, wherein the lengths of said start and end segments of said selected reference template and said unknown speech pattern in said steps (e-4-2) and (e-4-3) are predetermined lengths.
  - 11. The word speech recognition method of claim 9, wherein the lengths of said start and end segments of said selected reference template are predetermined lengths, said step (e-4-2) includes a step of detecting first and second spectral parameters at the start and end points of said reference template partial period, respectively, and said step (e-4-3) includes a step of detecting third and fourth spectral parameters closest to said first and second spectral parameters in periods of predetermined lengths from the start and end points of said unknown speech pattern and a step of extracting, as said speech pattern partial period, that period of said unknown speech period which is defined by said third and fourth spectral parameters.
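
The power-threshold endpoint detection recited in claims 6 and 8 can be sketched as follows. This is a minimal sketch under assumptions: the function name `detect_endpoints`, its parameters, and the return shape are hypothetical, and because the claims allow "any one of" the fall positions as the end point, the choice of the last fall position here is an assumption made for the example.

```python
def detect_endpoints(frame_power, noise_power, margin):
    """Locate the speech period from per-frame power values.

    The threshold is the measured noise power plus a predetermined margin.
    Returns (start, end, rises), where `end` is exclusive and `rises` lists
    every rise position, or None if the power never exceeds the threshold.
    """
    threshold = noise_power + margin
    rises, falls = [], []
    above = False
    for i, power in enumerate(frame_power):
        if not above and power > threshold:
            rises.append(i)           # power rises above the threshold
            above = True
        elif above and power < threshold:
            falls.append(i)           # power falls back below the threshold
            above = False
    if above:                         # signal ended while still above threshold
        falls.append(len(frame_power))
    if not rises:
        return None
    # Start: first rise position. End: last fall position (assumed choice).
    return rises[0], falls[-1], rises
```

The returned rise positions are the candidate start positions from which claim 8 extracts partial patterns when the difference d is larger than ε₁.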

12. A word speech recognizer which performs pattern matching between an unknown speech pattern and multiple reference templates and detects that one of said multiple reference templates which corresponds to the smallest one of distance measures between said unknown speech pattern and said multiple reference templates, said recognizer comprising:
- input means for inputting a digital speech signal;
  
  speech spectral parameter extracting means for analyzing said digital speech signal for each frame and for extracting therefrom a sequence of speech spectral parameters;
  
  speech endpoint detecting means for detecting speech endpoints of the speech period of said digital speech signal on the basis of said sequence of speech spectral parameters outputted from said speech spectral parameter extracting means;
  
  unknown speech pattern register means for determining start and end points of the speech period of said unknown speech pattern on the basis of said detected speech endpoints and for storing a sequence of spectral parameters of said speech period as said unknown speech pattern;
  
  reference template storage means for prestoring multiple reference templates for speech recognition;
  
  period length comparing means for comparing the speech period length of each of said stored multiple reference templates and the speech period length of said unknown speech pattern stored in said unknown speech pattern register means;
  
  input pattern extracting means for extracting partial patterns from said unknown speech pattern stored in said unknown speech pattern register means, each starting at a different position, on the basis of the comparison result from said period length comparing means and the output result from said unknown speech pattern register means;
  
  pattern matching means for performing pattern matching between each of said multiple partial patterns and said each reference template and for outputting multiple distance measures calculated between them;
  
  distance comparing means for comparing said multiple distance measures from said pattern matching and for outputting the smallest distance measure as the distance measure between said unknown speech pattern and said each reference template; and
  
  result output means for outputting the label name of said reference template which provides the distance measure decided to be the smallest among those between all of said multiple reference templates and said unknown speech pattern.
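
The role of the distance comparing means and the result output means can be illustrated with a short selection loop. This is only a sketch: `recognize`, the dictionary of labelled templates, and `toy_distance` are hypothetical names introduced for the example, and the toy distance is a placeholder for whatever pattern matching the recognizer actually uses.

```python
def recognize(unknown, templates, distance):
    """Return (label, distance) of the reference template closest to `unknown`.

    `templates` maps a label name to a reference template; `distance` is any
    function that returns a distance measure for (unknown, template).
    """
    best_label, best = None, float("inf")
    for label, template in templates.items():
        d = distance(unknown, template)
        if d < best:                  # keep the smallest distance seen so far
            best_label, best = label, d
    return best_label, best

def toy_distance(a, b):
    """Placeholder distance on scalar sequences (not the patent's matching)."""
    return abs(len(a) - len(b)) + sum(abs(x - y) for x, y in zip(a, b))
```

Passing the distance function as a parameter keeps the selection step independent of how the per-template distance is computed.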

13. A word speech recognition method which performs pattern matching between an unknown speech pattern and multiple reference templates and detects that one of said multiple reference templates which corresponds to the smallest one of distance measures between said unknown speech pattern and said multiple reference templates, said method comprising the steps of:
- (a) analyzing an unknown input digital speech signal for each frame and extracting therefrom a sequence of spectral parameters;
  
  (b) detecting start and end points of the speech period of said input digital speech signal and obtaining said sequence of spectral parameters of said input digital speech signal for said speech period as said unknown speech pattern;
  
  (c) selecting one of said multiple reference templates;
  
  (d) performing pattern matching between said unknown speech pattern and said selected reference template over their entire lengths to obtain a first distance between them;
  
  (e) extracting a reference template partial period from said selected template, except its start and end segments;
  
  (f) extracting a speech pattern partial period from said unknown speech pattern, except its start and end segments;
  
  (g) performing pattern matching between said reference template partial period and said speech pattern partial period to obtain a second distance between said unknown speech pattern and said selected reference template;
  
  (h) comparing said first and second distances and deciding the smaller one of them to be the distance between said unknown speech pattern and said selected reference template; and
  
  (i) repeating said steps (c) to (h) for each of said multiple reference templates and outputting, as the result of recognition of said input digital speech signal, the label name of said reference template which provides the smallest one of the distances between said unknown speech pattern and all of said multiple reference templates.
- - 14. The word speech recognition method of claim 13, wherein the lengths of said start and end segments of said selected reference template and said unknown speech pattern in said steps (e) and (f) are predetermined lengths.
  - 15. The word speech recognition method of claim 13, wherein the lengths of said start and end segments of said selected reference template are predetermined lengths, said step (e) includes a step of detecting first and second spectral parameters at the start and end points of said reference template partial period, respectively, and said step (f) includes a step of detecting third and fourth spectral parameters closest to the first and second spectral parameters in periods of predetermined lengths from the start and end points of said unknown speech pattern and a step of extracting, as said speech pattern partial period, that period of the unknown speech period which is defined by said third and fourth spectral parameters.
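
The two-pass matching of claims 13 to 15 can be sketched as follows. This is a sketch under assumptions rather than the patented implementation: `dtw_distance` is an assumed stand-in for "pattern matching", the names `trim_adaptive` and `two_pass_distance` and the default segment lengths are hypothetical, and the unknown pattern is assumed to be at least twice as long as the search range so that the two search zones do not overlap.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two spectral-parameter frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(pattern, template):
    """Length-normalized DTW distance (assumed stand-in for 'pattern matching')."""
    n, m = len(pattern), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = frame_dist(pattern[i - 1], template[j - 1]) + step
    return cost[n][m] / (n + m)

def trim_adaptive(unknown, template, head, tail, search):
    """Claim 15 style trimming: cut the unknown pattern at the frames that are
    closest to the boundary frames of the template's partial period."""
    first = template[head]                        # start of template partial period
    second = template[len(template) - tail - 1]   # end of template partial period
    start_zone = range(0, search)
    end_zone = range(len(unknown) - search, len(unknown))
    s = min(start_zone, key=lambda i: frame_dist(unknown[i], first))
    e = min(end_zone, key=lambda i: frame_dist(unknown[i], second))
    return unknown[s:e + 1]

def two_pass_distance(unknown, template, head=2, tail=2, search=4):
    """Smaller of the full-length distance and the trimmed-period distance."""
    first_distance = dtw_distance(unknown, template)
    partial_template = template[head:len(template) - tail]
    partial_unknown = trim_adaptive(unknown, template, head, tail, search)
    second_distance = dtw_distance(partial_unknown, partial_template)
    return min(first_distance, second_distance)
```

For the fixed-length variant of claim 14, `trim_adaptive` would be replaced by a plain slice of the unknown pattern with the same predetermined `head` and `tail` lengths.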

16. A word speech recognizer which performs pattern matching between an unknown speech pattern and multiple reference templates and detects that one of said multiple reference templates which corresponds to the smallest one of distance measures between said unknown speech pattern and said multiple reference templates, said recognizer comprising:
- input means for inputting a digital speech signal;
  
  speech spectral parameter extracting means for analyzing said digital speech signal for each frame and for extracting therefrom a sequence of speech spectral parameters;
  
  speech period detecting means for detecting the speech period of said unknown speech pattern as a first speech period on the basis of said sequence of speech spectral parameters outputted from said speech spectral parameter extracting means and for determining both ends of said first speech period as first speech endpoints;
  
  unknown speech pattern register means for storing a sequence of spectral parameters of said first speech period as said unknown speech pattern;
  
  unknown pattern partial period determining means for determining second speech endpoints that define a second speech period, by eliminating start and end segments from said first speech period detected by said speech period detecting means;
  
  reference template storage means for prestoring multiple reference templates for speech recognition, together with information about first speech endpoints defining their speech periods as first speech periods;
  
  reference template partial period determining means for determining second endpoints that define a second speech period, by eliminating start and end segments from said first speech period of each of said multiple reference templates selected from said reference template storage means;
  
  switching means for selecting said first and second endpoints of said unknown speech pattern and said each selected reference pattern from said speech period detecting means and said reference template pattern storage means, thereby selecting said first and second speech periods of said unknown speech pattern from said unknown pattern register means and said each selected reference template from said reference template storage means;
  
  pattern matching means for performing pattern matching between said first speech periods of said unknown speech pattern and said each selected reference template selected by said switching means to obtain a first distance and for performing pattern matching between said second speech periods of said unknown speech pattern and said each selected reference template selected by said switching means to obtain a second distance;
  
  distance comparing means for comparing said first and second distances to determine the smaller one of them to be the distance measure between said unknown speech pattern and said each selected reference template; and
  
  result output means for comparing all the distance measures outputted from said distance comparing means as the results of matching of said unknown speech pattern with said multiple reference templates, for determining that one of said multiple reference templates which is decided to provide the smallest distance measure, and for outputting the label name of said determined reference template.

Current Assignee: Nippon Telegraph and Telephone Corporation
Original Assignee: Nippon Telegraph and Telephone Corporation
Inventors: Sakurai, Tetsuma; Nakadai, Yoshio; Nishino, Yutaka
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Edouard, Patrick Nestor
Application Number: US08/630,668
Time in Patent Office: 713 Days
Field of Search: 704/233, 704/238, 704/239, 704/241, 704/243, 704/246, 704/248, 704/251, 704/252, 704/253, 704/255
US Class Current: 704/255
CPC Class Codes:
G10L 15/04 Segmentation; Word boundary...
G10L 15/10 using distance or distortio...
