Continuous speech recognition method

US 4,227,177 A
Filed: 04/27/1978
Issued: 10/07/1980
Est. Priority Date: 04/27/1978
Status: Expired due to Term


Abstract

A speech recognition method for detecting and recognizing one or more keywords in a continuous audio signal is disclosed. Each keyword is represented by a keyword template representing one or more target patterns, and each target pattern comprises statistics of each of at least one spectrum selected from plural short-term spectra generated according to a predetermined system for processing of the incoming audio. The spectra are processed by a frequency equalization and normalizing method to enhance the separation between the spectral pattern classes during later analysis. The processed audio spectra are grouped into spectral patterns, are transformed to reduce dimensionality of the patterns, and are compared by means of likelihood statistics with the target patterns of the keyword templates. A concatenation technique employing a loosely set detection threshold makes it very unlikely that a correct pattern will be rejected.


14 Claims

1. In a speech analysis system for recognizing at least one predetermined keyword in an audio signal, each said keyword being characterized by a template having at least one target pattern, said target patterns having an ordered sequence and each target pattern representing at least one short-term power spectrum, an analysis method comprising the steps of

  repeatedly evaluating electrical signals representing a set of parameters determining a short-term power spectrum of said audio signal within each of a plurality of equal duration sampling intervals, thereby to generate a continuous time ordered sequence of short-term audio power spectrum frames,

  repeatedly generating electrical signals representing a peak spectrum corresponding to said short-term power spectrum frames by a fast attack, slow decay peak detecting function, and

  for each short-term power spectrum frame, dividing the amplitude of each frequency band by the corresponding intensity value in the corresponding peak spectrum, thereby to generate a frequency band equalized spectrum frame corresponding to a compensated audio signal having the same maximum short-term energy content in each of the frequency bands comprising the frame, and

  identifying electrical signals representing a candidate keyword template when said selected multi-frame patterns correspond respectively to the target patterns of a said keyword template.

2. The method of claim 1 wherein generating said peak spectrum further includes the step of

  selecting the value of each of the peak spectrum frequency bands from the maximum of

  (a) the current peak spectrum value multiplied by a constant decay factor having a value less than one, and

  (b) the incoming new spectrum frame value.
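
The peak-tracking and equalization steps of claims 1 and 2 can be illustrated with a minimal Python sketch. It is only one reading of the claim language, not the patentee's implementation: the function names `update_peak` and `equalize`, the default decay value, the `floor` guard against division by zero, and the choice to start the peak spectrum at that floor are all assumptions introduced here.

```python
from typing import List, Sequence


def update_peak(peak: Sequence[float], frame: Sequence[float],
                decay: float = 0.99) -> List[float]:
    """Fast-attack, slow-decay peak detector, applied per frequency band.

    Each band takes the larger of (a) the decayed previous peak and
    (b) the incoming frame value, as recited in claim 2.
    """
    if not 0.0 < decay < 1.0:
        raise ValueError("decay factor must lie strictly between 0 and 1")
    return [max(p * decay, f) for p, f in zip(peak, frame)]


def equalize(frames: Sequence[Sequence[float]], decay: float = 0.99,
             floor: float = 1e-12) -> List[List[float]]:
    """Divide every band of every frame by the running peak spectrum.

    `floor` is a hypothetical safeguard (not in the claims) that keeps
    the divisor positive when a band has been silent so far.
    """
    if not frames:
        return []
    peak = [floor] * len(frames[0])
    equalized = []
    for frame in frames:
        peak = update_peak(peak, frame, decay)
        equalized.append([f / max(p, floor) for f, p in zip(frame, peak)])
    return equalized


if __name__ == "__main__":
    # Two bands, two frames; decay exaggerated to 0.5 for readability.
    print(equalize([[4.0, 1.0], [1.0, 2.0]], decay=0.5))
```

Because the peak rises immediately to any louder frame (fast attack) and falls only by the decay factor per frame (slow decay), each equalized band has a maximum of one, which is the "same maximum short-term energy content" property the claim describes.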

3. In a speech analysis system in which an audio signal is spectrum analyzed for recognizing at least one predetermined keyword in a continuous audio signal, each said keyword being characterized by a template having at least one target pattern, said target patterns having an ordered sequence and each target pattern representing a plurality of short-term power spectra spaced apart in real time, an analysis method comprising the steps of

  repeatedly evaluating electrical signals representing a set of parameters determining a short-term power spectrum of said audio signal within each of a plurality of equal duration sampling intervals, thereby to generate a continuous time ordered sequence of short-term audio power spectrum frames,

  repeatedly generating electrical signals representing a peak spectrum corresponding to said short-term power spectrum frames by a fast attack, slow decay peak detecting function,

  for each short-term power spectrum frame, dividing the amplitude of each frequency band by the corresponding intensity value in the corresponding peak spectrum, thereby to generate a frequency band equalized spectrum frame corresponding to a compensated audio signal having the same maximum short-term energy content in each of the frequency bands comprising the spectrum,

  repeatedly selecting from said sequence of equalized frames, one first frame and at least one later occurring frame to form a multi-frame pattern,

  comparing each thus formed multi-frame pattern with each first target pattern of each keyword template,

  deciding whether each said multi-frame pattern corresponds to a said first target pattern of a keyword template,

  for each multi-frame pattern which, according to said deciding step, corresponds to a said first target pattern of a potential candidate keyword, selecting later occurring short-term power spectrum equalized frames to form later occurring multi-frame patterns,

  deciding whether said later occurring multi-frame patterns correspond respectively to successive target patterns of said potential candidate keyword template, and

  identifying electrical signals representing a candidate keyword template when said selected multi-frame patterns correspond respectively to the target patterns of a said keyword template.

4. The method of claim 3 wherein generating said peak spectrum further includes the step of

  selecting the value of each of the peak spectrum frequency bands from the maximum of

  (a) the current peak spectrum value multiplied by a constant decay factor having a value less than one, and

  (b) the incoming new spectrum frame value.
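
The multi-frame pattern and sequential matching steps of claim 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the method as specified: the claim does not fix the spacing between frames within a pattern or between successive target patterns, so the `offsets` and `step` parameters are hypothetical, as are the function names and the caller-supplied `matches` predicate standing in for the deciding step.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]
Pattern = List[float]


def form_pattern(frames: Sequence[Frame], first: int,
                 offsets: Sequence[int]) -> Pattern:
    """Concatenate one first frame with one or more later frames.

    `offsets` lists how many frames after `first` each later frame lies
    (an assumed fixed spacing; the claim only requires "later occurring").
    """
    if not offsets or min(offsets) < 1:
        raise ValueError("need at least one later-occurring frame")
    return [v for off in (0, *offsets) for v in frames[first + off]]


def find_keyword(frames: Sequence[Frame], targets: Sequence[Pattern],
                 offsets: Sequence[int], step: int,
                 matches: Callable[[Pattern, Pattern], bool]) -> List[int]:
    """Return start indices where all target patterns match in order.

    The first target pattern is tested first; later patterns, assumed to
    begin `step` frames apart, are formed only if the earlier ones
    matched (the generator passed to all() short-circuits).
    """
    last_start = len(frames) - 1 - max(offsets) - step * (len(targets) - 1)
    hits = []
    for start in range(last_start + 1):
        if all(matches(form_pattern(frames, start + k * step, offsets), t)
               for k, t in enumerate(targets)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    frames = [[0.0], [1.0], [2.0], [3.0], [1.0], [2.0]]
    targets = [[1.0, 2.0], [3.0, 1.0]]
    # Exact equality stands in for a real statistical decision rule.
    print(find_keyword(frames, targets, (1,), 2, lambda p, t: p == t))
```

In practice the `matches` predicate would be a thresholded likelihood test rather than exact equality; the sketch only shows the control flow of forming patterns and checking target patterns in their ordered sequence.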

5. In a pattern recognition system for identifying in a data stream electrical signals representing at least one target pattern characterized by a vector of recognition elements x_i, said elements having a statistical distribution, the analysis method comprising the steps of

  determining from plural design set pattern samples x of the target pattern a covariance matrix K;

  determining from said plural design set patterns an expected value vector x̄;

  calculating from the covariance matrix K a plurality of eigenvectors e_i having eigenvalues v_i, where v_i ≥ v_{i+1};

  selecting electrical signals representing unknown patterns y from said data stream;

  transforming the electrical signals representing each pattern y into electrical signals representing a new vector (W₁, W₂, . . . , W_p, R), where

    W_i = e_i (y - x̄),

  p is a positive integer constant less than the number of elements of the pattern y, and R is the reconstruction error statistic and equals ##EQU17##; and

  deciding, by applying a likelihood statistic function to electrical signals representing said new vector (W₁, W₂, . . . , W_p, R), whether said pattern y is identified with the target pattern.

6. The method of claim 5 further including the step of calculating a likelihood statistic l′ according to the equation:

  ##EQU18##

  where the barred variables are sample means, and var ( ) is the unbiased sample variance.

7. The method of claim 5 further including the step of calculating a likelihood statistic l″ according to the equation:

  ##EQU19##

  where the barred variables are sample means, and var ( ) is the unbiased sample variance.
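
The transformation recited in claim 5 can be illustrated with a short Python sketch. It assumes the expected value vector and the first p eigenvectors have already been computed and that the eigenvectors are orthonormal. The claim's own formula for R is not reproduced in this text, so the sketch substitutes a common choice, the Euclidean norm of the part of the centered pattern not captured by the p components; that choice, and the function name `transform`, are assumptions rather than the patent's definition.

```python
import math
from typing import List, Sequence


def transform(y: Sequence[float], mean: Sequence[float],
              eigvecs: Sequence[Sequence[float]]) -> List[float]:
    """Map pattern y to the reduced vector (W_1, ..., W_p, R).

    W_i is the projection of the centered pattern onto eigenvector e_i.
    R is ASSUMED here to be the residual norm
        sqrt(|y - mean|^2 - sum_i W_i^2),
    which holds only for orthonormal eigenvectors; the patent's exact
    expression for R is not available in this text.
    """
    if len(eigvecs) >= len(y):
        raise ValueError("p must be less than the number of elements of y")
    centered = [a - b for a, b in zip(y, mean)]
    w = [sum(e_k * d_k for e_k, d_k in zip(e, centered)) for e in eigvecs]
    residual_sq = sum(d * d for d in centered) - sum(v * v for v in w)
    # Clamp tiny negative values caused by floating-point rounding.
    r = math.sqrt(max(residual_sq, 0.0))
    return w + [r]


if __name__ == "__main__":
    # Three-element pattern reduced to p = 1 component plus R.
    print(transform([1.0, 3.0, 4.0], [0.0, 0.0, 0.0], [[1.0, 0.0, 0.0]]))
```

Keeping R alongside the p projections lets a later likelihood test penalize patterns that lie far outside the subspace spanned by the retained eigenvectors, information the W_i alone would discard.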

8. In a speech analysis system for recognizing at least one predetermined keyword in a continuous audio signal, each said keyword being characterized by a template having at least one target pattern, said target patterns having an ordered sequence and each target pattern representing a plurality of short-term power spectra spaced apart in real time, an analysis method comprising the steps of

  for each target pattern, determining from electrical signals representing plural design set pattern samples x of a said target pattern having elements x_i, a covariance matrix K;

  determining from said plural design set patterns an expected value vector x̄;

  calculating from the covariance matrix K a plurality of eigenvectors e_i having eigenvalues v_i, where v_i ≥ v_{i+1};

  repeatedly evaluating electrical signals representing a set of parameters determining a short-term power spectrum of said audio signal within each of a plurality of equal duration sampling intervals, thereby to generate a continuous time ordered sequence of short-term audio power spectrum frames;

  repeatedly selecting from said sequence of frames, electrical signals representing one first frame and at least one later occurring frame to form a multi-frame pattern y;

  transforming the electrical signals representing each multi-frame pattern y into electrical signals representing new vectors W, represented as (W₁, W₂, . . . , W_p, R), where

    W_i = e_i (y - x̄);

  p is a positive integer constant less than the number of elements of the pattern y, and R is the reconstruction error statistic and equals ##EQU20##;

  deciding whether each said transformed pattern corresponds to a said first target pattern of a keyword template;
  
  for each pattern which, according to said deciding step, corresponds to a said first target pattern of a potential candidate keyword, selecting later occurring short-term power spectra to form later occurring multi-frame patterns;
  
  deciding whether said later occurring multi-frame patterns correspond respectively to successive target patterns of said potential candidate keyword template; and
  
  identifying electrical signals representing a candidate keyword template when said selected multi-frame patterns correspond respectively to the target patterns of a said keyword template.

9. The method of claim 8 wherein the deciding steps each include the step of calculating a likelihood statistic l′ according to the equation:

  ##EQU21##

  where the barred variables are sample means, and var ( ) is the unbiased sample variance.

10. The method of claim 8 wherein the deciding steps each include the step of calculating a likelihood statistic l″ according to the equation:

  ##EQU22##

  where the barred variables are sample means, and var ( ) is the unbiased sample variance.

11. The method of claim 8 further including the steps of

  repeatedly generating a peak spectrum corresponding to said short-term power spectrum frames by a fast attack, slow decay peak detecting function, and

  for each short-term power spectrum frame dividing the amplitude of each frequency band by the corresponding intensity value in the corresponding peak spectrum, thereby to generate a frequency band equalized spectrum frame corresponding to a compensated audio signal having the same maximum short-term energy content in each of the frequency bands comprising the frame.

12. The method of claim 11 wherein generating said peak spectrum further includes the step of

  selecting the value of each of the peak spectrum frequency bands from the maximum of

  (a) the current peak spectrum value multiplied by a constant decay factor having a value less than one, and

  (b) the incoming new spectrum value.

13. In a pattern recognition system for identifying in a data stream electrical signals representing at least one target pattern characterized by a vector of recognition elements x_i, said elements having a statistical distribution, the analysis method comprising the steps of

  selecting electrical signals representing unknown patterns y from said data stream;

  transforming the electrical signals representing each pattern y into electrical signals representing a new vector (W₁, W₂, . . . , W_p, R), where the W_i represent elements of a vector obtained in a principal component analysis, p is a positive integer constant less than the number of elements of the pattern y, and R is a reconstruction error statistic representing information at least in part not contained in said elements W_i; and
  
  deciding, by applying a likelihood statistic function to electrical signals representing said new vector (W₁, W₂, . . . , W_p, R), whether said pattern y is identified with the target pattern.

14. In a speech analysis system for recognizing at least one predetermined keyword in a continuous audio signal, each said keyword being characterized by a template having at least one target pattern, said target patterns having an ordered sequence and each target pattern representing a plurality of short-term power spectra spaced apart in real time, an analysis method comprising the steps of

  for each target pattern, determining from electrical signals representing plural design set pattern samples x of a said target pattern and said samples x having elements x_i;

  repeatedly evaluating electrical signals representing a set of parameters determining a short-term power spectrum of said audio signal within each of a plurality of equal duration sampling intervals, thereby to generate a continuous time ordered sequence of short-term audio power spectrum frames;

  repeatedly selecting from said sequence of frames, electrical signals representing one first frame and at least one later occurring frame to form a multi-frame pattern y;

  transforming the electrical signals representing each multi-frame pattern y into electrical signals representing new vectors W, represented as (W₁, W₂, . . . , W_p, R), where W_i represent elements of a vector obtained in a principal component analysis, p is a positive integer constant less than the number of elements of the pattern y, and R is a reconstruction error statistic representing at least in part information not contained in said elements W_i;
  
  deciding whether each said transformed pattern corresponds to a said first target pattern of a keyword template;
  
  for each pattern which, according to said deciding step, corresponds to a said first target pattern of a potential candidate keyword, selecting later occurring short-term power spectra to form later occurring multi-frame patterns;
  
  deciding whether said later occurring multi-frame patterns correspond respectively to successive target patterns of said potential candidate keyword template; and
  
  identifying electrical signals representing a candidate keyword template when said selected multi-frame patterns correspond respectively to the target patterns of a said keyword template.


Current Assignee: Verbex Voice Systems, Inc. (Voxware, Inc.)
Original Assignee: Dialog Systems Incorporated
Inventors: Moshier, Stephen L.
Primary Examiner(s): Boudreau, Leo H.
Application Number: US05/901,006
Time in Patent Office: 894 Days
Field of Search: 340/146.3 R, 340/146.3 AC, 340/146.3 AQ, 340/146.3 WD, 179/1 SA, 179/1 SB, 179/1 SC, 179/1 SD
US Class Current: 704/231
CPC Class Codes: G10L 15/00 Speech recognition G10L17/0...
