Speech recognition method and apparatus

US 4,489,434 A
Filed: 10/05/1981
Issued: 12/18/1984
Est. Priority Date: 10/05/1981
Status: Expired due to Term

Abstract

A speech recognition method and apparatus for detecting and recognizing one or more keywords in a continuous audio signal are disclosed. Each keyword is represented by a keyword template which corresponds to a sequence of plural target patterns, and each target pattern comprises statistics representing each of a plurality of spectra selected from plural short-term spectra generated according to a predetermined system for processing the incoming audio. The target patterns also have associated therewith minimum and maximum dwell times. The dwell time is the time interval during which a given target pattern can be said to match incoming frame patterns. The spectra are processed to enhance the separation between the spectral pattern classes during later analysis. The processed audio spectra are grouped into multi-frame spectral patterns and each multi-frame spectral pattern is compared by means of likelihood statistics with the target patterns of keyword templates. Each formed multi-frame pattern is then forced to contribute to the total word score for each keyword as represented by the keyword template. Thus the keyword recognition method requires all input patterns to contribute to the word score of a keyword candidate, using the minimum and maximum dwell times for testing whether a target pattern can still match an input pattern, and wherein the frame rate of the audio spectra must be less than one-half the minimum dwell time of a target pattern. A concatenation technique employing a loosely set detection threshold makes it very unlikely that a correct pattern will be rejected. A method for forming the target patterns is also described.

64 Citations

16 Claims

1. In a speech analysis system for recognizing at least one predetermined keyword in an audio signal, each keyword being characterized by a template having at least one target pattern, and each target pattern representing at least one short-term power spectrum, and each target pattern having a minimum dwell time duration and a maximum dwell time duration, the method comprising the steps of:
- forming at a repetitive frame rate, a sequence of frame patterns from and representing said audio signal, each frame pattern being associated with a frame time, said frame rate corresponding to a frame interval less than one-half the minimum dwell time duration,
- generating, for each frame pattern, a numerical measure of the similarity of said each frame pattern with ones of said target patterns,
- accumulating, for each frame time and each keyword, and using said numerical measures and said minimum and maximum dwell times, a numerical word score representing the likelihood that a said keyword ended at a said frame time,
- said accumulating step including the step of accumulating, for each keyword, the numerical measures for each of a continuous sequence of said repetitively formed frame patterns, starting with the numerical measure of the similarity of a present frame pattern and a last target pattern of said keyword, and
- generating at least a preliminary keyword recognition decision whenever the numerical word score for a keyword exceeds a predetermined recognition level.
2. The method of claim 1 wherein said accumulating step further comprises the steps of
- adding to the accumulated word score at each frame pattern of the sequence, wherein the dwell time for the then present target pattern is not greater than the minimum dwell time, a numerical quantity representing the better of the numerical measure representing the similarity of said frame pattern and the then present target pattern and the numerical measure representing the similarity of said frame pattern and a next previous target pattern,
- adding to the accumulated word score at each frame pattern occurring at a frame time which exceeds the minimum dwell time of the then present target pattern, the better of the numerical measure representing the similarity of said frame pattern and the then present target pattern and the numerical measure representing the similarity of said frame pattern and the next previous target pattern,
- updating the then present target pattern when both the minimum dwell time is exceeded and the numerical measure for the next previous target pattern is better than the numerical measure for the then present target pattern, by designating the next previous target pattern as the new then present target pattern, and
- designating the next previous target pattern as the new then present target pattern whenever said maximum dwell time for the then present target pattern is exceeded.

3. The method of claim 2 further comprising the steps of
- maintaining a frame count of the number of pattern frames employed in determining said numerical word score for a keyword and
- generating a normalized word score by dividing the accumulated numerical word score for a keyword by the number of pattern frames employed in generating said score.

4. The method of claim 3 wherein said second adding step further comprises the step of adding a penalty value to the accumulated score for a keyword whenever the maximum dwell time of a target pattern component of the keyword is exceeded.
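
The backward accumulation recited in claims 1 through 4 can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. The function name `backward_word_score`, the convention that a higher similarity value is better, the dwell limits expressed in frames, and the stopping rule (stop once the first target pattern has met its minimum dwell) are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def backward_word_score(
    sim: Sequence[Sequence[float]],
    end_frame: int,
    min_dwell: Sequence[int],
    max_dwell: Sequence[int],
    penalty: float = 0.0,
) -> Optional[Tuple[float, int]]:
    """Score the hypothesis that a keyword ends at end_frame.

    sim[t][k] is the similarity of frame t to target pattern k
    (assumption: higher is better). Dwell limits are in frames.
    Returns (normalized score, frame count), or None if the audio
    runs out before the first target pattern is reached.
    """
    k = len(min_dwell) - 1  # start with the last target pattern (claim 1)
    dwell = 0               # frames spent in the present target pattern
    total = 0.0
    frames = 0
    for t in range(end_frame, -1, -1):  # walk backward in time
        cur = sim[t][k]
        prev = sim[t][k - 1] if k > 0 else float("-inf")
        total += max(cur, prev)  # add the better of the two measures (claim 2)
        frames += 1              # frame count used for normalization (claim 3)
        dwell += 1
        if k == 0:
            # Assumed stopping rule: the first pattern has met its minimum dwell.
            if dwell >= min_dwell[0]:
                return total / frames, frames
        elif dwell > min_dwell[k] and prev > cur:
            k, dwell = k - 1, 1  # previous pattern matches better: switch to it
        elif dwell > max_dwell[k]:
            total += penalty     # penalty when maximum dwell is exceeded (claim 4)
            k, dwell = k - 1, 1  # forced switch to the previous pattern
    return None
```

A caller would evaluate this at every frame time and compare the normalized score against a recognition level of its own choosing. Because higher scores are assumed to be better here, a penalty would be passed as a negative number.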

5. In a speech analysis system for recognizing at least one predetermined keyword in an audio signal, each keyword being characterized by a template having at least one target pattern, and each target pattern representing at least one short-term power spectrum, and each target pattern having a minimum dwell time duration and a maximum dwell time duration, the improvement comprising
- means for forming at a repetitive frame rate, a sequence of frame patterns from and representing said audio signal, each frame pattern being associated with a frame time, said frame rate corresponding to a frame interval wherein each target pattern has associated therewith at least two frame patterns,
- means for generating, for each frame pattern, a numerical measure of the similarity of said each frame pattern with selected ones of said target patterns,
- means for accumulating, for each frame time and each keyword, and using said numerical measures, a numerical word score representing the likelihood that a said keyword ended at a said frame time,
- said accumulating means including means for accumulating, for each keyword, the numerical measure for each of a continuous sequence of said repetitively formed frame patterns, starting with the numerical measure of the similarity of a present frame pattern and a last target pattern of said keyword, and
- means for generating at least a preliminary keyword recognition decision when the numerical value for a keyword exceeds a predetermined recognition level.
6. The apparatus of claim 5 wherein said accumulating means further comprises
- first means for adding to the accumulated word score at each frame pattern of the sequence, wherein the minimum dwell time for the then present target pattern is not exceeded, a numerical quantity representing the better of the numerical measure representing the similarity of said frame pattern and the then present target pattern and the numerical measure representing the similarity of said frame pattern and a next previous target pattern,
- second means for adding to the accumulated word score at each frame pattern occurring at a frame time which exceeds the minimum dwell time of the then present target pattern, the better of the numerical measure representing the similarity of said frame pattern and the then present target pattern and the numerical measure representing the similarity of said frame pattern and the next previous target pattern,
- means for updating the then present target pattern when both the minimum dwell time is exceeded and the numerical measure for the next previous target pattern is better than the numerical measure for the then present target pattern, by designating the next previous target pattern as the new then present target pattern, and
- means for selecting the next previous target pattern as the new then present target pattern whenever said maximum dwell time for the then present target pattern is exceeded.

7. The apparatus of claim 6 further comprising
- counter means for maintaining a frame count of the number of pattern frames employed in determining said numerical word score for a keyword and
- means for generating a normalized word score by dividing the accumulated numerical word score for a keyword by the number of pattern frames employed in generating said score.

8. The apparatus of claim 7 wherein said second adding means further comprises means for adding a penalty value to the accumulated score for a keyword whenever the maximum dwell time of a target pattern component of the keyword is exceeded.

9. In a speech analysis apparatus for recognizing at least one keyword in an audio signal, each keyword being characterized by a template having at least one target pattern, each target pattern representing at least one short-term power spectrum, and each target pattern having associated therewith a plurality of sequential dwell time positions, including at least one required dwell time position and at least one optional dwell time position, the number of said required and optional dwell time positions being a measure of the minimum and maximum time duration of a target pattern, the recognition method comprising the steps of:
- forming at a repetitive frame time, a sequence of frame patterns from and representing said audio signal,
- generating a numerical measure of the similarity of each said frame pattern with each of said target patterns,
- accumulating for any target pattern second and later required dwell time position, and for each target pattern optional dwell time position, the sum of the accumulated score for the previous target pattern dwell time position during the previous frame time and the numerical measure associated with the target pattern during the present frame time,
- accumulating, for each keyword first target pattern, first required dwell time position, the sum of the score of the first dwell time position during the previous frame time, and the present numerical measure associated with the keyword first target pattern,
- accumulating, for each other target pattern first required dwell time position, the sum of the best ending accumulated score for the previous target pattern of the same keyword and the present numerical measure associated with the target pattern, and
- generating a recognition decision, based upon accumulating values of the possible word endings of the last target pattern of each keyword.
10. The method of claim 9 further comprising the step of storing, in association with each dwell time position accumulated score, a word duration count corresponding to the time position length of the keyword associated with the accumulated score at the dwell time position.

11. The method of claim 10 further comprising the step of storing, in association with each dwell time position accumulated score, a target pattern duration count corresponding to the position sequence of the dwell time position in the target pattern.
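
The three accumulation rules of claim 9, together with the word duration count of claim 10, can be sketched as a frame-by-frame table update. This is a hedged illustration that reads the claim language literally, not the patented implementation. The function name `dwell_position_scores`, the higher-is-better score convention, the zero starting score on the first frame, and the choice of "last required position onward" as the set of possible pattern endings are assumptions.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def dwell_position_scores(
    frames: Sequence[Sequence[float]],
    required: Sequence[int],
    optional: Sequence[int],
) -> List[Tuple[float, int]]:
    """Return, per frame, (best word-ending score, word duration count).

    frames[t][k] is the measure for target pattern k at frame t
    (assumption: higher is better). required[k] and optional[k] are
    the numbers of required and optional dwell time positions.
    """
    n = len(required)
    size = [required[k] + optional[k] for k in range(n)]
    score = [[NEG_INF] * size[k] for k in range(n)]
    dur = [[0] * size[k] for k in range(n)]

    def best_ending(k: int) -> Tuple[float, int]:
        # Assumption: a pattern may end at its last required position
        # or at any optional position.
        return max((score[k][p], dur[k][p]) for p in range(required[k] - 1, size[k]))

    results = []
    for m in frames:
        new_score = [[NEG_INF] * size[k] for k in range(n)]
        new_dur = [[0] * size[k] for k in range(n)]
        for k in range(n):
            if k == 0:
                # First pattern, first position: previous score of the same
                # position (assumed to be zero before any frame is scored).
                if score[0][0] == NEG_INF:
                    base, d = 0.0, 0
                else:
                    base, d = score[0][0], dur[0][0]
            else:
                # Other patterns: best ending score of the previous pattern.
                base, d = best_ending(k - 1)
            new_score[k][0] = base + m[k]
            new_dur[k][0] = d + 1
            # Later required and optional positions: previous position,
            # previous frame, plus the present measure.
            for p in range(1, size[k]):
                new_score[k][p] = score[k][p - 1] + m[k]
                new_dur[k][p] = dur[k][p - 1] + 1
        score, dur = new_score, new_dur
        results.append(best_ending(n - 1))  # possible word endings
    return results
```

The stored duration count gives a caller what it needs to normalize a word-ending score by the number of frames before comparing it with a threshold.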

12. An apparatus for recognizing at least one keyword in an audio speech signal, each keyword being characterized by a template having at least one target pattern, each pattern representing at least one short term power spectrum, and each target pattern having a plurality of sequential dwell time positions including at least one required dwell time position and at least one optional dwell time position, the number of said required and optional dwell time positions being a measure of the minimum and maximum time duration of a target pattern, the recognition apparatus comprising,
- means for forming, at a repetitive frame time rate, a sequence of frame patterns from, and representing, said audio signal,
- means for generating a numerical measure of the similarity of each said frame pattern with each of said target patterns,
- first means for accumulating for any target pattern second and later required dwell time position and each target pattern optional dwell time position, the sum of the accumulated score for the previous target pattern dwell time position during the previous frame time and the numerical measure associated with the target pattern during the present frame time,
- second means for accumulating, for each keyword first target pattern, first required dwell time position, the sum of the score of the first time position during the previous frame time and the numerical measure associated with the keyword first target pattern during the present frame time,
- third means for accumulating, for each other first target pattern, first required dwell time position, the sum of the best ending accumulated score for the previous target pattern of the same keyword and the numerical measure associated with the target pattern during the present frame time,
- means for generating a recognition decision, based upon the accumulated numerical values, when a predetermined sequence occurs in said audio signal.
13. The apparatus of claim 12 further comprising means for storing in association with each dwell time position accumulated score, a word duration count corresponding to the time position length of the keyword associated with the accumulated score at the dwell time position.

14. The apparatus of claim 13 further comprising second means for storing, in association with each dwell time position accumulated score, a target pattern duration count corresponding to the time of the dwell time position in the target pattern.

15. In a speech analysis apparatus for recognizing at least one keyword in an audio signal, each keyword being characterized by a template having at least one target pattern, each target pattern representing at least one short-term power spectrum, and each target pattern having associated therewith at least one required dwell time position and at least one optional dwell time position, the number of said required and optional dwell time positions being the measure of a minimum and maximum time duration of a target pattern, a method for forming reference patterns representing said keywords comprising the steps of:
- dividing an incoming audio signal corresponding to a keyword into a plurality of subintervals,
- matching each subinterval to a unique reference pattern,
- making a second pass through said audio input signals representing said keyword for providing machine generated subintervals for said keywords,
- determining the interval durations for each subinterval,
- repeating said steps upon a plurality of audio input signals representing the same keyword,
- generating statistics describing the reference pattern durations associated with each subinterval, and
- determining the minimum and maximum dwell times for each reference pattern from said assembled statistics.
16. The method of claim 15 wherein said subintervals are initially spaced uniformly from the beginning to the end of an audio input keyword.
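
Two parts of the training procedure in claims 15 and 16 lend themselves to a short Python sketch: the uniform initial segmentation and the derivation of dwell limits from duration statistics. The claims do not say which statistics are used or how the limits follow from them, so the mean plus or minus `spread` standard deviations rule below is purely an assumption, as are the function names. The second, machine-generated segmentation pass is not shown; its per-utterance subinterval durations are taken as input.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def uniform_boundaries(n_frames: int, n_segments: int) -> List[int]:
    """Initial subinterval boundaries spaced uniformly over the keyword (claim 16)."""
    return [(i * n_frames) // n_segments for i in range(n_segments + 1)]


def durations(boundaries: Sequence[int]) -> List[int]:
    """Duration in frames of each subinterval between consecutive boundaries."""
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


def dwell_limits(
    durations_per_utterance: Sequence[Sequence[int]],
    spread: float = 2.0,
) -> List[Tuple[int, int]]:
    """Derive (min dwell, max dwell) per subinterval from several utterances.

    Assumed rule: mean duration minus/plus `spread` population standard
    deviations, clamped so that the minimum dwell is at least one frame.
    """
    limits = []
    for per_segment in zip(*durations_per_utterance):
        mean = statistics.mean(per_segment)
        sd = statistics.pstdev(per_segment)
        lo = max(1, math.floor(mean - spread * sd))
        hi = max(lo, math.ceil(mean + spread * sd))
        limits.append((lo, hi))
    return limits
```

In use, each training utterance of the same keyword would contribute one list of subinterval durations, and the resulting limits would be attached to the corresponding reference patterns.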

Current Assignee: Verbex Voice Systems, Inc. (Voxware, Inc.)
Original Assignee: Exxon Mobil Corporation
Inventors: Moshier, Stephen L.
Primary Examiner(s): Kemeny, E. S. Matt
Application Number: US06/308,891
Time in Patent Office: 1,170 Days
Field of Search: 179/1.5 D, 179/1.5 B, 179/1.5 C, 340/146.3 R, 340/146.3 AQ, 340/146.3 WD, 364/513
US Class Current: 704/239
CPC Class Codes: G10L 15/00 Speech recognition G10L17/0...
