Method and apparatus for continuous word string recognition

US 4,489,435 A
Filed: 10/05/1981
Issued: 12/18/1984
Est. Priority Date: 10/05/1981
Status: Expired due to Term

9 Assignments

0 Petitions

Abstract

A speech recognition method and apparatus for recognizing word strings in a continuous audio signal are disclosed. The word strings are made up of a plurality of elements, and each element, generally a word, is represented by an element template defined by a plurality of target patterns. Each target pattern is represented by a plurality of statistics describing the expected behavior of a group of spectra selected from plural short-term spectra generated by processing of the incoming audio. Each target pattern has associated therewith at least one required dwell time position and at least one optional dwell time position. The number of required dwell time positions and the sum of the required and optional dwell time positions define, in effect, the limits of a time interval during which a given target pattern can be said to match an incoming sequence of frame patterns. The incoming audio spectra are processed to enhance the separation between the spectral pattern classes during later analysis. The processed audio spectra are grouped into multi-frame spectral patterns and are compared, using likelihood statistics, with the target patterns of the element templates. Each multi-frame pattern input is forced to contribute to each of a plurality of pattern scores as represented by the element templates; these inputs occur at a frame rate that requires each keyword target pattern to correspond to at least two of the multi-frame patterns. The contributions of said multi-frame pattern inputs to said pattern scores are controlled, in part, by said required and optional dwell time constraints. A concatenation technique based on dynamic programming is employed to determine the correct identity of the word string.
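The abstract says only that multi-frame spectral patterns are compared with target patterns "using likelihood statistics". As a minimal sketch, assuming each target pattern is summarized by a per-dimension mean and variance (a diagonal Gaussian, which the text does not specify) and that a multi-frame pattern is simply the concatenation of consecutive short-term spectra, the comparison could look like this. The function names and the `span` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def multi_frame_patterns(spectra: Sequence[Sequence[float]], span: int = 2) -> List[List[float]]:
    """Concatenate each run of `span` consecutive short-term spectra into one vector.
    A new pattern starts at every frame; span=2 is an illustrative choice, not a patent value."""
    return [
        [x for frame in spectra[t:t + span] for x in frame]
        for t in range(len(spectra) - span + 1)
    ]


def neg_log_likelihood(pattern: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Diagonal-Gaussian negative log-likelihood of `pattern` under a target pattern
    described by per-dimension means and variances (an assumed form of the statistics).
    Lower values mean a closer match, so the result can be summed as a path cost."""
    return sum(
        0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(pattern, mean, var)
    )
```

Using a negative log-likelihood turns "most likely" into "smallest accumulated score", which suits the additive accumulation described in the claims.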

83 Citations

16 Claims

1. In a speech analysis apparatus for recognizing at least one keyword in an audio signal, each keyword being characterized by a template having at least one target pattern, each target pattern representing at least two short-term power spectra, and each target pattern having associated therewith at least two required dwell time positions and at least one optional dwell time position, the recognition method comprising the steps of:
- forming, at a repetitive frame time, a sequence of input frame patterns from and representing said audio signal, each frame pattern being associated with a frame time, successive frame patterns corresponding to successive dwell time positions,
- generating a numerical measure of the similarity of each said frame pattern with each of said target patterns,
- accumulating, for each said target pattern required dwell time position and each said target pattern optional dwell time position, and using said numerical measure of the similarity of the just formed frame pattern and said each target pattern, a numerical value representing the alignment of the just formed frame pattern with the respective target pattern dwell time position, and
- generating a recognition decision, based upon said numerical values, when a predetermined sequence occurs in said audio signal.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1 wherein said accumulating step comprises the steps of:
    - accumulating, for each target pattern second and later required dwell time position, and for each target pattern optional dwell time position, the sum of the accumulated score for the previous target pattern dwell time position during the previous frame time and the present numerical measure associated with the target pattern,
    - accumulating, for each keyword first target pattern, first required dwell time position, the sum of the best accumulated score, during the previous frame time, which is associated with the end of a keyword, and the present numerical measure associated with the keyword first target pattern, and
    - accumulating, for each other target pattern first required dwell time position, the sum of the best ending accumulated score for the previous target pattern of the same keyword and the present numerical measure associated with the target pattern.
  - 3. The method of claim 2 further comprising the steps of storing, in association with each frame time position, the identity and duration, in frame time positions, of the keyword having the best score and a valid ending at each said frame time position, and wherein said decision generating step comprises the step of tracing back through said stored keyword identity and duration information for determining each keyword in a word string.
  - 4. The method of claim 3 further comprising the step of storing, in association with each dwell time position accumulated score, a word duration count corresponding to the time position length of the keyword associated with the accumulated score at the dwell time position.
  - 5. The method of claim 4 further comprising the step of storing, in association with each dwell time position accumulated score, a target pattern duration count corresponding to the position sequence of the dwell time position in the target pattern.
  - 6. The method of claim 1 wherein said decision generating and accumulating steps comprise directing the transfer of accumulated scores in response to a syntax generating element.
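The accumulation rules of claim 2 and the traceback of claims 3 and 4 can be sketched as a small dynamic program. This is an illustrative reading, not the patented implementation: it assumes the per-frame similarity measures are already available as dissimilarities (lower is better, for example negative log-likelihoods), leaves out syntax and silence handling, and uses hypothetical names (`recognize_string`, `templates`, `dist`). Each dwell time position carries its accumulated score together with a word duration count (claim 4), and each frame records the best-ending keyword and its duration (claim 3) so the word string can be recovered by tracing back.

```python
import math
from typing import Dict, List, Tuple

INF = math.inf
Cell = Tuple[float, int]  # (accumulated score, word duration count in frames)


def recognize_string(
    templates: Dict[str, List[Tuple[int, int]]],
    dist: Dict[str, List[List[float]]],
) -> Tuple[List[str], float]:
    """templates[w]: (required, optional) dwell counts for each target pattern of keyword w.
    dist[w][t][p]: dissimilarity of frame t to target pattern p of w (lower is better)."""
    n_frames = len(next(iter(dist.values())))
    # state[w][p][j]: cell for dwell time position j of target pattern p of keyword w
    state = {w: [[(INF, 0)] * (r + o) for r, o in pats] for w, pats in templates.items()}
    best_end: List[Tuple[float, str, int]] = []  # per frame: (score, keyword, duration)
    prev_word_end = 0.0  # best keyword-ending score at the previous frame; 0 at string start

    for t in range(n_frames):
        new_state = {}
        for w, pats in templates.items():
            old, new = state[w], []
            for p, (r, o) in enumerate(pats):
                d = dist[w][t][p]
                if p == 0:
                    # First required position of the first pattern: enter from the end of any keyword.
                    entry: Cell = (prev_word_end, 0)
                else:
                    # First required position of a later pattern: best ending of the previous
                    # pattern (its last required or any optional position) at the previous frame.
                    prev_required = pats[p - 1][0]
                    entry = min(old[p - 1][prev_required - 1:], key=lambda c: c[0])
                cells = [(entry[0] + d, entry[1] + 1)]
                # Later required and optional positions: advance one position per frame.
                for j in range(1, r + o):
                    score, dur = old[p][j - 1]
                    cells.append((score + d, dur + 1))
                new.append(cells)
            new_state[w] = new
        state = new_state

        # Best keyword with a valid ending at this frame (claim 3 bookkeeping).
        endings = []
        for w, pats in templates.items():
            last_required = pats[-1][0]
            score, dur = min(state[w][-1][last_required - 1:], key=lambda c: c[0])
            endings.append((score, w, dur))
        best = min(endings, key=lambda e: e[0])
        best_end.append(best)
        prev_word_end = best[0]

    if not best_end or best_end[-1][0] == INF:
        return [], INF
    total = best_end[-1][0]
    # Trace back through the stored keyword identities and durations.
    words, t = [], n_frames - 1
    while t >= 0:
        _, w, dur = best_end[t]
        words.append(w)
        t -= dur
    return words[::-1], total
```

Because a target pattern may end at its last required position or at any optional position, `required` and `required + optional` act as the minimum and maximum number of frames it can absorb, which is how the dwell time positions bound the match interval described in the abstract.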

7. An apparatus for recognizing at least one keyword in an audio speech signal, each keyword being characterized by a template having at least one target pattern, each pattern representing at least two short term power spectra, and each target pattern having associated therewith at least two required dwell time positions and at least one optional dwell time position, the recognition apparatus comprising:
- means for forming, at a repetitive frame time rate, a sequence of input frame patterns from, and representing, said audio signal, each frame pattern corresponding to a said frame time, and successive frame patterns corresponding to successive dwell time positions,
- means for generating a numerical measure of the similarity of each said frame pattern with each of said target patterns,
- means for accumulating, for each said target pattern required dwell time position and each said target pattern optional dwell time position, and using said numerical measure of the similarity of the just formed frame pattern and said each target pattern, a numerical value representing the alignment of the just formed audio representing frame pattern with the respective target pattern dwell time position, and
- means for generating a recognition decision, based upon the accumulated numerical values, when a predetermined sequence occurs in said audio signal.
- Dependent claims (8, 9, 10, 11, 12, 13, 14):
  - 8. The apparatus of claim 7 further comprising means for recognizing said predetermined sequence in said audio signal and for employing said predetermined sequence as a control signal.
  - 9. The apparatus of claim 8 wherein said predetermined sequence is a silence pattern.
  - 10. The apparatus according to claim 7 wherein said accumulating means comprises:
    - first means for accumulating, for each target pattern second and later required dwell time position and each target pattern optional dwell time position, the sum of the accumulated score for the previous target pattern dwell time position during the previous frame time and the present numerical measure associated with the target pattern,
    - second means for accumulating, for each keyword first target pattern, first required dwell time position, the sum of the best accumulated score during the previous frame time which is associated with the end of a keyword, and the present numerical measure associated with the keyword first target pattern, and
    - third means for accumulating, for each other first target pattern, first required dwell time position, the sum of the best ending accumulated score for the previous target pattern of the same keyword and the present numerical measure associated with the target pattern.
  - 11. The apparatus according to claim 10 further comprising means for storing, in association with each frame time position, the identity and duration, in frame time positions, of the keyword having the best score and a valid ending at each said frame time position, and wherein said decision generating means comprises means for tracing back through the stored keyword identity and duration information for identifying each keyword in a word string.
  - 12. The apparatus of claim 11 further comprising means for storing, in association with each dwell time position accumulated score, a word duration count corresponding to the time position length of the keyword associated with the accumulated score at the dwell time position.
  - 13. The apparatus of claim 12 further comprising second means for storing, in association with each dwell time position accumulated score, a target pattern duration count corresponding to the time of the dwell time position in the target pattern.
  - 14. The method of claim 7 wherein the decision generating and accumulating means comprise means for directing the transfer of accumulated scores in response to a syntax generating element.
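Claims 6 and 14 say only that a "syntax generating element" directs the transfer of accumulated scores, and claims 8 and 9 use a recognized silence pattern as a control signal. One hedged way to picture both, assuming the syntax is given as a table of which keywords may precede each keyword (a hypothetical `predecessors` mapping) and that silence is modeled as a keyword labeled `"<sil>"` (also hypothetical): the score entering a keyword's first required dwell time position becomes the best previous-frame ending score among its permitted predecessors, rather than the best ending score of any keyword.

```python
import math
from typing import Dict, Set

INF = math.inf


def syntax_entry_scores(
    prev_end: Dict[str, float], predecessors: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Score allowed into each keyword's first required dwell time position: the best
    previous-frame ending score among the keywords the (assumed) syntax lets precede it.
    Keywords with no permitted predecessor receive INF, i.e. they cannot start here."""
    return {
        w: min((prev_end.get(v, INF) for v in preds), default=INF)
        for w, preds in predecessors.items()
    }


def is_end_of_string(best_ending_keyword: str, silence_label: str = "<sil>") -> bool:
    """Claims 8-9 style control signal: a recognized silence pattern ends the word string."""
    return best_ending_keyword == silence_label
```

For a digit-string grammar, `predecessors` might let a digit follow silence or another digit and let silence follow only a digit, so that a string is reported once a silence pattern wins at some frame.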

15. In a speech analysis apparatus for recognizing at least one keyword in an audio signal, each keyword being characterized by a template having at least one target pattern, each target pattern representing at least two short-term power spectra, and each target pattern having associated therewith at least two required dwell time positions and at least one optional dwell time position, said dwell time positions defining the limits during which a said target pattern can match an incoming sequence of frame patterns, a method for forming said target patterns representing said keywords comprising the steps of:
- dividing an incoming audio signal corresponding to a keyword into a plurality of subintervals,
- forcing each subinterval to correspond to a unique target pattern,
- repeating said dividing and forcing steps upon a plurality of audio input signals representing the same keyword,
- generating statistics describing the target pattern associated with each subinterval, and
- making a second pass through said audio input signals representing said keyword, using said assembled statistics, for providing machine generated subintervals for said keywords.
- Dependent claims (16):
  - 16. The method of claim 15 wherein said subintervals are initially spaced uniformly from the beginning to the end of an audio input keyword.
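Claims 15 and 16 describe how target patterns are trained: divide each keyword token uniformly into subintervals, pool statistics per subinterval across tokens, then make a second pass that re-segments the tokens using those statistics. The sketch below follows that outline under simplifying assumptions: the only statistic kept is a per-pattern mean vector, and the second pass is a left-to-right dynamic-programming alignment using squared Euclidean distance as a stand-in for the likelihood statistics. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def uniform_labels(n_frames: int, n_patterns: int) -> List[int]:
    """Claim 16: initial subintervals spaced uniformly from the start to the end of the keyword."""
    return [t * n_patterns // n_frames for t in range(n_frames)]


def pattern_means(tokens: List[List[Vector]], labels: List[List[int]], n_patterns: int) -> List[List[float]]:
    """Mean vector of all frames assigned to each target pattern, pooled over every token."""
    dim = len(tokens[0][0])
    sums = [[0.0] * dim for _ in range(n_patterns)]
    counts = [0] * n_patterns
    for frames, labs in zip(tokens, labels):
        for x, p in zip(frames, labs):
            counts[p] += 1
            for i, v in enumerate(x):
                sums[p][i] += v
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def realign(frames: List[Vector], means: List[List[float]]) -> List[int]:
    """Second pass: lowest-cost left-to-right segmentation in which every pattern gets at
    least one frame (requires len(frames) >= len(means))."""
    T, P = len(frames), len(means)
    d = [[sum((a - b) ** 2 for a, b in zip(x, m)) for m in means] for x in frames]
    cost = [[math.inf] * P for _ in range(T)]
    back = [[0] * P for _ in range(T)]  # pattern occupied at frame t - 1
    cost[0][0] = d[0][0]
    for t in range(1, T):
        for p in range(P):
            stay = cost[t - 1][p]
            advance = cost[t - 1][p - 1] if p > 0 else math.inf
            if advance < stay:
                cost[t][p], back[t][p] = advance + d[t][p], p - 1
            else:
                cost[t][p], back[t][p] = stay + d[t][p], p
    labels = [P - 1]  # the last frame must belong to the last pattern
    for t in range(T - 1, 0, -1):
        labels.append(back[t][labels[-1]])
    return labels[::-1]
```

Re-estimating the means from the realigned labels and repeating the pass is a natural extension, though claim 15 recites only a second pass.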

Current Assignee
Verbex Voice Systems, Inc. (Voxware, Inc.)
Original Assignee
Exxon Mobil Corporation
Inventors
Moshier, Stephen L.
Primary Examiner(s)
Kemeny, E. S. Matt

Application Number
US06/309,208
Time in Patent Office
1,170 Days
Field of Search
179/1 SD, 179/1 SB, 364/513, 381/41-43
US Class Current
704/244
CPC Class Codes
G10L 15/00 Speech recognition G10L17/0...
