Continuous speech recognition

US 4,481,593 A
Filed: 10/05/1981
Issued: 11/06/1984
Est. Priority Date: 10/05/1981
Status: Expired due to Term

Abstract

An improved speech recognition method and apparatus for recognizing keywords in a continuous audio signal are disclosed. The keywords, generally either a word or a string of words, are each represented by an element template defined by a plurality of target patterns. Each target pattern is represented by a plurality of statistics describing the expected behavior of a group of spectra selected from plural short-term spectra generated by processing of the incoming audio. The incoming audio spectra are processed to enhance the separation between the spectral pattern classes during later analysis. The processed audio spectra are grouped into multi-frame spectral patterns and are compared, using likelihood statistics, with the target patterns of the element templates. Each multi-frame pattern is forced to contribute to each of a plurality of pattern scores as represented by the element templates. The method and apparatus use speaker independent word models during the training stage to generate, automatically, improved target patterns. The apparatus and method further employ grammatical syntax during the training stage for identifying the beginning and ending boundaries of unknown keywords. Recognition is further improved by use of a plurality of templates representing "silence" or non-speech signals, for example, hum. Also, memory and computation load is reduced by use of modified (collapsed or folded) syntax flow graph logic, implemented by additional (augment) control numbers. A concatenation technique is employed, using dynamic programming techniques, to determine the correct identity of the word string.
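The comparison "using likelihood statistics" can be illustrated with a minimal sketch. It assumes each target pattern is summarized by a per-parameter mean and variance and scored with a diagonal Gaussian negative log-likelihood; the abstract does not specify the statistics, so this model, the function name `negative_log_likelihood`, and the sample values are all illustrative assumptions.

```python
import math
from typing import Sequence


def negative_log_likelihood(
    observed: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
) -> float:
    """Cost of an observed pattern under an assumed diagonal Gaussian.

    Lower values indicate a better match to the target pattern.
    """
    cost = 0.0
    for x, m, v in zip(observed, means, variances):
        cost += 0.5 * math.log(2.0 * math.pi * v) + (x - m) ** 2 / (2.0 * v)
    return cost


# Hypothetical statistics for one target pattern (not taken from the patent).
target_means = [1.0, 2.0, 0.5]
target_variances = [0.25, 0.5, 0.1]

close = negative_log_likelihood([1.1, 1.9, 0.5], target_means, target_variances)
far = negative_log_likelihood([3.0, 0.0, 2.0], target_means, target_variances)
```

Under these assumptions, `close` is smaller than `far`, so the first pattern would be the better match.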

121 Citations

19 Claims

1. In a speech analysis apparatus for recognizing at least one keyword in an audio signal, a method for recognizing silence in the incoming audio signal comprising the steps of:
- generating at least first and second target templates, each template representing, as a sequence of frequency spectrum representing parameters, an alternate description of silence in said incoming audio signal,
- comparing said incoming audio signal with each of said first and second target templates,
- generating a first and a second numerical measure representing the result of said comparisons respectively, and
- deciding, based at least upon said numerical measures, whether silence has been detected.
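The steps of claim 1 can be sketched roughly as follows. This is an illustration only: the mean squared difference used as the numerical measure, the fixed threshold, and the rule "silence if the best-matching template is close enough" are assumptions, as are the names `template_distance` and `detect_silence` and the sample templates.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one set of spectrum parameters (hypothetical layout)


def template_distance(frames: Sequence[Frame], template: Sequence[Frame]) -> float:
    """Mean squared difference between the signal and a template of equal length."""
    if len(frames) != len(template):
        raise ValueError("signal and template must have the same number of frames")
    total = 0.0
    count = 0
    for frame, target in zip(frames, template):
        for x, t in zip(frame, target):
            total += (x - t) ** 2
            count += 1
    return total / count


def detect_silence(
    frames: Sequence[Frame],
    templates: Sequence[Sequence[Frame]],
    threshold: float,
) -> Tuple[bool, List[float]]:
    """Compare the signal with every silence template and decide on the best match."""
    measures = [template_distance(frames, t) for t in templates]
    return min(measures) <= threshold, measures


# Two alternate descriptions of silence: a quiet room and a low-frequency hum.
quiet_room = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
hum = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

incoming = [[0.9, 0.1, 0.0], [1.1, 0.0, 0.1]]
is_silence, measures = detect_silence(incoming, [quiet_room, hum], threshold=0.05)
```

Here the incoming signal is far from the quiet-room template but close to the hum template, so it is still classified as silence, which is the point of keeping more than one template.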

2. In a speech analysis apparatus for recognizing a plurality of keywords in an audio signal, each keyword being characterized by a template having at least one target pattern and each sequence of said keywords in said audio signal being described by a grammatical syntax, said syntax being characterized by a plurality of connected decision nodes, the recognition apparatus comprising:
- means for providing a sequence of numerical scores for recognizing keywords in said audio signal employing dynamic programming,
- means for employing said grammatical syntax for determining which scores form acceptable progressions in the recognition process, and
- means for using augments to preserve acceptable progressions whereby otherwise acceptable progressions are discarded according to said syntax.

3. In a speech analysis apparatus for recognizing at least one keyword in an audio signal, a method for recognizing silence in said audio signal comprising the steps of:
- generating a numerical measure of likelihood that the present incoming audio signal portion corresponds to a reference pattern representing silence,
- effectively altering the numerical measure according to a syntax dependent determination, said syntax dependent determination representing the recognition of an immediately preceding portion of the audio signal according to a grammatical syntax, and
- determining from the effectively altered measure whether the present signal portion corresponds to silence.
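One way to picture the syntax dependent alteration in claim 3 is as an additive bias on a cost-like measure. The sketch below assumes lower measures mean "more like silence"; the state names, bias values, threshold, and function names are hypothetical and chosen only to show the mechanism.

```python
# Hypothetical biases: silence is made more plausible right after a complete
# word and less plausible in the middle of one.
SYNTAX_BIAS = {
    "word_end": -1.0,
    "mid_word": 2.0,
}


def silence_score(measure: float, previous_state: str) -> float:
    """Alter the raw silence measure by a value that depends on the syntax state."""
    return measure + SYNTAX_BIAS[previous_state]


def is_silence(measure: float, previous_state: str, threshold: float = 3.0) -> bool:
    """Decide from the altered measure whether the present portion is silence."""
    return silence_score(measure, previous_state) <= threshold


raw_measure = 3.5
after_word = is_silence(raw_measure, "word_end")
inside_word = is_silence(raw_measure, "mid_word")
```

With the same raw measure, the portion is accepted as silence after a completed word and rejected in the middle of a word.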

4. In a speech analysis apparatus for recognizing at least one spoken keyword in an audio signal, each keyword being characterized by a template having at least one target pattern, a method for forming reference patterns representing said spoken keywords and tailored to a speaker, comprising the steps of:
- providing speaker independent reference patterns representing said spoken keywords,
- determining beginning and ending boundaries of said keywords in audio signals spoken by said speaker using said speaker independent reference patterns, and
- training the speech analysis apparatus to said speaker using the beginning and ending boundaries determined by said apparatus for said keywords spoken by said speaker.
  - 5. The method of claim 4 wherein the training step comprises the steps of:
    - dividing a keyword representing incoming audio signal from said speaker into a plurality of subintervals using said keyword boundaries,
    - forcing each subinterval to correspond to a unique reference pattern,
    - repeating said dividing and forcing steps upon a plurality of audio input signals representing the same keyword,
    - generating statistics describing the reference pattern associated with each subinterval, and
    - making a second pass through said audio input signals representing said keyword, using said assembled statistics, for providing machine generated subintervals for said keywords.
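The two-pass training of claim 5 can be sketched under strong simplifications: each frame is reduced to a single number, the first pass divides every utterance into equal subintervals, the only statistic is a mean per reference pattern, and the second pass re-divides each utterance by dynamic programming against those means. None of these choices, nor the names `uniform_boundaries`, `pattern_means`, and `resegment`, come from the patent.

```python
from typing import List, Sequence


def uniform_boundaries(n_frames: int, n_patterns: int) -> List[int]:
    """First pass: split a keyword into n_patterns contiguous, non-empty subintervals."""
    if n_frames < n_patterns:
        raise ValueError("need at least one frame per reference pattern")
    return [i * n_frames // n_patterns for i in range(n_patterns + 1)]


def pattern_means(
    utterances: Sequence[Sequence[float]],
    segmentations: Sequence[Sequence[int]],
    n_patterns: int,
) -> List[float]:
    """Accumulate a mean per reference pattern over all utterances of the keyword."""
    sums = [0.0] * n_patterns
    counts = [0] * n_patterns
    for frames, bounds in zip(utterances, segmentations):
        for k in range(n_patterns):
            for x in frames[bounds[k]:bounds[k + 1]]:
                sums[k] += x
                counts[k] += 1
    return [s / c for s, c in zip(sums, counts)]


def resegment(frames: Sequence[float], means: Sequence[float]) -> List[int]:
    """Second pass: boundaries minimizing squared error against the pattern means."""
    n, k = len(frames), len(means)
    inf = float("inf")
    # cost[j][i]: best cost of covering the first i frames with the first j patterns.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            seg = 0.0
            # Grow the last subinterval backwards; s is its first frame.
            for s in range(i - 1, j - 2, -1):
                seg += (frames[s] - means[j - 1]) ** 2
                if cost[j - 1][s] + seg < cost[j][i]:
                    cost[j][i] = cost[j - 1][s] + seg
                    back[j][i] = s
    bounds = [n]
    i = n
    for j in range(k, 0, -1):
        i = back[j][i]
        bounds.append(i)
    return bounds[::-1]


# Two utterances of the same (hypothetical) keyword, two reference patterns.
utterances = [[0.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 1.0]]
first_pass = [uniform_boundaries(len(u), 2) for u in utterances]
means = pattern_means(utterances, first_pass, 2)
second_pass = [resegment(u, means) for u in utterances]
```

In this toy data the uniform first pass places both boundaries at frame 2, while the second pass moves them to where the values actually change, which is the kind of machine generated subinterval the claim describes.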

6. In a speech analysis apparatus for recognizing at least one spoken keyword in an audio signal, each keyword being characterized by a template having at least one target pattern, a method for forming reference patterns representing a previously unknown keyword comprising the steps of:
- providing speaker independent reference patterns representing spoken keywords previously known to the apparatus,
- determining beginning and ending boundaries of said unknown keyword using said speaker independent reference patterns, and
- training the speech analysis apparatus, using the beginning and ending boundaries previously determined by said apparatus for said previously unknown keyword, to generate statistics describing said previously unknown keyword.
  - 7. The method of claim 6 further comprising the step of providing an audio signal representing said unknown keyword spoken by said speaker in isolation.
  - 8. The method of claim 6 wherein the training step comprises the steps of:
    - dividing an incoming audio signal corresponding to said previously unknown keyword into a plurality of subintervals using said boundaries,
    - forcing each subinterval to correspond to a unique reference pattern,
    - repeating said dividing and forcing steps upon a plurality of audio input signals representing the same keyword,
    - generating statistics describing the reference pattern associated with each subinterval, and
    - making a second pass through said audio input signals representing said previously unknown keyword, using said assembled statistics, for providing machine generated subintervals for said keyword.

9. In a speech analysis apparatus for recognizing a plurality of keywords in an audio signal, each keyword being characterized by a template having at least one target pattern and each sequence of said keywords in said audio signal being described by a grammatical syntax, said syntax being characterized by a plurality of connected decision nodes, the recognition method comprising the steps of:
- providing a sequence of numerical scores for recognizing keywords in said audio signal employing dynamic programming,
- employing said grammatical syntax for determining which scores form acceptable progressions in the recognition process, and
- reducing the number of decision nodes by collapsing said syntax whereby the computational load for the apparatus is reduced.
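The first two steps of claim 9, scoring by dynamic programming and letting the syntax decide which progressions are acceptable, can be sketched as below. The sketch assumes the audio has already been cut into one segment per keyword and that each keyword has a cost per segment; the syntax graph, the costs, and the function name `best_word_string` are hypothetical, and the collapsing of decision nodes is not shown.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Syntax graph: decision node -> list of (keyword, next decision node).
Syntax = Dict[str, List[Tuple[str, str]]]


def best_word_string(
    syntax: Syntax,
    start: str,
    final: str,
    segment_costs: Sequence[Dict[str, float]],
) -> Optional[Tuple[float, List[str]]]:
    """Lowest-cost keyword string that the syntax accepts (lower cost is better)."""
    best: Dict[str, Tuple[float, List[str]]] = {start: (0.0, [])}
    for costs in segment_costs:
        reached: Dict[str, Tuple[float, List[str]]] = {}
        for node, (cost, words) in best.items():
            for word, target in syntax.get(node, []):
                if word not in costs:
                    continue
                total = cost + costs[word]
                # Keep only the best progression arriving at each decision node.
                if target not in reached or total < reached[target][0]:
                    reached[target] = (total, words + [word])
        best = reached
    return best.get(final)


syntax: Syntax = {
    "start": [("call", "name")],
    "name": [("alice", "end"), ("bob", "end")],
}
segment_costs = [
    {"call": 1.0, "alice": 0.5, "bob": 4.0},
    {"call": 3.0, "alice": 2.0, "bob": 0.5},
]
result = best_word_string(syntax, "start", "end", segment_costs)
```

Picking the cheapest keyword per segment would give "alice bob", which the syntax does not allow; the constrained search returns "call bob" instead.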

10. In a speech analysis apparatus for recognizing at least one keyword in an audio signal, apparatus for recognizing silence in the incoming audio signal comprising:
- means for generating at least first and second target templates, each template representing, as a sequence of frequency spectrum representing parameters, an alternate description of silence in said incoming audio signal,
- means for comparing said incoming audio signal with each of said first and second target templates,
- means for generating a first and a second numerical measure representing the result of said comparisons respectively, and
- means for deciding, based at least upon said numerical measures, whether silence has been detected.

11. In a speech analysis apparatus for recognizing a plurality of keywords in an audio signal, each keyword being characterized by a template having at least one target pattern and each sequence of said keywords in said audio signal being described by a grammatical syntax, said syntax being characterized by a plurality of connected decision nodes, the recognition method comprising the steps of:
- providing a sequence of numerical scores for recognizing keywords in said audio signal employing dynamic programming,
- employing said grammatical syntax for determining which scores form acceptable progressions in the recognition process, and
- using augments to preserve acceptable progressions whereby otherwise acceptable progressions are discarded according to said syntax.

12. In a speech analysis apparatus for recognizing at least one keyword in an audio signal, apparatus for recognizing silence in said audio signal comprising:
- means for generating a numerical measure of likelihood that the present incoming audio signal portion corresponds to a reference pattern representing silence,
- means for adding to the numerical measure a syntax dependent numerical value to form a score, said syntax dependent value representing the recognition of an immediately preceding portion of the audio signal according to a grammatical syntax, and
- means for determining from the score whether the present signal portion corresponds to silence.

13. In a speech analysis apparatus for recognizing at least one spoken keyword in an audio signal, each keyword being characterized by a template having at least one target pattern, apparatus for forming reference patterns representing said spoken keywords and tailored to a speaker comprising:
- means for providing speaker independent reference patterns representing said spoken keywords,
- means for determining beginning and ending boundaries of said keywords in audio signals spoken by said speaker using said speaker independent reference patterns, and
- means for training the speech analysis apparatus to said speaker using the beginning and ending boundaries determined by said apparatus for said keywords spoken by said speaker.
  - 14. The apparatus of claim 13 wherein the training means comprises:
    - means for repetitively dividing a keyword representing incoming audio signal, from said speaker, corresponding to a keyword into a plurality of subintervals using said keyword boundaries,
    - means for repetitively forcing each subinterval to correspond to a unique reference pattern,
    - means for generating statistics describing the reference pattern associated with each subinterval, and
    - means for making a second pass through said audio input signals representing said keyword, using said assembled statistics, for providing machine generated subintervals for said keywords.

15. In a speech analysis apparatus for recognizing at least one spoken keyword in an audio signal, each keyword being characterized by a template having at least one target pattern, apparatus for forming reference patterns representing a previously unknown keyword comprising:
- means for providing speaker independent reference patterns representing spoken keywords previously known to the apparatus,
- means for determining beginning and ending boundaries of said unknown keyword using said speaker independent reference patterns, and
- means for training the speech analysis apparatus using the beginning and ending boundaries previously determined by said apparatus for said unknown keyword to generate statistics describing said previously unknown keyword.
  - 16. The apparatus of claim 15 further comprising means for providing an audio signal representing said unknown keyword spoken by said speaker in isolation.
  - 18. The apparatus of claim 15 wherein the training means comprises:
    - means for repetitively dividing an incoming audio signal corresponding to said previously unknown keyword into a plurality of subintervals using said boundaries,
    - means for repetitively forcing each subinterval to correspond to a unique reference pattern,
    - means for generating statistics describing the reference pattern associated with each subinterval, and
    - means for making a second pass through said audio input signals representing said previously unknown keyword, using said assembled statistics, for providing machine generated subintervals for said keyword.

19. In a speech analysis apparatus for recognizing a plurality of keywords in an audio signal, each keyword being characterized by a template having at least one target pattern and each sequence of said keywords in said audio signal being described by a grammatical syntax, said syntax being characterized by a plurality of connected decision nodes, the recognition apparatus comprising:
- means for providing a sequence of numerical scores for recognizing keywords in said audio signal employing dynamic programming,
- means for employing said grammatical syntax for determining which scores form acceptable progressions in the recognition process, and
- means for reducing the number of decision nodes whereby the computational load for the apparatus is reduced.

Current Assignee: Verbex Voice Systems, Inc. (Voxware, Inc.)
Original Assignee: Exxon Mobil Corporation
Inventors: Bahler, Lawrence G.
Primary Examiner(s): Kemeny, E. S. Matt
Application Number: US06/309,209
Time in Patent Office: 1,128 Days
Field of Search: 179/1 SB, 179/1 SD, 179/1 SC, 364/513, 364/513.5, 381/42-45, 382/33, 382/34, 382/37
US Class Current: 704/253
CPC Class Codes:
G10L 15/05   Word boundary detection
G10L 15/12   using dynamic programming t...
G10L 15/193   Formal grammars, e.g. finit...
G10L 2015/088   Word spotting
