Speech recognition employing key word modeling and non-key word modeling
Abstract
Speaker-independent recognition of small vocabularies spoken over the long-distance telephone network is achieved using two types of models: one type for defined vocabulary words (e.g., collect, calling-card, person, third-number and operator), and one type for extraneous input, which ranges from non-speech sounds to groups of non-vocabulary words (e.g., "I want to make a collect call please"). For this type of key word spotting, a connected word speech recognition algorithm based on state-transitional (hidden Markov) models is modified so that it can recognize words from a predefined vocabulary list spoken in an unconstrained fashion. Statistical models are created both for the actual vocabulary words and for the extraneous speech and background noises. A syntax-driven connected word recognition system then finds the best sequence of extraneous-input and vocabulary word models for matching the actual input speech.
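The search described in the abstract can be illustrated with a toy sketch. It assumes one-dimensional feature frames, single-Gaussian state emissions, equal transition weights, and a simplified grammar of an optional leading sink, exactly one key word, and an optional trailing sink. The function names, per-state means, and variances are hypothetical and are not the patent's parameters. Each key word is scored by Viterbi decoding through the network leading sink, key word states, trailing sink, and the highest-scoring key word is reported.

```python
import math
from typing import Dict, Sequence, Tuple

NEG_INF = -math.inf


def gauss_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian (hypothetical emission model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def spot_score(frames: Sequence[float], kw_means: Sequence[float], kw_var: float,
               sink_mean: float, sink_var: float) -> float:
    """Viterbi score of the toy grammar [sink] keyword [sink].

    State 0 is the leading sink, states 1..K are the key word's left-to-right
    states, and state K+1 is the trailing sink. Transition weights are ignored.
    """
    K = len(kw_means)
    n = K + 2

    def emit(s: int, x: float) -> float:
        if s == 0 or s == K + 1:
            return gauss_logpdf(x, sink_mean, sink_var)
        return gauss_logpdf(x, kw_means[s - 1], kw_var)

    # A path may start in the leading sink or directly in the first key word state.
    score = [NEG_INF] * n
    score[0] = emit(0, frames[0])
    score[1] = emit(1, frames[0])
    for x in frames[1:]:
        new = [NEG_INF] * n
        for s in range(n):
            # Self-loop, or advance from the previous state (state 0 only self-loops).
            best = score[0] if s == 0 else max(score[s - 1], score[s])
            if best > NEG_INF:
                new[s] = best + emit(s, x)
        score = new
    # A path must end in the last key word state or in the trailing sink.
    return max(score[K], score[K + 1])


def recognize(frames: Sequence[float], vocab: Dict[str, Sequence[float]],
              kw_var: float, sink_mean: float,
              sink_var: float) -> Tuple[str, Dict[str, float]]:
    """Score every key word against the same sink model and pick the best."""
    scores = {w: spot_score(frames, m, kw_var, sink_mean, sink_var)
              for w, m in vocab.items()}
    return max(scores, key=scores.get), scores


# Hypothetical vocabulary: per-state means for two key words.
vocab = {"collect": [5.0, 8.0, 5.0], "operator": [-5.0, -8.0, -5.0]}
# Extraneous frames near 0 surround a "collect"-like segment.
frames = [0.1, -0.3, 0.2, 5.1, 7.9, 8.2, 5.0, 0.0, -0.1]
word, scores = recognize(frames, vocab, kw_var=1.0, sink_mean=0.0, sink_var=4.0)
print(word)
```

The broad sink variance lets the sink absorb the surrounding extraneous frames cheaply, so the key word states only need to explain the segment that actually matches them. A full connected word recognizer would instead search over arbitrary grammar-allowed sequences of sink and key word models.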
22 Claims
1. A method of processing an input signal representing a spoken utterance, the spoken utterance having a key utterance component and an extraneous sound component, the method comprising the steps of
comparing the input signal to a plurality of speech recognition models within a speech recognition system, said plurality of speech recognition models including key word speech recognition models representative of respective different key utterances and further including at least a first sink model, and
recognizing a particular one of said key utterances in said spoken utterance in response to said comparing,
characterized in that said sink model represents a plurality of extraneous sound training tokens, at least two of said extraneous sound training tokens being other than repetitions of a particular one vocabulary item.
12. A speech recognition system for processing an input signal representing a spoken utterance, the spoken utterance having a key utterance component and an extraneous sound component, the speech recognition system comprising
means for comparing the input signal to a plurality of speech recognition models, said plurality of speech recognition models including speech recognition models representative of respective different key utterances and further including at least a first sink model, and
means for recognizing a particular one of said key utterances in said spoken utterance in response to said comparing,
characterized in that said sink model represents a plurality of extraneous sound training tokens, at least two of said extraneous sound training tokens being other than repetitions of a particular one vocabulary item.
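The characterizing clause of claims 1 and 12 concerns what the sink model is trained on. The following is a hypothetical, heavily simplified illustration, not the patent's training procedure. It assumes 1-D feature frames and a single-state, single-Gaussian sink (a practical system would more likely use a multi-state hidden Markov model with mixture densities). `train_sink_model` and the token labels are invented for the example. The sketch pools frames from several different extraneous tokens and rejects a training set whose tokens all carry the same label, which is one simplified way to reflect the claim's condition.

```python
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def train_sink_model(tokens: Sequence[Tuple[str, Sequence[float]]],
                     var_floor: float = 1e-3) -> Tuple[float, float]:
    """Estimate a single-Gaussian sink model from pooled extraneous tokens.

    Each token is (label, frames). At least two tokens must carry different
    labels, so the sink is not trained only on repetitions of one item.
    """
    labels = {label for label, _ in tokens}
    if len(labels) < 2:
        raise ValueError("need extraneous tokens from at least two different items")
    # Pool every frame from every token; the sink ignores token boundaries.
    pooled: List[float] = [x for _, frames in tokens for x in frames]
    mean = fmean(pooled)
    # Maximum-likelihood variance, floored so a near-constant pool stays usable.
    var = max(pvariance(pooled, mu=mean), var_floor)
    return mean, var


# Hypothetical extraneous tokens: a breath noise and a non-vocabulary word.
mean, var = train_sink_model([("breath", [0.0, 1.0]), ("please", [2.0, 3.0])])
print(mean, var)
```

Pooling heterogeneous material is what lets a single sink model stand in for many kinds of extraneous input, from background noise to carrier phrases around the key word.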
Specification