Speech recognition employing key word modeling and non-key word modeling
Abstract
Speaker independent recognition of small vocabularies, spoken over the long distance telephone network, is achieved using two types of models, one type for defined vocabulary words (e.g., collect, calling-card, person, third-number and operator), and one type for extraneous input which ranges from non-speech sounds to groups of non-vocabulary words (e.g., "I want to make a collect call please"). For this type of key word spotting, modifications are made to a connected word speech recognition algorithm based on state-transitional (hidden Markov) models which allow it to recognize words from a pre-defined vocabulary list spoken in an unconstrained fashion. Statistical models of both the actual vocabulary words and the extraneous speech and background noises are created. A syntax-driven connected word recognition system is then used to find the best sequence of extraneous input and vocabulary word models for matching the actual input speech.
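The search for the best sequence of extraneous input and vocabulary word models can be illustrated with a small sketch. It is not the patented implementation: it assumes one-dimensional features, single-Gaussian states, uniform transition probabilities, and a fixed grammar of sink, one key word, sink. The names `log_gauss`, `viterbi_left_to_right` and `spot_keyword` are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (assumed observation model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_left_to_right(frames, states):
    """Best log score for aligning frames to a left-to-right chain of
    (mean, var) states. Each state may repeat or advance by one; the path
    must start in the first state and end in the last. Transition
    probabilities are treated as uniform and omitted (an assumption)."""
    neg_inf = float("-inf")
    n = len(states)
    score = [neg_inf] * n
    score[0] = log_gauss(frames[0], *states[0])
    for x in frames[1:]:
        new = [neg_inf] * n
        for j in range(n):
            # Stay in state j, or arrive from state j - 1.
            best_prev = score[j] if j == 0 else max(score[j], score[j - 1])
            if best_prev > neg_inf:
                new[j] = best_prev + log_gauss(x, *states[j])
        score = new
    return score[-1]

def spot_keyword(frames, keyword_models, sink_state):
    """Score each 'sink, key word, sink' chain and return the best key word,
    or None if no chain can be aligned to the frames."""
    best_word, best_score = None, float("-inf")
    for word, word_states in keyword_models.items():
        chain = [sink_state] + word_states + [sink_state]
        s = viterbi_left_to_right(frames, chain)
        if s > best_score:
            best_word, best_score = word, s
    return best_word

# Toy models: a broad sink state and two two-state key words.
sink = (0.0, 4.0)
keywords = {
    "collect": [(5.0, 0.5), (8.0, 0.5)],
    "operator": [(-5.0, 0.5), (-8.0, 0.5)],
}
frames = [0.2, -0.3, 0.1, 5.1, 4.9, 8.2, 7.9, 0.0, 0.3]
result = spot_keyword(frames, keywords, sink)
```

The broad sink state absorbs the frames before and after the key word, so the key word states only need to explain the frames that resemble them. A full system would use multi-dimensional spectral features, trained transition probabilities, and a grammar in which the sink model is optional or repeatable.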
7 Claims
1. A method for generating a sink model for use in a speech recognition system, said method comprising the steps of
obtaining a first extraneous sound training token,
obtaining a second extraneous sound training token, said first and second extraneous sound training tokens being other than repetitions of a particular one vocabulary item, and
generating said sink model in response to a plurality of extraneous sound training tokens which includes said first and second extraneous sound training tokens.
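As a rough illustration of the claimed steps, the sketch below pools the frames of several extraneous sound training tokens and fits a single model to them. It is a simplification under stated assumptions, not the method of the specification: the sink model is reduced to one Gaussian state over one-dimensional features, and `train_sink_state` and `var_floor` are hypothetical names.

```python
def train_sink_state(tokens, var_floor=1e-3):
    """Fit one (mean, variance) sink state to frames pooled from several
    extraneous sound tokens. Each token is a list of 1-D feature values.
    At least two tokens are required, mirroring the first and second
    tokens recited in the claim."""
    if len(tokens) < 2:
        raise ValueError("need at least two extraneous sound training tokens")
    frames = [x for token in tokens for x in token]
    if not frames:
        raise ValueError("training tokens contain no frames")
    mean = sum(frames) / len(frames)
    var = sum((x - mean) ** 2 for x in frames) / len(frames)
    # Floor the variance so the model never collapses to a point.
    return mean, max(var, var_floor)

# Two different extraneous sounds (toy values), not repetitions of one word.
background_noise = [0.0, 2.0]
carrier_phrase = [4.0, 6.0]
sink_state = train_sink_state([background_noise, carrier_phrase])
```

Because the tokens are different sounds rather than repetitions of one vocabulary item, the pooled estimate is deliberately broad, which is what lets a sink model match a wide range of extraneous input.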
Specification