Speech recognition method having noise immunity
First Claim
1. In a speech recognition apparatus wherein speech units are each characterized by a sequence of template patterns, and having means for processing a speech input signal for repetitively deriving therefrom, at a frame repetition rate, a plurality of speech recognition acoustic parameters, and means responsive to said acoustic parameters for generating likelihood costs between said acoustic parameters and said speech template patterns, and for processing said likelihood costs for determining the speech units in said speech input signal, a method for inhibiting a response to nonvocabulary utterances in a speech input for which template patterns have not been created, comprising the steps of repeatedly, at a frame repetition rate, generating acoustic parameters representing said speech input, generating likelihood costs at each frame time for said acoustic parameters and said template patterns, said template patterns including a pattern representing silence, beginning a normal speech recognition process whenever said cost for an active template pattern is better than a predetermined threshold value, and reverting to a non-speech recognition process whenever said cost of said template patterns, including silence, is worse than said predetermined threshold value.
Abstract
In a speech recognition system, the beginning of speech versus non-speech (a cough or noise) is distinguished by reverting to a non-speech decision process whenever the likelihood cost of template (vocabulary) patterns, including silence, is worse than a predetermined threshold, established by a Joker Word which represents a non-vocabulary word score and path in the grammar graph.
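The per-frame decision described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes that a lower cost means a better match, that the threshold is a single fixed number, and that a frame whose costs fall on neither side of the rule keeps the current mode. The function and parameter names (`next_mode`, `vocab_costs`, `silence_cost`, `threshold`) are hypothetical.

```python
def next_mode(in_speech, vocab_costs, silence_cost, threshold):
    """Return True for the normal recognition process, False for non-speech.

    Assumption: lower cost = better match. All names are hypothetical.
    """
    best_vocab = min(vocab_costs)
    best_any = min(best_vocab, silence_cost)
    if best_any > threshold:
        # Every template, including silence, scores worse than the threshold:
        # treat the frame as a nonvocabulary sound (cough, noise).
        return False
    if best_vocab < threshold:
        # An active vocabulary template beats the threshold: begin recognition.
        return True
    # Otherwise (for example, only silence matches well) keep the current mode.
    return in_speech


def run_gate(frames, threshold):
    """Apply next_mode frame by frame; frames are (vocab_costs, silence_cost)."""
    mode, history = False, []
    for vocab_costs, silence_cost in frames:
        mode = next_mode(mode, vocab_costs, silence_cost, threshold)
        history.append(mode)
    return history


# Illustrative costs only: silence, then a vocabulary match, then a cough.
demo_frames = [([9.0, 8.5], 2.0), ([3.0, 7.0], 6.0), ([9.0, 9.5], 8.0)]
demo_modes = run_gate(demo_frames, threshold=5.0)
```

In this sketch the threshold stands in for the score of the Joker Word; how that score is actually established is not stated in this section.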
7 Claims
1. In a speech recognition apparatus wherein speech units are each characterized by a sequence of template patterns, and having
means for processing a speech input signal for repetitively deriving therefrom, at a frame repetition rate, a plurality of speech recognition acoustic parameters, and
means responsive to said acoustic parameters
for generating likelihood costs between said acoustic parameters and said speech template patterns, and
for processing said likelihood costs for determining the speech units in said speech input signal,
a method for inhibiting a response to nonvocabulary utterances in a speech input for which template patterns have not been created, comprising the steps of
repeatedly, at a frame repetition rate, generating acoustic parameters representing said speech input,
generating likelihood costs at each frame time for said acoustic parameters and said template patterns, said template patterns including a pattern representing silence,
beginning a normal speech recognition process whenever said cost for an active template pattern is better than a predetermined threshold value, and
reverting to a non-speech recognition process whenever said cost of said template patterns, including silence, is worse than said predetermined threshold value.
4. In a speech recognition apparatus wherein speech units are each characterized by a sequence of template patterns, and having
means for processing a speech input signal for repetitively deriving therefrom, at a frame repetition rate, a plurality of speech recognition acoustic parameters, and
means responsive to said acoustic parameters
for generating likelihood costs between said acoustic parameters and said speech template patterns, and
for processing said likelihood costs for determining the speech units in said speech input signal,
a method for inhibiting a response to nonvocabulary utterances in a speech input for which template patterns have not been created, comprising the steps of
repeatedly, at a frame repetition rate, generating acoustic parameters representing said speech input,
generating likelihood costs at each frame time for said acoustic parameters and said template patterns, said template patterns including a pattern representing silence,
employing dynamic programming and a grammar graph for determining in response to said likelihood costs whether there has been a nonvocabulary utterance,
said grammar graph having a normal speech recognition branch and a non-speech recognition branch, said non-speech recognition branch corresponding to nonvocabulary utterances for which said template patterns have not been created, and
said employing step determining and selecting, using said dynamic programming, the better scoring of said speech recognition and said non-speech recognition branches.
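The branch selection recited in claim 4 can be sketched as a comparison of two accumulated scores. The sketch is illustrative only and rests on several assumptions not stated in the claim: lower cost is better, each word is a left-to-right sequence of templates in which a frame either stays on the current template or advances by one, and the non-speech branch is charged a fixed cost per frame. The names `align_cost`, `pick_branch` and `joker_cost_per_frame` are hypothetical.

```python
INF = float("inf")


def align_cost(local_costs):
    """Dynamic-programming alignment of frames to one template sequence.

    local_costs[t][s] is the cost of frame t against template s. The path
    starts on the first template, ends on the last, and at each frame either
    stays on the same template or advances by one (an assumed topology).
    """
    n_states = len(local_costs[0])
    prev = [INF] * n_states
    prev[0] = local_costs[0][0]
    for frame in local_costs[1:]:
        cur = [INF] * n_states
        for s in range(n_states):
            best_pred = prev[s] if s == 0 else min(prev[s], prev[s - 1])
            cur[s] = best_pred + frame[s]
        prev = cur
    return prev[-1]


def pick_branch(word_costs, joker_cost_per_frame):
    """Return (word, score) for the better-scoring branch; word is None
    when the non-speech branch wins. word_costs maps word -> local_costs."""
    n_frames = len(next(iter(word_costs.values())))
    joker_total = joker_cost_per_frame * n_frames
    best_word, best_total = min(
        ((word, align_cost(costs)) for word, costs in word_costs.items()),
        key=lambda pair: pair[1],
    )
    if best_total < joker_total:
        return best_word, best_total
    return None, joker_total


# Illustrative costs: the same word model scored on a matching utterance
# and on a cough-like input.
spoken = pick_branch({"yes": [[1, 5], [2, 1], [4, 1]]}, 2.0)
cough = pick_branch({"yes": [[6, 7], [7, 6], [8, 6]]}, 2.0)
```

Because both branches are scored over the same frames, the comparison needs no separate endpoint detector in this sketch; whether the patented method shares that property is not established by the claim text alone.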
Specification