Speech recognition training method
First Claim
1. In a speech recognition apparatus wherein speech units are each characterized by a sequence of template patterns, and having means for processing a speech input signal for repetitively deriving therefrom, at a frame repetition rate, a plurality of speech recognition acoustic parameters, and means responsive to said acoustic parameters for generating likelihood costs between said acoustic parameters and said speech template patterns, and for processing said likelihood costs for determining the speech units in said speech input signal, a method for generating said template patterns comprising the steps of finding the beginning and end of an input speech unit surrounded by silence for which template patterns are to be generated, and generating in accordance with a known procedure, template patterns representing said speech unit, said finding step comprising modelling silence as a template pattern, for each frame, comparing said silence template pattern likelihood cost with a fixed reference threshold value, and declaring the beginning of said speech unit when the score for the silence template pattern crosses the threshold value.
Abstract
A speech recognition method and apparatus employ speech processing circuitry for repetitively deriving from a speech input, at a frame repetition rate, a plurality of acoustic parameters. The acoustic parameters represent the speech input signal for a frame time. A plurality of template matching and cost processing circuits are connected to a system bus, along with the speech processing circuitry, for determining, or identifying, the speech units in the input speech by comparing the acoustic parameters with stored template patterns. The apparatus can be expanded by adding more template matching and cost processing circuitry to the bus, thereby increasing the speech recognition capacity of the apparatus. Template pattern generation is advantageously aided by using a "joker" word to specify the time boundaries of utterances spoken in isolation, that is, by finding the beginning and ending of an utterance surrounded by silence.
7 Claims
-
1. In a speech recognition apparatus wherein speech units are each characterized by a sequence of template patterns, and having
means for processing a speech input signal for repetitively deriving therefrom, at a frame repetition rate, a plurality of speech recognition acoustic parameters, and
means responsive to said acoustic parameters
for generating likelihood costs between said acoustic parameters and said speech template patterns, and
for processing said likelihood costs for determining the speech units in said speech input signal,
a method for generating said template patterns comprising the steps of
finding the beginning and end of an input speech unit surrounded by silence for which template patterns are to be generated, and
generating in accordance with a known procedure, template patterns representing said speech unit,
said finding step comprising
modelling silence as a template pattern,
for each frame, comparing said silence template pattern likelihood cost with a fixed reference threshold value, and
declaring the beginning of said speech unit when the score for the silence template pattern crosses the threshold value.
-
4. In a speech recognition apparatus wherein speech units are each characterized by a sequence of template patterns, and having
means for processing a speech input signal for repetitively deriving therefrom, at a frame repetition rate, a plurality of speech recognition acoustic parameters, and
means responsive to said acoustic parameters
for generating likelihood costs between said acoustic parameters and said speech template patterns, and
for processing said likelihood costs for determining the speech units in said speech input signal,
a method for generating said template patterns comprising the steps of
finding the beginning and end of an input speech unit surrounded by silence for which template patterns are to be generated, and
generating in accordance with a known procedure, template patterns representing said speech unit,
said finding step comprising
modelling silence as a template pattern,
for each frame, comparing said silence template pattern likelihood cost with a fixed reference threshold value, and
declaring the beginning of said speech unit when the score for the silence template pattern crosses the threshold value,
declaring the end of said speech unit when the score for the silence template improves sufficiently to cross a second threshold value, and
wherein said second threshold value is less than said first threshold value.
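The two-threshold endpointing recited in claim 4 can be illustrated with a minimal Python sketch. It is not taken from the specification: the function name `find_endpoints`, its parameters, and the sample numbers are hypothetical, and it assumes that a lower likelihood cost means a better match to the silence template, so that "crossing" the first threshold means the silence cost rising above it and "improving" means the cost falling below the lower second threshold.

```python
from typing import Optional, Sequence, Tuple


def find_endpoints(
    silence_costs: Sequence[float],
    start_threshold: float,
    end_threshold: float,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (begin_frame, end_frame) from per-frame silence-template costs.

    Assumption: lower cost = frame looks more like silence.
    Either element is None if the corresponding boundary was never found.
    """
    if end_threshold >= start_threshold:
        raise ValueError("second threshold must be less than the first")

    begin: Optional[int] = None
    for frame, cost in enumerate(silence_costs):
        if begin is None:
            # Beginning: the silence template stops matching well.
            if cost > start_threshold:
                begin = frame
        elif cost < end_threshold:
            # End: the silence template matches well again.
            return begin, frame
    return begin, None


# Hypothetical per-frame costs for an utterance surrounded by silence.
costs = [1, 1, 2, 9, 8, 5, 7, 3, 1, 1]
print(find_endpoints(costs, start_threshold=6, end_threshold=2))
```

Because the second threshold is lower than the first, the brief dip to a cost of 5 inside the utterance does not end it; only a clear return to silence does.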
-
5. In a speech recognition apparatus wherein speech units are each characterized by a sequence of template patterns, and having
means for processing a speech input signal for repetitively deriving therefrom, at a frame repetition rate, a plurality of speech recognition acoustic parameters, and
means responsive to said acoustic parameters
for generating likelihood costs between said acoustic parameters and said speech template patterns, and
for processing said likelihood costs for determining the speech units in said speech input signal,
a method for generating said template patterns comprising the steps of
finding, using a dynamic programming and a grammar graph, the beginning and end of an input speech unit surrounded by silence for which template patterns are to be generated, and
generating in accordance with a known procedure, template patterns representing said speech unit,
said finding step comprising
modelling silence as a template pattern,
associating from a beginning node of said grammar graph a first arc having a fixed reference threshold value, and
associating with said beginning node a silence self loop, and
following said dynamic programming for determining the beginning of said speech unit.
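One way to picture the grammar-graph search of claim 5 is the following Python sketch, offered only as an illustration under stated assumptions. It assumes a three-state left-to-right graph: a beginning node with a silence self loop, a "joker" arc that charges a fixed threshold cost per frame, and a trailing silence state. The trailing state, the function name `joker_segment`, and the sample numbers are hypothetical additions; the claim itself recites only the beginning node, its silence self loop, and the first arc.

```python
import math
from typing import List, Sequence, Tuple


def joker_segment(silence_costs: Sequence[float], joker_cost: float) -> Tuple[int, int]:
    """Return (begin, end) of the joker span; end is the first trailing-silence frame.

    States: 0 = leading silence, 1 = joker (fixed cost per frame),
    2 = trailing silence. Assumption: lower cost = better match.
    """
    best = [0.0, math.inf, math.inf]  # path costs before the first frame
    back: List[List[int]] = []        # back[t][s] = state occupied at frame t - 1

    for cost in silence_costs:
        frame_cost = [cost, joker_cost, cost]
        new_best, pointers = [], []
        for s in range(3):
            stay = best[s]                              # self loop
            advance = best[s - 1] if s > 0 else math.inf  # arc from previous state
            if advance < stay:
                new_best.append(advance + frame_cost[s])
                pointers.append(s - 1)
            else:
                new_best.append(stay + frame_cost[s])
                pointers.append(s)
        best = new_best
        back.append(pointers)

    if not back or math.isinf(best[2]):
        raise ValueError("need at least two frames to place a joker span")

    # Trace back from the trailing-silence state at the last frame.
    path = [0] * len(back)
    state = 2
    for t in range(len(back) - 1, -1, -1):
        path[t] = state
        state = back[t][state]
    return path.index(1), path.index(2)


# Hypothetical silence costs: low in silence, high during the utterance.
print(joker_segment([1, 1, 9, 9, 9, 1, 1], joker_cost=5))
```

The fixed per-frame joker cost plays the role of the threshold: the search moves onto the joker arc only for frames where staying in silence would cost more than that fixed value.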
Specification