Speech recognition method
First Claim
1. In a speech recognition apparatus wherein speech units are each characterized by a sequence of template patterns, and having
means for processing a speech input signal for repetitively deriving therefrom, at a frame repetition rate, a plurality of speech recognition acoustic parameters, and
means responsive to said acoustic parameters
for generating likelihood costs between said acoustic parameters and said speech template patterns, and
for processing said likelihood costs for determining the speech units in said speech input signal,
a method of template matching and cost processing for recognizing the correspondence of said speech input signal and said template patterns, said method comprising the steps of
characterizing the allowable possible sequences of speech units as a grammar graph structure having a beginning node, an ending node and a plurality of intermediate nodes, all said nodes being connected by grammar arcs to at least one other node,
initializing each said node with a high cumulative likelihood cost designating a bad score,
generating likelihood costs representing the similarity of said acoustic parameters and selected ones of said template patterns,
associating with each said node, at each frame time, a cumulative score corresponding to an accumulated template likelihood score in reaching said node, and
generating a recognition decision when said cumulative score associated with the ending node is better than the cumulative score associated with any other node,
storing a source representation of said grammar graph in a changeable memory of said responsive means,
replacing said memory data with a representation of a second grammar graph, and
generating a speech recognition decision based upon the second grammar graph,
whereby said grammar source representing is software interchangeable and can be edited.
Abstract
A speech recognition method and apparatus employ a speech processing circuitry for repetitively deriving from a speech input, at a frame repetition rate, a plurality of acoustic parameters. The acoustic parameters represent the speech input signal for a frame time. A plurality of template matching and cost processing circuitries are connected to a system bus, along with the speech processing circuitry, for determining, or identifying, the speech units in the input speech, by comparing the acoustic parameters with stored template patterns. The apparatus can be expanded by adding more template matching and cost processing circuitry to the bus thereby increasing the speech recognition capacity of the apparatus. The template matching and cost processing circuitries provide distributed processing, on demand, of the acoustic parameters for generating through a dynamic programming technique the recognition decision. Grammar graphs, having a plurality of nodes, are employed for representing both sequences of speech keywords and the speech components which form a keyword. The grammar graphs are software interchangeable, and can be advantageously employed together with dynamic programming methods.
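The comparison of per-frame acoustic parameters with stored template patterns can be illustrated with a minimal sketch. The squared Euclidean distance used as the likelihood cost, the function names `frame_cost` and `best_template`, and the toy template values are all assumptions made for illustration; the abstract does not state which cost measure the apparatus uses.

```python
def frame_cost(params, template):
    """Likelihood cost between one frame of acoustic parameters and one
    template pattern. Squared Euclidean distance is an assumed stand-in;
    a lower cost means a closer match."""
    return sum((p - t) ** 2 for p, t in zip(params, template))


def best_template(params, templates):
    """Score one frame against every stored template pattern.

    Returns the name of the lowest-cost template and the full cost table.
    """
    costs = {name: frame_cost(params, t) for name, t in templates.items()}
    return min(costs, key=costs.get), costs


# Hypothetical two-parameter templates for two speech units.
templates = {"a": [1.0, 2.0], "b": [3.0, 0.0]}
name, costs = best_template([1.0, 2.0], templates)
print(name, costs)
```

In the apparatus described, this per-frame scoring is the work that is distributed across the template matching and cost processing circuitries on the bus; the sketch runs it in a single process.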
43 Citations
3 Claims
-
1. In a speech recognition apparatus wherein speech units are each characterized by a sequence of template patterns, and having
means for processing a speech input signal for repetitively deriving therefrom, at a frame repetition rate, a plurality of speech recognition acoustic parameters, and
means responsive to said acoustic parameters
for generating likelihood costs between said acoustic parameters and said speech template patterns, and
for processing said likelihood costs for determining the speech units in said speech input signal,
a method of template matching and cost processing for recognizing the correspondence of said speech input signal and said template patterns, said method comprising the steps of
characterizing the allowable possible sequences of speech units as a grammar graph structure having a beginning node, an ending node and a plurality of intermediate nodes, all said nodes being connected by grammar arcs to at least one other node,
initializing each said node with a high cumulative likelihood cost designating a bad score,
generating likelihood costs representing the similarity of said acoustic parameters and selected ones of said template patterns,
associating with each said node, at each frame time, a cumulative score corresponding to an accumulated template likelihood score in reaching said node, and
generating a recognition decision when said cumulative score associated with the ending node is better than the cumulative score associated with any other node,
storing a source representation of said grammar graph in a changeable memory of said responsive means,
replacing said memory data with a representation of a second grammar graph, and
generating a speech recognition decision based upon the second grammar graph,
whereby said grammar source representing is software interchangeable and can be edited.
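The steps of claim 1 can be sketched in a few lines of Python. This is an illustrative reading under stated assumptions, not the patented implementation: each grammar arc is assumed to carry one template pattern and to consume exactly one frame, the likelihood cost is assumed to be a squared Euclidean distance, and the beginning node is seeded with a cost of zero so that paths can start there. The class name `GrammarRecognizer`, the dictionary layout of a grammar, and the toy grammars are hypothetical.

```python
INF = float("inf")  # "high cumulative likelihood cost designating a bad score"


def frame_cost(params, template):
    # Assumed cost measure: squared Euclidean distance, lower is better.
    return sum((p - t) ** 2 for p, t in zip(params, template))


class GrammarRecognizer:
    def __init__(self, grammar):
        self.load_grammar(grammar)

    def load_grammar(self, grammar):
        """Replace the stored grammar graph (the changeable memory) and reset."""
        self.grammar = grammar
        self.scores = {n: INF for n in grammar["nodes"]}
        self.scores[grammar["begin"]] = 0.0  # assumption: seed the begin node
        self.paths = {n: [] for n in grammar["nodes"]}

    def step(self, params):
        """Process one frame; return the unit sequence once the ending node
        has a strictly better cumulative score than every other node."""
        g = self.grammar
        new_scores = {n: INF for n in g["nodes"]}
        new_paths = {n: [] for n in g["nodes"]}
        for src, dst, unit in g["arcs"]:
            if self.scores[src] == INF:
                continue  # source node not yet reachable
            cand = self.scores[src] + frame_cost(params, g["templates"][unit])
            if cand < new_scores[dst]:
                new_scores[dst] = cand
                new_paths[dst] = self.paths[src] + [unit]
        self.scores, self.paths = new_scores, new_paths
        end = g["end"]
        if all(new_scores[end] < s for n, s in new_scores.items() if n != end):
            return new_paths[end]
        return None


def recognize(recognizer, frames):
    for params in frames:
        decision = recognizer.step(params)
        if decision is not None:
            return decision
    return None


# Two hypothetical grammars sharing the same three-node shape.
grammar_a = {
    "nodes": ["begin", "mid", "end"], "begin": "begin", "end": "end",
    "arcs": [("begin", "mid", "yes"), ("mid", "end", "stop")],
    "templates": {"yes": [1.0, 0.0], "stop": [0.0, 1.0]},
}
grammar_b = {
    "nodes": ["begin", "mid", "end"], "begin": "begin", "end": "end",
    "arcs": [("begin", "mid", "no"), ("mid", "end", "go")],
    "templates": {"no": [0.0, 0.0], "go": [1.0, 1.0]},
}

rec = GrammarRecognizer(grammar_a)
print(recognize(rec, [[1.0, 0.0], [0.0, 1.0]]))  # ['yes', 'stop']
rec.load_grammar(grammar_b)                       # swap in the second grammar
print(recognize(rec, [[0.0, 0.0], [1.0, 1.0]]))  # ['no', 'go']
```

Keeping the grammar as plain data passed to `load_grammar` is what makes it interchangeable in this sketch: the scoring loop never changes when the graph is replaced. A fuller treatment would let a speech unit span several frames, which the one-frame-per-arc simplification here does not cover.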
-
3. In a speech recognition apparatus wherein speech units are each characterized by a sequence of template patterns, and having
means for processing a speech input signal for repetitively deriving therefrom, at a frame repetition rate, a plurality of speech recognition acoustic parameters, and
means responsive to said acoustic parameters
for generating likelihood costs between said acoustic parameters and said speech template patterns, and
for processing said likelihood costs for determining the speech units in said speech input signal,
a method of template matching and cost processing for recognizing the correspondence of said speech input signal and said template patterns, said method comprising the steps of
characterizing the allowable possible sequences of speech units as a grammar graph structure having a beginning node, an ending node and a plurality of intermediate nodes, all said nodes being connected by grammar arcs to at least one other node,
initializing each said node with a high cumulative likelihood cost designating a bad score,
generating likelihood costs representing the similarity of said acoustic parameters and selected ones of said template patterns,
associating with each said node, at each frame time, a cumulative score corresponding to an accumulated template likelihood score in reaching said node, and
generating a recognition decision when said cumulative score associated with the ending node is better than the cumulative score associated with any other node,
storing a source representation of said grammar graph in a changeable memory of said responsive means,
replacing said memory data with a representation of a second grammar graph, and
generating a speech recognition decision based upon the second grammar graph,
generating said template patterns by
characterizing an utterance as a grammar graph structure having a beginning node, an ending node, and a plurality of intermediate nodes, each node being connected by arcs to at least one other node, said arcs representing successive portions of said utterance, and
generating said template patterns using dynamic programming and said utterance characterizing grammar graph, and
wherein said grammar characterizations are software interchangeable and can be edited.
Specification