Method and apparatus for continuous speech recognition using a layered, self-adjusting decoded network
Abstract
A continuous speech decoder built from multiple layers. Each layer uses independent knowledge sources and rules, but all layers cooperate to decode the speech input into words quickly. The first layer handles acoustic data, the second layer handles the phone data of speech, and the third layer handles word data and word sequences. Because the layers are separate, the higher layers can be time-independent and asynchronous. The asynchronous layers can therefore process data quickly and give fast support to the first layer, which keeps a dynamic record, called a dynamic network, of the most likely continuous speech results. The speed and layer separation of this decoder allow better memory efficiency and better decoding results than previously known continuous speech decoders.
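The cooperation between the three layers can be illustrated with a small sketch. It is not the patented decoder: it substitutes a conventional frame-synchronous Viterbi search for the asynchronous layers described above, and the lexicon, the `Node` fields, and the function names `successors` and `decode` are all hypothetical. The acoustic layer is represented by per-frame phone log-likelihoods, the phone layer by a rule that says which states may follow which, and the word layer by a toy lexicon.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Word layer (hypothetical toy lexicon): which phone sequences form words.
LEXICON: Dict[str, List[str]] = {"go": ["g", "ow"], "no": ["n", "ow"]}


@dataclass(frozen=True)
class Node:
    """One node of the dynamic network: a scored hypothesis at one frame."""
    score: float             # accumulated log-likelihood
    word: str
    pos: int                 # index of the current phone within the word
    starts_word: bool        # True if this node was reached by a word-entry arc
    prev: Optional["Node"]   # arc back to the predecessor node


def successors(word: str, pos: int) -> List[Tuple[str, int, bool]]:
    """Phone layer: which (word, phone) states may follow the current one."""
    out = [(word, pos, False)]                    # stay in the same phone
    if pos + 1 < len(LEXICON[word]):
        out.append((word, pos + 1, False))        # advance within the word
    else:
        out += [(w, 0, True) for w in LEXICON]    # word end: any word may start
    return out


def decode(frames: List[Dict[str, float]]) -> List[str]:
    """Acoustic layer input: one dict of phone log-likelihoods per frame."""
    active = {(w, 0): Node(frames[0][ph[0]], w, 0, True, None)
              for w, ph in LEXICON.items()}
    for frame in frames[1:]:
        nxt: Dict[Tuple[str, int], Node] = {}
        for (w, p), node in active.items():
            for w2, p2, new_word in successors(w, p):
                score = node.score + frame[LEXICON[w2][p2]]
                # Keep only the best-scoring arc into each state.
                if (w2, p2) not in nxt or score > nxt[(w2, p2)].score:
                    nxt[(w2, p2)] = Node(score, w2, p2, new_word, node)
        active = nxt
    # Only hypotheses that end on the last phone of a word are complete.
    finals = [n for (w, p), n in active.items() if p == len(LEXICON[w]) - 1]
    if not finals:
        return []
    node: Optional[Node] = max(finals, key=lambda n: n.score)
    words: List[str] = []
    while node is not None:                       # follow arcs back to frame 0
        if node.starts_word:
            words.append(node.word)
        node = node.prev
    return words[::-1]
```

Given six frames whose best-scoring phones are g, g, ow, n, ow, ow, this sketch traces back through the surviving nodes and returns the word sequence "go", "no". Keeping the lexicon and the successor rule outside the search loop mirrors the separation of knowledge sources that the abstract describes, although here every layer is consulted once per frame.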
10 Claims
1. A system for recognizing speech, comprising:
means for converting input speech into frames of speech data;
a dynamic network that receives said frames of speech data and establishes nodes that represent likelihood scores of various pre-defined models corresponding to the speech data of the respective frame;
a phone expanding network operating in parallel with said dynamic network, said phone expanding network providing phone rules that govern which nodes of said dynamic network can be connected by arcs to which other nodes dependent upon said speech data;
a word network operating in parallel with said phone network and said dynamic network to provide word rules that govern which portions of the phone network correspond to recognizable words and which do not correspond to recognizable words;
said dynamic network, said phone network and said word network cooperating to process said speech data frames to recognize said input speech.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A decoder for continuous speech recognition using a processor and a memory having a plurality of memory locations, the decoder comprising:
a speech framer for regularly processing input speech into consecutive frames of acoustic data;
a word network process for storing and applying language rules;
a phone network process for storing and applying phone rules; and
a dynamic programming network process for building a network of nodes connected by arcs which provide possible decodings of said input speech, wherein said dynamic programming network process uses information from said word network process and said phone network process to direct the building of the nodes and their connections to previous nodes by arcs.
Dependent claims: 9, 10.
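The speech framer element of claim 8 can be pictured with a minimal sketch. The claim does not state a frame length, a frame step, or whether consecutive frames overlap, so the function name `frame_signal` and its `frame_len` and `hop` parameters are assumptions made purely for illustration.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a sample sequence into consecutive fixed-length frames.

    frame_len and hop are hypothetical parameters (in samples); a hop smaller
    than frame_len makes neighbouring frames overlap. Trailing samples that do
    not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(samples) - frame_len
    return [list(samples[i:i + frame_len]) for i in range(0, last_start + 1, hop)]
```

With ten samples, a frame length of 4 and a hop of 2, the function yields four frames starting at samples 0, 2, 4 and 6. A real front end would normally turn each frame into acoustic features before scoring; that step is outside this sketch.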
Specification