Systems, methods and articles of manufacture for performing high resolution N-best string hypothesization
First Claim
1. A speech recognition system comprising:
means for receiving an input signal representing a speech utterance;
means for storing a plurality of recognition models having allophonic specifications wherein ones of said plurality of recognition models include one or more inter-word context dependent models and one or more language models; and
means for processing said input signal utilizing ones of said plurality of recognition models to generate one or more string hypotheses of said input signal, said processing means including;
means for producing a forward partial path map according to the allophonic specifications of at least one inter-word context dependent model and at least one language model; and
means for traversing said forward partial path map in the backward direction as a function of said allophonic specifications to generate said one or more string hypotheses.
Abstract
Disclosed are systems, methods and articles of manufacture for performing high resolution N-best string hypothesization during speech recognition. A received input signal, representing a speech utterance, is processed utilizing a plurality of recognition models to generate one or more string hypotheses of the received input signal. The plurality of recognition models preferably include one or more inter-word context dependent models and one or more language models. A forward partial path map is produced according to the allophonic specifications of at least one of the inter-word context dependent models and the language models. The forward partial path map is traversed in the backward direction as a function of the allophonic specifications to generate the one or more string hypotheses. One or more of the recognition models may represent one phone words.
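The two-pass idea in the abstract (build a forward map of partial path scores, then traverse it backward to read off string hypotheses) can be illustrated with a minimal sketch. This is not the patented method: it assumes a toy word lattice with topologically numbered nodes and made-up log scores, and it omits allophonic specifications, inter-word context dependent models and language models entirely. The names `TOY_EDGES`, `forward_map` and `n_best` are hypothetical and chosen only for this example.

```python
import heapq
from collections import defaultdict

# Hypothetical toy lattice: (source node, destination node, word, log score).
# Nodes are assumed to be numbered in topological order (source < destination).
TOY_EDGES = [
    (0, 1, "recognize", -1.0),
    (0, 1, "wreck", -1.5),
    (1, 2, "speech", -0.5),
    (1, 2, "beach", -0.7),
]


def forward_map(edges, start, n_nodes):
    """Forward pass: best log score of any partial path from start to each node."""
    fwd = [float("-inf")] * n_nodes
    fwd[start] = 0.0
    # Sorting by source node guarantees fwd[src] is final before it is used.
    for src, dst, _word, score in sorted(edges, key=lambda e: e[0]):
        if fwd[src] + score > fwd[dst]:
            fwd[dst] = fwd[src] + score
    return fwd


def n_best(edges, start, goal, n_nodes, n):
    """Backward pass: best-first search from goal to start, guided by the forward map."""
    fwd = forward_map(edges, start, n_nodes)
    incoming = defaultdict(list)
    for src, dst, word, score in edges:
        incoming[dst].append((src, word, score))

    # Heap entries: (negated total score, tie-breaker, node, backward score, words).
    # Total score = backward score so far + best forward score into the node,
    # so complete hypotheses are popped in order of decreasing score.
    heap = [(-fwd[goal], 0, goal, 0.0, ())]
    counter = 1
    results = []
    while heap and len(results) < n:
        neg_total, _, node, back, words = heapq.heappop(heap)
        if node == start:
            results.append((list(words), -neg_total))
            continue
        for src, word, score in incoming[node]:
            if fwd[src] == float("-inf"):
                continue  # node unreachable from start in the forward pass
            new_back = back + score
            heapq.heappush(
                heap,
                (-(new_back + fwd[src]), counter, src, new_back, (word,) + words),
            )
            counter += 1
    return results


if __name__ == "__main__":
    for words, score in n_best(TOY_EDGES, start=0, goal=2, n_nodes=3, n=3):
        print(" ".join(words), score)
```

Because the forward scores are exact for this toy lattice, the backward search expands only as many partial paths as it needs to produce the requested number of hypotheses. Distinct lattice paths that spell the same word string would be returned as separate entries in this sketch.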
35 Claims
1. A speech recognition system comprising:
means for receiving an input signal representing a speech utterance;
means for storing a plurality of recognition models having allophonic specifications wherein ones of said plurality of recognition models include one or more inter-word context dependent models and one or more language models; and
means for processing said input signal utilizing ones of said plurality of recognition models to generate one or more string hypotheses of said input signal, said processing means including;
means for producing a forward partial path map according to the allophonic specifications of at least one inter-word context dependent model and at least one language model; and
means for traversing said forward partial path map in the backward direction as a function of said allophonic specifications to generate said one or more string hypotheses.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method for generating one or more string hypotheses from a received input signal, said received input signal representing a speech utterance, said method comprising the steps of:
utilizing ones of a plurality of recognition models having allophonic specifications to process said received input signal, said plurality of recognition models including at least one or more inter-word context dependent models and one or more language models;
producing a search graph in a first direction according to the allophonic specifications of at least one of a particular inter-word context dependent model and a particular at least one language model; and
traversing said search graph in a second direction as a function of said allophonic specifications to generate said one or more string hypotheses.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
25. A storage medium which is readable by a processing system, said storage medium including a plurality of processing system instructions operable to direct said processing system to perform speech recognition, said storage medium comprising:
a first instruction set for storing a received input signal representing a speech utterance;
a second instruction set for storing a plurality of recognition models having allophonic specifications wherein ones of said plurality of recognition models include one or more inter-word context dependent models and one or more language models; and
a third instruction set for utilizing ones of said plurality of recognition models to produce a forward partial path map according to the allophonic specifications of at least one of said inter-word context dependent models and at least one of said language models, and to traverse said forward partial path map in the backward direction as a function of said allophonic specifications to generate one or more string hypotheses of said input signal.
Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33, 34, 35
Specification