Context-dependent speech recognizer using estimated next word context
First Claim
1. A speech recognition apparatus comprising:
means for generating a set of two or more speech hypotheses, each speech hypothesis comprising a partial hypothesis of zero or more words followed by a candidate word selected from a vocabulary of candidate words;
means for storing a set of word models, each word model representing one or more possible coded representations of an utterance of a word;
means for generating an initial model of each speech hypothesis, each initial model comprising a model of the partial hypothesis followed by a model of the candidate word;
an acoustic processor for generating a sequence of coded representations of an utterance to be recognized;
means for generating an initial hypothesis score for each speech hypothesis, each initial hypothesis score comprising an estimate of the closeness of a match between the initial model of the speech hypothesis and the sequence of coded representations of the utterance;
means for storing an initial subset of one or more speech hypotheses, from the set of speech hypotheses, having the best initial hypothesis scores;
next context estimating means for estimating, for each speech hypothesis in the initial subset, a likely word, from the vocabulary of words, which is likely to follow the speech hypothesis;
means for generating a revised model of each speech hypothesis in the initial subset, each revised model comprising a model of the partial hypothesis followed by a revised model of the candidate word, the revised candidate word model being dependent at least on the word which is estimated to be likely to follow the speech hypothesis;
means for generating a revised hypothesis score for each speech hypothesis in the initial subset, each revised hypothesis score comprising an estimate of the closeness of a match between the revised model of the speech hypothesis and the sequence of coded representations of the utterance;
means for storing a reduced subset of one or more speech hypotheses, from the initial subset of speech hypotheses, having the best revised match scores; and
means for outputting at least one word of one or more of the speech hypotheses in the reduced subset.
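The relationship between the initial model and the revised model in the claim can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the patent: the `SpeechHypothesis` class, the toy label sequences in `WORD_MODELS` and `CONTEXT_MODELS`, and the fallback to the context-independent model when no context-dependent variant exists.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SpeechHypothesis:
    partial: Tuple[str, ...]  # partial hypothesis of zero or more words
    candidate: str            # candidate word selected from the vocabulary


# Hypothetical context-independent word models: word -> sequence of labels.
WORD_MODELS: Dict[str, List[str]] = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
}

# Hypothetical context-dependent variants keyed by (word, following word).
CONTEXT_MODELS: Dict[Tuple[str, str], List[str]] = {
    ("cat", "sat"): ["K", "AE", "T_s"],
}


def initial_model(h: SpeechHypothesis) -> List[str]:
    """Model of the partial hypothesis followed by a model of the candidate word."""
    labels: List[str] = []
    for word in h.partial + (h.candidate,):
        labels.extend(WORD_MODELS[word])
    return labels


def revised_model(h: SpeechHypothesis, next_word: str) -> List[str]:
    """Same partial-hypothesis model, but the candidate word model now depends
    on the word estimated to follow the hypothesis."""
    labels: List[str] = []
    for word in h.partial:
        labels.extend(WORD_MODELS[word])
    # Assumed fallback: use the context-independent model if no variant is stored.
    variant = CONTEXT_MODELS.get((h.candidate, next_word), WORD_MODELS[h.candidate])
    labels.extend(variant)
    return labels
```

In this sketch only the last word's model changes between the two passes; the model of the partial hypothesis is built the same way in both.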
Abstract
A speech recognition apparatus and method estimates the next word context for each current candidate word in a speech hypothesis. An initial model of each speech hypothesis comprises a model of a partial hypothesis of zero or more words followed by a model of a candidate word. An initial hypothesis score for each speech hypothesis comprises an estimate of the closeness of a match between the initial model of the speech hypothesis and a sequence of coded representations of the utterance. The speech hypotheses having the best initial hypothesis scores form an initial subset. For each speech hypothesis in the initial subset, the word which is most likely to follow the speech hypothesis is estimated. A revised model of each speech hypothesis in the initial subset comprises a model of the partial hypothesis followed by a revised model of the candidate word. The revised candidate word model is dependent at least on the word which is estimated to be most likely to follow the speech hypothesis. A revised hypothesis score for each speech hypothesis in the initial subset comprises an estimate of the closeness of a match between the revised model of the speech hypothesis and the sequence of coded representations of the utterance. The speech hypotheses from the initial subset which have the best revised match scores are stored as a reduced subset. At least one word of one or more of the speech hypotheses in the reduced subset is output as a speech recognition result.
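The two-stage selection described in the abstract can be sketched in Python as follows. This is an illustrative outline only: the function name `two_pass_prune`, the callables `initial_score`, `estimate_next`, and `revised_score`, the subset sizes, and the convention that a higher score means a closer match are all assumptions rather than details from the patent.

```python
from typing import Callable, List, Sequence, Tuple

Hypothesis = Tuple[str, ...]  # words of the hypothesis; the last is the candidate


def two_pass_prune(
    hypotheses: Sequence[Hypothesis],
    initial_score: Callable[[Hypothesis], float],
    estimate_next: Callable[[Hypothesis], str],
    revised_score: Callable[[Hypothesis, str], float],
    n_initial: int,
    n_reduced: int,
) -> List[Hypothesis]:
    """Keep the best n_initial hypotheses by initial score, then rescore each
    survivor using its estimated next word and keep the best n_reduced."""
    # First pass: score every hypothesis with its initial model.
    initial_subset = sorted(hypotheses, key=initial_score, reverse=True)[:n_initial]

    # Second pass: only survivors get a next-word estimate and a revised score.
    rescored = [(revised_score(h, estimate_next(h)), h) for h in initial_subset]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in rescored[:n_reduced]]
```

A toy call with hand-written scores might look like `two_pass_prune(hyps, init.get, lambda h: "sat", lambda h, nxt: rev[h], 2, 1)`, where `init` and `rev` are dictionaries of made-up scores. The revised pass touches only the initial subset, so the next-word estimate and revised scoring are computed for far fewer hypotheses than the initial pass.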
31 Claims
1. A speech recognition apparatus, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A speech recognition method comprising:
generating a set of two or more speech hypotheses, each speech hypothesis comprising a partial hypothesis of zero or more words followed by a candidate word selected from a vocabulary of candidate words;
storing a set of word models, each word model representing one or more possible coded representations of an utterance of a word;
generating an initial model of each speech hypothesis, each initial model comprising a model of the partial hypothesis followed by a model of the candidate word;
generating a sequence of coded representations of an utterance to be recognized;
generating an initial hypothesis score for each speech hypothesis, each initial hypothesis score comprising an estimate of the closeness of a match between the initial model of the speech hypothesis and the sequence of coded representations of the utterance;
storing an initial subset of one or more speech hypotheses, from the set of speech hypotheses, having the best initial hypothesis scores;
estimating, for each speech hypothesis in the initial subset, a likely word, from the vocabulary of words, which is likely to follow the speech hypothesis;
generating a revised model of each speech hypothesis in the initial subset, each revised model comprising a model of the partial hypothesis followed by a revised model of the candidate word, the revised candidate word model being dependent at least on the word which is estimated to be likely to follow the speech hypothesis;
generating a revised hypothesis score for each speech hypothesis in the initial subset, each revised hypothesis score comprising an estimate of the closeness of a match between the revised model of the speech hypothesis and the sequence of coded representations of the utterance;
storing a reduced subset of one or more speech hypotheses, from the initial subset of speech hypotheses, having the best revised match scores; and
outputting at least one word of one or more of the speech hypotheses in the reduced subset.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
Specification