Speech recognition dividing words into two portions for preliminary selection
First Claim
1. A speech recognition apparatus which converts inputted speech into a label for each predetermined time interval and performs speech recognition using label strings, said apparatus comprising:
a first memory means for storing, for each word in a vocabulary, a probability of producing each label in a label set at an arbitrary time interval in a fixed length first portion of an utterance of said word;
a second memory means for storing, for each word in said vocabulary, a probability of producing each label in said label set at an arbitrary time interval in a second portion following said first portion of the utterance of said word;
means for determining, upon the generation of a label for an inputted speech to be recognized, whether the label belongs to said first portion or said second portion;
means for outputting, when the generated label for said inputted speech belongs to said first portion, the probability of producing the label concerned at an arbitrary time interval in the first portion of the utterance of each word in said vocabulary with reference to said first memory means;
means for outputting, when the generated label for said inputted speech belongs to said second portion, the probability of producing the label concerned at an arbitrary time interval in the second portion of the utterance of each word in said vocabulary with reference to said second memory means;
means for accumulating the probabilities outputted for each word;
means for specifying at least one candidate word in accordance with the magnitude of the accumulated value; and
means for performing detailed recognition for each of the specified candidate words.
Abstract
A speech recognition apparatus makes a preliminary selection of a number of candidate words from a vocabulary of words, one of which candidate words is most likely the spoken word to be recognized. For the preliminary selection, each candidate word is divided into first and second portions. For each portion of a word, there are stored probabilities of producing each label of a label alphabet during the utterance of that portion of the word. The speech to be recognized is also divided into first and second portions. A label string representing the speech to be recognized is generated, such that labels occur during the first or the second portion of the speech (or during a transition between the first and second portions). To determine the likelihood that the spoken word represents a word from the vocabulary, each label occurring during the first portion is assigned its "first portion" probability. Each label occurring during the second portion is assigned its "second portion" probability.
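The preliminary selection described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented implementation: the function name `preselect`, the hard boundary at `first_len` label frames (no transition region), the use of summed log-probabilities as the accumulated value, and the `floor` used to avoid log(0) are all hypothetical choices made for the example.

```python
import math

def preselect(labels, p1, p2, first_len, top_n=2, floor=1e-6):
    """Rank words by accumulated per-portion label scores (assumed: log-probabilities)."""
    scores = {}
    for word in p1:
        total = 0.0
        for t, label in enumerate(labels):
            # Assumption: the first portion is the first `first_len` label frames.
            table = p1[word] if t < first_len else p2[word]
            total += math.log(max(table.get(label, 0.0), floor))
        scores[word] = total
    # Keep the top_n words as candidates for detailed recognition.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy tables: probability of each label in the first (p1) and second (p2) portion.
p1 = {"cat": {"a": 0.7, "b": 0.2, "c": 0.1}, "dog": {"a": 0.1, "b": 0.2, "c": 0.7}}
p2 = {"cat": {"a": 0.1, "b": 0.8, "c": 0.1}, "dog": {"a": 0.6, "b": 0.1, "c": 0.3}}
labels = ["a", "a", "b", "b"]
candidates = preselect(labels, p1, p2, first_len=2, top_n=1)
```

With these toy tables the label string favors "cat" in both portions, so it is the single candidate passed on to detailed recognition.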
8 Claims
2. A speech recognition apparatus which converts inputted speech into a label for each predetermined time interval and performs speech recognition using label strings, said apparatus comprising:
means for accumulating, upon the generation of a label for a training utterance of each word in a vocabulary, a first and a second weight to determine the first and second statistical values of the label concerned, said first and second weights being functions of a time interval from a front edge of the utterance to the generation of the label concerned;
means for normalizing the first and second statistical values of each label in a label set for each word in said vocabulary;
a first memory means for storing the normalized first statistical value of each label in said label set for each word in said vocabulary as the probability of producing the label concerned in said label set at an arbitrary time interval in a fixed length first portion of the utterance of the word;
a second memory means for storing the normalized second statistical value of each label in said label set for each word in said vocabulary as the probability of producing the label concerned in said label set at an arbitrary time interval in a second portion following said first portion of the utterance of the word;
means for determining whether a label generated for an inputted speech to be recognized belongs to said first portion or said second portion;
means for outputting, when the generated label for said inputted speech belongs to said first portion, the probability of producing the label concerned at an arbitrary time interval in the first portion of the utterance of each word in said vocabulary with reference to said first memory means;
means for outputting, when the generated label for said inputted speech belongs to said second portion, the probability of producing the label concerned at an arbitrary time interval in the second portion of the utterance of each word in said vocabulary with reference to said second memory means;
means for accumulating the probabilities outputted for each word;
means for specifying at least one candidate word in accordance with the magnitude of the accumulated value; and
means for performing detailed recognition processing for each of the specified candidate words.
Dependent claims: 3, 4
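The training side of claim 2 (time-dependent weights accumulated per label, then normalized into two probability tables) might look roughly like the sketch below. The claim only says the weights are functions of the time from the front edge of the utterance; the linear ramp between `ramp_start` and `ramp_end`, the additive `smooth` constant, and the function names are assumptions introduced for illustration.

```python
def portion_weights(t, ramp_start, ramp_end):
    """Return (w1, w2) for frame index t; assumed shape: w1 ramps linearly from 1 to 0."""
    if t <= ramp_start:
        w1 = 1.0
    elif t >= ramp_end:
        w1 = 0.0
    else:
        w1 = (ramp_end - t) / (ramp_end - ramp_start)
    return w1, 1.0 - w1

def train_word(utterances, label_set, ramp_start, ramp_end, smooth=1e-3):
    """Accumulate weighted label counts for one word and normalize them."""
    # Hypothetical smoothing so that unseen labels keep a small nonzero probability.
    c1 = {label: smooth for label in label_set}
    c2 = {label: smooth for label in label_set}
    for labels in utterances:
        for t, label in enumerate(labels):
            w1, w2 = portion_weights(t, ramp_start, ramp_end)
            c1[label] += w1  # first statistical value
            c2[label] += w2  # second statistical value
    z1, z2 = sum(c1.values()), sum(c2.values())
    first = {label: v / z1 for label, v in c1.items()}
    second = {label: v / z2 for label, v in c2.items()}
    return first, second

# Two toy training utterances of one word, as label strings.
p1, p2 = train_word([["a", "a", "b", "b"], ["a", "b", "b", "b"]],
                    ["a", "b"], ramp_start=1, ramp_end=2)
```

The two returned tables play the role of the contents of the first and second memory means for that word. A ramp with `ramp_end` well beyond `ramp_start` would let labels near the boundary contribute to both tables.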
5. A speech recognition apparatus comprising:
acoustic means for receiving an utterance and producing label signals in response to the utterance, said label signals being selected from a set of label signals;
first memory means for storing, for each word k in a vocabulary and for each label signal i in the set of label signals, a signal P1 (k, i) representing the probability of producing the label signal i in a first portion of an utterance of the word k;
second memory means for storing, for each word k in the vocabulary and for each label signal i in the set of label signals, a signal P2 (k, i) representing the probability of producing the label signal i in a second portion of an utterance of the word k following the first portion of the utterance of the word;
means for selecting, from the label signals produced by the acoustic means, a series of label signals representing the utterance of an inputted speech to be recognized, said inputted speech having a first portion and a second portion following the first portion, each label signal corresponding to the first portion or the second portion of the inputted speech;
means for outputting probability signals P1 (k, i) from the first memory means for label signals corresponding to the first portion of the utterance of the inputted speech to be recognized for each word k in the vocabulary;
means for outputting probability signals P2 (k, i) from the second memory means for label signals corresponding to the second portion of the utterance of the inputted speech to be recognized for each word k in the vocabulary;
means for accumulating the output probability signals for each word k to produce a likelihood signal for each word, each likelihood signal having a magnitude; and
means for selecting a candidate word in accordance with the magnitude of the likelihood signals and producing a word output signal representing the candidate word.
Dependent claims: 6, 7, 8
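In the notation of claim 5, one way to write the likelihood signal for word k is given below. The claim only says the output probability signals are accumulated; the logarithm, the index sets T_1 and T_2 (the frames of the first and second portions of the inputted speech), and the symbols i_t and L(k) are assumptions introduced here for illustration.

```latex
L(k) \;=\; \sum_{t \in T_1} \log P_1(k, i_t) \;+\; \sum_{t \in T_2} \log P_2(k, i_t)
```

Here i_t denotes the label signal produced at frame t, and the candidate word is a word k with the largest L(k).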
Specification