Communication Device Having Speaker Independent Speech Recognition
Abstract
Techniques are provided for performing speech recognition in a communication device with a voice dialing function. Upon receipt of a voice input in a speech recognition mode, input feature vectors are generated from the voice input. A likelihood vector sequence, indicating the likelihood over time of an utterance of phonetic units, is then calculated from the input feature vectors. In a warping operation, the likelihood vector sequence is compared to phonetic word models, and a match likelihood is calculated for each word model. After the best-matching word model is determined, the number corresponding to the name synthesized from the best-matching word model is dialed in a dialing operation.
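The recognition pipeline summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Each phonetic unit is assumed to be modeled by an isotropic Gaussian with a given mean vector. A word model is assumed to be a plain sequence of phonetic-unit labels. The warping is assumed to be a left-to-right dynamic-programming alignment in which each frame either stays on the current unit or advances by one. All function names (`likelihood_vector_sequence`, `warp`, `recognize`, `voice_dial`) and the phone-book data are hypothetical.

```python
import math

# Hypothetical acoustic model: each phonetic unit is represented by the mean
# vector of an isotropic Gaussian in feature space (an assumption of this sketch).
def gaussian_log_density(x, mean, var=1.0):
    """Log density at x of an isotropic Gaussian with the given mean and variance."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq / var)


def likelihood_vector_sequence(features, unit_means):
    """Step c: one likelihood vector per input frame, with one normalized
    entry per phonetic unit (the entries of each vector sum to 1)."""
    units = list(unit_means)
    sequence = []
    for x in features:
        logs = [gaussian_log_density(x, unit_means[u]) for u in units]
        top = max(logs)  # subtract the maximum before exp() for numerical stability
        weights = [math.exp(l - top) for l in logs]
        total = sum(weights)
        sequence.append({u: w / total for u, w in zip(units, weights)})
    return sequence


def warp(lik_seq, word_model, floor=1e-12):
    """Steps d and e: align frames left to right onto the word model's units
    (each frame stays on the current unit or advances by one) and return the
    best average log likelihood per frame, or -inf if no alignment exists."""
    n, m = len(lik_seq), len(word_model)
    if n < m:
        return -math.inf
    neg = -math.inf
    # score[j] = best cumulative log likelihood with the current frame on unit j
    score = [neg] * m
    score[0] = math.log(max(lik_seq[0][word_model[0]], floor))
    for t in range(1, n):
        new = [neg] * m
        for j in range(m):
            best_prev = max(score[j], score[j - 1] if j > 0 else neg)
            if best_prev > neg:
                new[j] = best_prev + math.log(max(lik_seq[t][word_model[j]], floor))
        score = new
    return score[-1] / n  # the path must end on the last unit


def recognize(features, unit_means, word_models):
    """Step f: pick the word model with the highest match likelihood."""
    lik_seq = likelihood_vector_sequence(features, unit_means)
    scores = {name: warp(lik_seq, model) for name, model in word_models.items()}
    return max(scores, key=scores.get), scores


def voice_dial(features, unit_means, word_models, phone_book):
    """Recognize a name and return it with its number; driving the actual
    dialing hardware is outside this sketch."""
    name, _ = recognize(features, unit_means, word_models)
    return name, phone_book[name]
```

As an example, take toy unit means `{"a": [0.0, 0.0], "n": [3.0, 0.0], "o": [0.0, 3.0]}`, word models `{"Anna": ["a", "n", "a"], "Otto": ["o", "n", "o"]}`, and five input frames lying near the units a, a, n, n, a. With these inputs, `voice_dial` selects "Anna" and returns the number stored for that name. Averaging the warped log likelihood over frames is only one simple way to make scores comparable. The source does not specify how match likelihoods are normalized.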
36 Claims
1. A method for performing speech recognition in a communication device with a voice dialing function, comprising:
a) entering a speech recognition mode;
b) upon receipt of a voice input in the speech recognition mode, generating input feature vectors from voice input;
c) calculating a likelihood vector sequence from the input feature vectors indicating a likelihood in time of an utterance of phonetic units;
d) warping the likelihood vector sequence to phonetic word models;
e) calculating word model match likelihoods from the phonetic word models; and
f) determining a best matching one of the word model match as recognition result.
14. An apparatus for performing speech recognition in a communication device with a voice dialing function, comprising:
a first memory configured to store word models of names in a phone book;
a vocoder configured to generate input feature vectors from a voice input in a speech recognition mode;
a speech recognition component including (a) a likelihood vector calculation device configured to calculate a likelihood vector sequence from the input feature vectors indicating a likelihood in time of an utterance of phonetic units, (b) a warper configured to warp the likelihood vector sequence to the word models, (c) a calculation device configured to calculate word model match likelihoods from the word models, and (d) a determining device configured to determine a best matching word model as a recognition result; and
a controller configured to initiate the speech recognition mode.
22. A computer program product comprising a computer useable medium having a computer program logic recorded thereon for controlling at least one processor, the computer program logic comprising:
computer program code means for entering a speech recognition mode;
computer program code means for generating input feature vectors from voice input upon receipt of a voice input in the speech recognition mode;
computer program code means for calculating a likelihood vector sequence from the input feature vectors indicating a likelihood in time of an utterance of phonetic units;
computer program code means for warping the likelihood vector sequence to phonetic word models;
computer program code means for calculating word model match likelihoods from the phonetic word models; and
computer program code means for determining a best matching one of the word model match as recognition result.
23. A memory device comprising computer program code, which when executed on a communication device enables the communication device to carry out a method comprising:
a) entering a speech recognition mode;
b) upon receipt of a voice input in the speech recognition mode, generating input feature vectors from voice input;
c) calculating a likelihood vector sequence from the input feature vectors indicating a likelihood in time of an utterance of phonetic units;
d) warping the likelihood vector sequence to phonetic word models;
e) calculating word model match likelihoods from the phonetic word models; and
f) determining a best matching one of the word model match as recognition result.
24. A computer-readable medium containing instructions for controlling at least one processor of a communications device, by a method comprising:
a) entering a speech recognition mode;
b) upon receipt of a voice input in the speech recognition mode, generating input feature vectors from voice input;
c) calculating a likelihood vector sequence from the input feature vectors indicating a likelihood in time of an utterance of phonetic units;
d) warping the likelihood vector sequence to phonetic word models;
e) calculating word model match likelihoods from the phonetic word models; and
f) determining a best matching one of the word model match as recognition result.
32. The computer-readable medium controlling the processor using the method of claim, wherein the input feature vectors, the noise feature vector, the speaker characteristic adaptation vector, and the representative feature vectors are spectral vectors, the noise feature vector and the representative feature vectors have non-logarithmic components, and the input feature vectors and the speaker characteristic adaptation vector have logarithmic components, and updating the likelihood distribution comprises:
adding each of the representative feature vectors with the noise feature vector to generate first modified representative feature vectors;
logarithmizing each component of the first modified representative feature vectors;
adding to the first modified and logarithmized representative feature vectors the speaker characteristic adaptation vector to generate second modified representative feature vectors; and
determining a statistical distribution of the second modified representative feature vectors in feature space as likelihood distribution.
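The adaptation steps of claim 32 can be sketched as follows. The order of operations mirrors the claim. First, noise is added to the representative vectors while both are still in the non-logarithmic spectral domain. Next, the result is logarithmized component by component. Finally, the speaker characteristic adaptation vector is added in the logarithmic domain. This ordering fits the common model in which additive background noise adds in the linear power spectrum, whereas a multiplicative (speaker or channel) factor becomes additive only after taking logarithms.

The claim does not say which statistical distribution is used, so this sketch assumes one isotropic Gaussian per representative vector with a shared variance. It also assumes that representative vectors are keyed by phonetic unit. The names `adapt_representatives` and `likelihood_distribution` are hypothetical.

```python
import math


def adapt_representatives(reps, noise, speaker):
    """Claim 32 steps on hypothetical per-unit representative vectors:
    reps and noise are non-logarithmic spectral vectors, speaker is a
    logarithmic adaptation vector."""
    adapted = {}
    for unit, rep in reps.items():
        first = [r + n for r, n in zip(rep, noise)]  # first modified vectors (linear domain)
        logged = [math.log(c) for c in first]  # logarithmize each component
        adapted[unit] = [l + s for l, s in zip(logged, speaker)]  # second modified vectors
    return adapted


def likelihood_distribution(adapted, var=1.0):
    """Assumed distribution: an isotropic Gaussian with shared variance centred on
    each second modified vector. Returns a function mapping a log-spectral input
    frame to likelihoods normalized over units (the Gaussian normalization
    constant is identical for every unit, so it cancels and is omitted)."""
    units = list(adapted)

    def evaluate(x):
        logs = [
            -sum((xi - mi) ** 2 for xi, mi in zip(x, adapted[u])) / (2 * var)
            for u in units
        ]
        top = max(logs)  # stabilize exp()
        weights = [math.exp(l - top) for l in logs]
        total = sum(weights)
        return {u: w / total for u, w in zip(units, weights)}

    return evaluate
```

Because the input feature vectors are logarithmic, they are compared directly against the second modified (logarithmic) representative vectors. No further domain conversion of the input is needed.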
Specification