Method and apparatus for adaptive speech recognition hypothesis construction and selection in a spoken language translation system
Abstract
A method and apparatus for adaptive speech recognition hypothesis construction and selection are provided, wherein a number of ordered recognition hypotheses are generated and presented in response to a received speech input comprising natural spoken language. Generation of the recognition hypotheses comprises assigning basic probabilities to at least one basic component of the speech input using language models and calculating an overall probability of each of the recognition hypotheses using the assigned basic probabilities. The best hypothesis is selected by a user from the recognition hypotheses. Hypothesis generation is adapted in response to the selected best hypothesis: the selected hypothesis is analyzed, a list comprising the basic components of the selected best hypothesis and the assigned basic probabilities is generated, credit is assigned to the basic components of the selected hypothesis by raising the assigned basic probabilities, and the basic probabilities of the language model are renormalized. An output comprising the best hypothesis is provided. The input is also translated in response to the selected best hypothesis, and a synthesized translated speech output is provided.
44 Claims
-
1. A method for performing spoken language translation, comprising:
-
receiving at least one speech input;
generating a plurality of recognition hypotheses in response to the at least one speech input taking into consideration a potential variability in the at least one speech input;
receiving information indicative of a best hypothesis from the plurality of recognition hypotheses;
adapting hypothesis generation in response to the best hypothesis; and
providing at least one output comprising the best hypothesis.
-
2. The method of claim 1, further comprising:
-
presenting the plurality of recognition hypotheses;
translating the at least one speech input in response to the best hypothesis; and
synthesizing at least one speech output in response to the translated at least one speech input.
-
-
3. The method of claim 1, further comprising presenting a plurality of translations corresponding with the plurality of recognition hypotheses.
-
4. The method of claim 1, wherein the plurality of recognition hypotheses is ordered.
-
5. The method of claim 1, wherein the at least one speech input comprises spoken language comprising at least one source language.
-
6. The method of claim 1, wherein the potential variability is attributable to one or more of user accent, user pronunciation, volume level, microphone positions, and background noise.
-
7. The method of claim 1, further comprising:
-
recognizing at least one word of the at least one speech input; and
generating at least one word graph.
-
-
8. The method of claim 7, wherein recognizing at least one word comprises:
-
using acoustic information comprising at least one word pronunciation dictionary and at least one acoustic model to generate at least one hypothesis for the at least one word; and
generating at least one hypothesis for a position in time of the at least one word.
-
-
9. The method of claim 8, wherein generating the plurality of recognition hypotheses comprises generating at least one combination of hypotheses for the at least one speech input from the at least one hypothesis for the at least one word and the at least one hypothesis for a position in time of the at least one word.
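As an informal illustration of claims 7 to 9, the sketch below chains word hypotheses by their hypothesized positions in time to form full-utterance hypotheses. It is a minimal sketch under assumptions, not the patented implementation: the tuple layout, the frame numbers, the scores, and the names `WORD_HYPOTHESES`, `build_word_graph`, and `enumerate_hypotheses` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical word hypotheses: (word, start_frame, end_frame, acoustic_score).
# Frame numbers and scores are made-up illustration values.
WORD_HYPOTHESES = [
    ("I", 0, 10, 0.9),
    ("eye", 0, 10, 0.4),
    ("want", 10, 25, 0.8),
    ("won't", 10, 25, 0.5),
    ("tea", 25, 40, 0.7),
    ("two", 25, 40, 0.6),
]


def build_word_graph(word_hyps):
    """Index word hypotheses by start frame so they can be chained in time."""
    graph = defaultdict(list)
    for word, start, end, score in word_hyps:
        graph[start].append((word, end, score))
    return graph


def enumerate_hypotheses(graph, start, final):
    """Yield (words, score) for every path from frame `start` to frame `final`.

    Assumes every word ends after it starts, so the recursion terminates.
    """
    if start == final:
        yield [], 1.0
        return
    for word, end, score in graph.get(start, []):
        for rest, rest_score in enumerate_hypotheses(graph, end, final):
            yield [word] + rest, score * rest_score


graph = build_word_graph(WORD_HYPOTHESES)
# Ordered list of combined hypotheses, highest score first.
hypotheses = sorted(enumerate_hypotheses(graph, 0, 40), key=lambda h: -h[1])
```

Multiplying per-word scores is only one possible way to combine them; the claims do not specify a combination rule.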
-
10. The method of claim 1, wherein generating the plurality of recognition hypotheses comprises:
-
assigning basic probabilities to at least one basic component of the at least one speech input using at least one language model; and
calculating an overall probability of each of the plurality of recognition hypotheses using the assigned basic probabilities.
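The two steps of claim 10 can be sketched informally as follows, assuming the basic components are word pairs and the overall probability is the product of the basic probabilities (computed here as a sum of logarithms). The table `BIGRAM`, its values, the `FLOOR` constant for unseen pairs, and the function names are hypothetical and serve only as an illustration.

```python
import math

# Hypothetical basic probabilities P(word | previous word); "<s>" marks the start.
BIGRAM = {
    ("<s>", "I"): 0.5,
    ("<s>", "eye"): 0.01,
    ("I", "want"): 0.2,
    ("eye", "want"): 0.001,
    ("want", "tea"): 0.05,
    ("want", "two"): 0.02,
}
FLOOR = 1e-6  # assumed probability for word pairs missing from the table


def overall_log_probability(words, bigram=BIGRAM, floor=FLOOR):
    """Sum the log of each basic probability along the hypothesis."""
    total = 0.0
    prev = "<s>"
    for word in words:
        total += math.log(bigram.get((prev, word), floor))
        prev = word
    return total


def rank(hypotheses):
    """Order hypotheses from most to least probable."""
    return sorted(hypotheses, key=overall_log_probability, reverse=True)


candidates = [["eye", "want", "two"], ["I", "want", "tea"], ["I", "want", "two"]]
ranked = rank(candidates)
```

Working in log space avoids numerical underflow on long utterances; it is a design choice of this sketch rather than something the claim requires.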
-
-
11. The method of claim 10, wherein the at least one language model comprises a probability selected from a group comprising an n-gram probability, wherein the n-gram probability comprises unigram probabilities, bigram probabilities, and trigram probabilities.
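For readers unfamiliar with the n-gram probabilities named in claim 11, the sketch below estimates unigram, bigram, and trigram probabilities from a tiny corpus by relative frequency. The corpus, the `"<s>"` padding token, and the function `ngram_probabilities` are assumptions made for illustration; the claims do not say how the probabilities are estimated.

```python
from collections import Counter


def ngram_probabilities(sentences, n):
    """Relative-frequency estimate of P(last word | previous n-1 words)."""
    counts = Counter()
    context = Counter()
    for sentence in sentences:
        # Pad the start so the first words also have a full context.
        tokens = ["<s>"] * (n - 1) + sentence.split()
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            counts[gram] += 1
            context[gram[:-1]] += 1
    return {gram: c / context[gram[:-1]] for gram, c in counts.items()}


# Hypothetical training sentences.
CORPUS = ["I want tea", "I want two tickets", "I need tea"]

unigram = ngram_probabilities(CORPUS, 1)
bigram = ngram_probabilities(CORPUS, 2)
trigram = ngram_probabilities(CORPUS, 3)
```

With this corpus, `bigram[("I", "want")]` is the fraction of occurrences of "I" that are followed by "want", and `trigram[("I", "want", "tea")]` is the fraction of "I want" occurrences followed by "tea".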
-
12. The method of claim 10, wherein the basic probabilities are based on grammatical functions selected from a group comprising subjects, verbs, and objects.
-
13. The method of claim 10, wherein adapting hypothesis generation comprises:
-
analyzing the best hypothesis;
generating a list comprising the at least one basic component of the best hypothesis and the assigned basic probability;
assigning credit to the at least one basic component of the best hypothesis by raising the assigned basic probability; and
renormalizing the basic probabilities of the at least one language model.
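The four adaptation steps of claim 13 can be sketched as below. This is a minimal sketch under stated assumptions: the basic components are taken to be word pairs, credit is assigned by multiplying by a fixed `boost` factor, and renormalization is done per context so that each conditional distribution sums to one. The model layout, the `boost` and `floor` values, and the names `analyze` and `adapt` are hypothetical.

```python
# Hypothetical language model: previous word -> {next word: probability}.
MODEL = {
    "<s>": {"I": 0.6, "eye": 0.4},
    "I": {"want": 0.5, "won't": 0.5},
    "want": {"tea": 0.5, "two": 0.5},
}


def analyze(words):
    """Split the selected hypothesis into basic components (here, word pairs)."""
    padded = ["<s>"] + list(words)
    return list(zip(padded, padded[1:]))


def adapt(model, best_hypothesis, boost=1.5, floor=1e-6):
    """Raise the probabilities used by the selected hypothesis, then renormalize."""
    components = analyze(best_hypothesis)
    # List of each basic component with its currently assigned probability.
    credit_list = [
        ((prev, word), model.setdefault(prev, {}).get(word, floor))
        for prev, word in components
    ]
    # Assign credit by raising the assigned basic probability.
    for (prev, word), prob in credit_list:
        model[prev][word] = prob * boost
    # Renormalize every distribution that was touched.
    for prev in {prev for (prev, _), _ in credit_list}:
        total = sum(model[prev].values())
        for word in model[prev]:
            model[prev][word] /= total
    return credit_list


# The user picks "I want two" as the best hypothesis.
credit = adapt(MODEL, ["I", "want", "two"])
```

After this call, "two" is more probable than "tea" following "want", so a later ambiguous input would be ranked in favor of the user's earlier choice.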
-
-
14. An apparatus for spoken language translation comprising:
-
at least one processor (102);
an input (125) coupled to the at least one processor, the input capable of receiving at least one speech input, the at least one processor configured to translate the at least one speech input by:
generating (402) a plurality of recognition hypotheses in response to the at least one speech input taking into consideration a potential variability in the at least one speech input;
receiving information indicative of (1802) a best hypothesis from the plurality of recognition hypotheses;
adapting (1810) hypothesis generation in response to the best hypothesis;
an output (126) coupled to the at least one processor, the output capable of providing the best hypothesis.
-
15. The apparatus of claim 14, wherein the at least one processor is further configured to translate by:
-
presenting the plurality of recognition hypotheses;
translating the at least one speech input in response to the best hypothesis; and
synthesizing at least one speech output in response to the translated at least one speech input.
-
-
16. The apparatus of claim 14, wherein the at least one processor is further configured to translate by presenting a plurality of translations corresponding with the plurality of recognition hypotheses.
-
17. The apparatus of claim 14, wherein the plurality of recognition hypotheses is ordered.
-
18. The apparatus of claim 14, wherein the at least one speech input comprises spoken language comprising at least one source language.
-
19. The apparatus of claim 14, wherein the potential variability is attributable to one or more of user accent, user pronunciation, volume level, microphone positions, and background noise.
-
20. The apparatus of claim 14, wherein the at least one processor is further configured to translate by:
-
recognizing at least one word of the at least one speech input; and
generating at least one word graph.
-
-
21. The apparatus of claim 20, wherein recognizing at least one word comprises:
-
using acoustic information comprising at least one word pronunciation dictionary and at least one acoustic model to generate at least one hypothesis for the at least one word; and
generating at least one hypothesis for a position in time of the at least one word.
-
-
22. The apparatus of claim 21, wherein generating a plurality of recognition hypotheses comprises generating at least one combination of hypotheses for the at least one speech input from the at least one hypothesis for the at least one word and the at least one hypothesis for a position in time of the at least one word.
-
23. The apparatus of claim 14, wherein generating a plurality of recognition hypotheses comprises:
-
assigning basic probabilities to at least one basic component of the at least one speech input using at least one language model; and
calculating an overall probability of each of the plurality of recognition hypotheses using the assigned basic probabilities.
-
-
24. The apparatus of claim 23, wherein the at least one language model comprises a probability selected from a group comprising an n-gram probability, wherein the n-gram probability comprises unigram probabilities, bigram probabilities, and trigram probabilities.
-
25. The apparatus of claim 23, wherein the basic probabilities are based on grammatical functions comprising subjects, verbs, and objects, wherein the basic probability is of the form
-
26. The apparatus of claim 23, wherein adapting hypothesis generation comprises:
-
analyzing the best hypothesis;
generating a list comprising the at least one basic component of the best hypothesis and the assigned basic probability;
assigning credit to the at least one basic component of the best hypothesis by raising the assigned basic probability; and
renormalizing the basic probabilities of the at least one language model.
-
-
27. A computer readable medium containing executable instructions which, when executed in a processing system, causes the system to perform a method for spoken language translation, the method comprising:
-
receiving (302) at least one speech input;
generating (402) a plurality of recognition hypotheses in response to the at least one speech input taking into consideration a potential variability in the at least one speech input;
receiving information indicative of (1802) a best hypothesis from the plurality of recognition hypotheses;
adapting (1810) hypothesis generation in response to the best hypothesis; and
providing (306) at least one output comprising the best hypothesis.
-
28. The computer readable medium of claim 27, wherein the method further comprises:
-
presenting the plurality of recognition hypotheses;
translating the at least one speech input in response to the best hypothesis; and
synthesizing at least one speech output in response to the translated at least one speech input.
-
-
29. The computer readable medium of claim 27, wherein the method further comprises presenting a plurality of translations corresponding with the plurality of recognition hypotheses.
-
30. The computer readable medium of claim 27, wherein the plurality of recognition hypotheses is ordered.
-
31. The computer readable medium of claim 27, wherein the at least one speech input comprises spoken language comprising at least one source language.
-
32. The computer readable medium of claim 27, wherein the potential variability is attributable to one or more of user accent, user pronunciation, volume level, microphone positions, and background noise.
-
33. The computer readable medium of claim 27, wherein the method further comprises:
-
recognizing at least one word of the at least one speech input; and
generating at least one word graph.
-
-
34. The computer readable medium of claim 33, wherein recognizing at least one word comprises:
-
using acoustic information comprising at least one word pronunciation dictionary and at least one acoustic model to generate at least one hypothesis for the at least one word; and
generating at least one hypothesis for a position in time of the at least one word.
-
-
35. The computer readable medium of claim 34, wherein generating the plurality of recognition hypotheses comprises generating at least one combination of hypotheses for the at least one speech input from the at least one hypothesis for the at least one word and the at least one hypothesis for a position in time of the at least one word.
-
36. The computer readable medium of claim 27, wherein generating the plurality of recognition hypotheses comprises:
-
assigning basic probabilities to at least one basic component of the at least one speech input using at least one language model; and
calculating an overall probability of each of the plurality of recognition hypotheses using the assigned basic probabilities.
-
-
37. The computer readable medium of claim 36, wherein the at least one language model comprises a language model selected from a group comprising an n-gram probability, wherein the n-gram probability comprises unigram probabilities, bigram probabilities, and trigram probabilities.
-
38. The computer readable medium of claim 36, wherein the basic probabilities are based on grammatical functions comprising subjects, verbs, and objects, wherein the basic probability is of the form
-
39. The computer readable medium of claim 36, wherein adapting hypothesis generation comprises:
-
analyzing the best hypothesis;
generating a list comprising the at least one basic component of the best hypothesis and the assigned basic probability;
assigning credit to the at least one basic component of the best hypothesis by raising the assigned basic probability; and
renormalizing the basic probabilities of the at least one language model.
-
-
40. A spoken language translation system, comprising:
-
a means for receiving (302) at least one speech input;
a means for generating (402) a plurality of recognition hypotheses in response to the at least one speech input taking into consideration a potential variability in the at least one speech input;
a means for receiving information indicative of (1802) a best hypothesis from the plurality of recognition hypotheses;
a means for adapting (1810) hypothesis generation in response to the best hypothesis; and
a means for providing (306) at least one output comprising the best hypothesis.
-
41. The system of claim 40, further comprising:
-
a means for presenting the plurality of recognition hypotheses;
a means for translating the at least one speech input in response to the best hypothesis; and
a means for synthesizing at least one speech output in response to the translated at least one speech input.
-
-
42. The system of claim 40, further comprising a means for presenting a plurality of translations corresponding with the plurality of recognition hypotheses.
-
43. The system of claim 40, wherein the means for generating the plurality of recognition hypotheses comprises:
-
a means for assigning basic probabilities to at least one basic component of the at least one speech input using at least one language model; and
a means for calculating an overall probability of each of the plurality of recognition hypotheses using the assigned basic probabilities, wherein the at least one language model comprises a probability selected from a group comprising an n-gram probability, wherein the n-gram probability comprises unigram probabilities, bigram probabilities, and trigram probabilities.
-
-
44. The system of claim 43, wherein the means for adapting hypothesis generation comprises:
-
a means for analyzing the selected best hypothesis;
a means for generating a list comprising the at least one basic component of the selected best hypothesis and the assigned basic probability;
a means for assigning credit to the at least one basic component of the selected best hypothesis by raising the assigned basic probability; and
a means for renormalizing the basic probabilities of the at least one language model.
-