Method and apparatus for adaptive speech recognition hypothesis construction and selection in a spoken language translation system

US 6,278,968 B1
Filed: 01/29/1999
Issued: 08/21/2001
Est. Priority Date: 01/29/1999
Status: Expired due to Fees

Abstract

A method and apparatus for adaptive speech recognition hypothesis construction and selection are provided, wherein a number of ordered recognition hypotheses are generated and presented in response to a received speech input comprising natural spoken language. Generation of the recognition hypotheses comprises assigning basic probabilities to at least one basic component of the speech input using language models and calculating an overall probability of each of the recognition hypotheses using the assigned basic probabilities. The best hypothesis is selected by a user from the recognition hypotheses. Hypothesis generation is adapted in response to the selected best hypothesis, wherein the selected hypothesis is analyzed, a list comprising the basic components of the selected best hypothesis and the assigned basic probabilities is generated, credit is assigned to the basic components of the selected hypothesis by raising the assigned basic probabilities, and the basic probabilities of the language model are renormalized. An output is provided comprising the best hypothesis; moreover, the input is translated in response to the selected best hypothesis, and a synthesized translated speech output is provided.
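
As an illustration only, the scoring and ordering of hypotheses described in the abstract might be sketched as below. The sketch assumes that the basic components are words, that the language model is a plain unigram table, and that the overall probability is the product of the basic probabilities (summed in log space). The table UNIGRAM, the floor value FLOOR, and the function names are hypothetical and do not come from the patent.

```python
import math

# Hypothetical unigram language model: one basic probability per word.
UNIGRAM = {
    "i": 0.2, "want": 0.1, "won't": 0.05,
    "to": 0.25, "two": 0.05, "go": 0.15, "goat": 0.01,
}
FLOOR = 1e-6  # assumed probability for words missing from the table


def overall_log_prob(words, model=UNIGRAM, floor=FLOOR):
    """Sum of log basic probabilities, i.e. the log of their product."""
    return sum(math.log(model.get(w, floor)) for w in words)


def rank_hypotheses(hypotheses, model=UNIGRAM):
    """Return the hypotheses ordered from most to least probable."""
    return sorted(hypotheses, key=lambda h: overall_log_prob(h, model), reverse=True)


hypotheses = [
    ["i", "won't", "two", "go"],
    ["i", "want", "to", "go"],
    ["i", "want", "two", "goat"],
]
ranked = rank_hypotheses(hypotheses)
for h in ranked:
    print(" ".join(h), overall_log_prob(h))
```

The ordered list is what would be presented to the user for selection of the best hypothesis. Log probabilities are used here only to avoid numeric underflow on long inputs; the ordering is the same as for the raw product.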

44 Claims

1. A method for performing spoken language translation, comprising:
- receiving at least one speech input;
  
  generating a plurality of recognition hypotheses in response to the at least one speech input taking into consideration a potential variability in the at least one speech input;
  
  receiving information indicative of a best hypothesis from the plurality of recognition hypotheses;
  
  adapting hypothesis generation in response to the best hypothesis; and
  
  providing at least one output comprising the best hypothesis.
2. The method of claim 1, further comprising:
3. The method of claim 1, further comprising presenting a plurality of translations corresponding with the plurality of recognition hypotheses.
4. The method of claim 1, wherein the plurality of recognition hypotheses is ordered.
5. The method of claim 1, wherein the at least one speech input comprises spoken language comprising at least one source language.
6. The method of claim 1, wherein the potential variability is attributable to one or more of user accent, user pronunciation, volume level, microphone positions, and background noise.
7. The method of claim 1, further comprising:
- recognizing at least one word of the at least one speech input; and
  
  generating at least one word graph.
8. The method of claim 7, wherein recognizing at least one word comprises:
- using acoustic information comprising at least one word pronunciation dictionary and at least one acoustic model to generate at least one hypothesis for the at least one word; and
  
  generating at least one hypothesis for a position in time of the at least one word.
9. The method of claim 8, wherein generating the plurality of recognition hypotheses comprises generating at least one combination of hypotheses for the at least one speech input from the at least one hypothesis for the at least one word and the at least one hypothesis for a position in time of the at least one word.
10. The method of claim 1, wherein generating the plurality of recognition hypotheses comprises:
- assigning basic probabilities to at least one basic component of the at least one speech input using at least one language model; and
  
  calculating an overall probability of each of the plurality of recognition hypotheses using the assigned basic probabilities.
11. The method of claim 10, wherein the at least one language model comprises a probability selected from a group comprising an n-gram probability, wherein the n-gram probability comprises unigram probabilities, bigram probabilities, and trigram probabilities.
12. The method of claim 10, wherein the basic probabilities are based on grammatical functions selected from a group comprising subjects, verbs, and objects.
13. The method of claim 10, wherein adapting hypothesis generation comprises:
- analyzing the best hypothesis;
  
  generating a list comprising the at least one basic component of the best hypothesis and the assigned basic probability;
  
  assigning credit to the at least one basic component of the best hypothesis by raising the assigned basic probability; and
  
  renormalizing the basic probabilities of the at least one language model.
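
The adaptation steps recited in claim 13 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the basic components are taken to be words, the language model is a dictionary of unigram probabilities, and "raising" is modeled as multiplication by a fixed factor. The name adapt_language_model and the credit parameter are hypothetical; the claims do not specify how much a probability is raised.

```python
def adapt_language_model(model, best_hypothesis, credit=1.2):
    """Raise the probabilities of the words in the best hypothesis, then renormalize."""
    # Analyze the best hypothesis: list its components with their assigned probabilities.
    components = {w: model[w] for w in best_hypothesis if w in model}

    # Assign credit by raising each listed probability (assumed multiplicative factor).
    raised = dict(model)
    for word, prob in components.items():
        raised[word] = prob * credit

    # Renormalize so that the basic probabilities again sum to one.
    total = sum(raised.values())
    return {word: prob / total for word, prob in raised.items()}


model = {"to": 0.5, "two": 0.3, "too": 0.2}
adapted = adapt_language_model(model, ["to"])
print(adapted)
```

After the call, the selected word "to" has a higher probability than before, and the competing words "two" and "too" have proportionally lower ones, so later hypotheses containing "to" score higher.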

14. An apparatus for spoken language translation comprising:
- at least one processor (102);
  
  an input (125) coupled to the at least one processor, the input capable of receiving at least one speech input, the at least one processor configured to translate the at least one speech input by, generating (402) a plurality of recognition hypotheses in response to the at least one speech input taking into consideration a potential variability in the at least one speech input;
  
  receiving information indicative of (1802) a best hypothesis from the plurality of recognition hypotheses;
  
  adapting (1810) hypothesis generation in response to the best hypothesis;
  
  an output (126) coupled to the at least one processor, the output capable of providing the best hypothesis.
15. The apparatus of claim 14, wherein the at least one processor is further configured to translate by:
16. The apparatus of claim 14, wherein the at least one processor is further configured to translate by presenting a plurality of translations corresponding with the plurality of recognition hypotheses.
17. The apparatus of claim 14, wherein the plurality of recognition hypotheses is ordered.
18. The apparatus of claim 14, wherein the at least one speech input comprises spoken language comprising at least one source language.
19. The apparatus of claim 14, wherein the potential variability is attributable to one or more of user accent, user pronunciation, volume level, microphone positions, and background noise.
20. The apparatus of claim 14, wherein the at least one processor is further configured to translate by:
- recognizing at least one word of the at least one speech input; and
  
  generating at least one word graph.
21. The apparatus of claim 20, wherein recognizing at least one word comprises:
- using acoustic information comprising at least one word pronunciation dictionary and at least one acoustic model to generate at least one hypothesis for the at least one word; and
  
  generating at least one hypothesis for a position in time of the at least one word.
22. The apparatus of claim 21, wherein generating a plurality of recognition hypotheses comprises generating at least one combination of hypotheses for the at least one speech input from the at least one hypothesis for the at least one word and the at least one hypothesis for a position in time of the at least one word.
23. The apparatus of claim 14, wherein generating a plurality of recognition hypotheses comprises:
- assigning basic probabilities to at least one basic component of the at least one speech input using at least one language model; and
  
  calculating an overall probability of each of the plurality of recognition hypotheses using the assigned basic probabilities.
24. The apparatus of claim 23, wherein the at least one language model comprises a probability selected from a group comprising an n-gram probability, wherein the n-gram probability comprises unigram probabilities, bigram probabilities, and trigram probabilities.
25. The apparatus of claim 23, wherein the basic probabilities are based on grammatical functions comprising subjects, verbs, and objects, wherein the basic probability is of the form
26. The apparatus of claim 23, wherein adapting hypothesis generation comprises:
- analyzing the best hypothesis;
  
  generating a list comprising the at least one basic component of the best hypothesis and the assigned basic probability;
  
  assigning credit to the at least one basic component of the best hypothesis by raising the assigned basic probability; and
  
  renormalizing the basic probabilities of the at least one language model.
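
Claims 20 through 22 describe combining word hypotheses and their hypothesized positions in time into a word graph. One way to picture this, offered purely as a sketch, is a small directed graph in which every edge is a word hypothesis spanning a start and end time, and every complete path is one recognition hypothesis. The edge list EDGES, the integer time indices, and the function enumerate_paths are assumptions made for this illustration. The sketch also assumes every edge ends later than it starts, so the graph has no cycles.

```python
from collections import defaultdict

# Each edge is one word hypothesis with a hypothesized time span: (start, end, word).
EDGES = [
    (0, 1, "i"),
    (1, 2, "want"),
    (1, 2, "won't"),
    (2, 3, "to"),
    (2, 3, "two"),
]


def enumerate_paths(edges, start, end):
    """Return every word sequence that spans the graph from start to end."""
    outgoing = defaultdict(list)
    for s, e, word in edges:
        outgoing[s].append((e, word))

    paths = []

    def walk(node, words):
        if node == end:
            paths.append(words)
            return
        for nxt, word in outgoing[node]:
            walk(nxt, words + [word])

    walk(start, [])
    return paths


paths = enumerate_paths(EDGES, 0, 3)
for p in paths:
    print(" ".join(p))
```

Exhaustive enumeration is only practical for small graphs; it is used here to make the combination step visible, not as a suggestion for how a recognizer would search a realistic word graph.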

27. A computer readable medium containing executable instructions which, when executed in a processing system, causes the system to perform a method for spoken language translation, the method comprising:
- receiving (302) at least one speech input;
  
  generating (402) a plurality of recognition hypotheses in response to the at least one speech input taking into consideration a potential variability in the at least one speech input;
  
  (1802) receiving information indicative of a best hypothesis from the plurality of recognition hypotheses;
  
  adapting (1810) hypothesis generation in response to the best hypothesis; and
  
  providing (306) at least one output comprising the best hypothesis.
28. The computer readable medium of claim 27, wherein the method further comprises:
29. The computer readable medium of claim 27, wherein the method further comprises presenting a plurality of translations corresponding with the plurality of recognition hypotheses.
30. The computer readable medium of claim 27, wherein the plurality of recognition hypotheses is ordered.
31. The computer readable medium of claim 27, wherein the at least one speech input comprises spoken language comprising at least one source language.
32. The computer readable medium of claim 27, wherein the potential variability is attributable to one or more of user accent, user pronunciation, volume level, microphone positions, and background noise.
33. The computer readable medium of claim 27, wherein the method further comprises:
- recognizing at least one word of the at least one speech input; and
  
  generating at least one word graph.
34. The computer readable medium of claim 33, wherein recognizing at least one word comprises:
- using acoustic information comprising at least one word pronunciation dictionary and at least one acoustic model to generate at least one hypothesis for the at least one word; and
  
  generating at least one hypothesis for a position in time of the at least one word.
35. The computer readable medium of claim 34, wherein generating the plurality of recognition hypotheses comprises generating at least one combination of hypotheses for the at least one speech input from the at least one hypothesis for the at least one word and the at least one hypothesis for a position in time of the at least one word.
36. The computer readable medium of claim 27, wherein generating the plurality of recognition hypotheses comprises:
- assigning basic probabilities to at least one basic component of the at least one speech input using at least one language model; and
  
  calculating an overall probability of each of the plurality of recognition hypotheses using the assigned basic probabilities.
37. The computer readable medium of claim 36, wherein the at least one language model comprises a language model selected from a group comprising an n-gram probability, wherein the n-gram probability comprises unigram probabilities, bigram probabilities, and trigram probabilities.
38. The computer readable medium of claim 36, wherein the basic probabilities are based on grammatical functions comprising subjects, verbs, and objects, wherein the basic probability is of the form
39. The computer readable medium of claim 36, wherein adapting hypothesis generation comprises:
- analyzing the best hypothesis;
  
  generating a list comprising the at least one basic component of the best hypothesis and the assigned basic probability;
  
  assigning credit to the at least one basic component of the best hypothesis by raising the assigned basic probability; and
  
  renormalizing the basic probabilities of the at least one language model.
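
Claim 37 refers to unigram, bigram, and trigram probabilities without saying how they are obtained. As a hedged illustration, the sketch below estimates them by maximum likelihood from a toy corpus: the probability of the last word of an n-gram is its count divided by the count of all n-grams sharing the same preceding words. The corpus and the function names ngram_counts and ngram_prob are hypothetical, and a practical system would normally add smoothing, which is omitted here.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count every run of n consecutive words in the corpus."""
    counts = Counter()
    for sentence in sentences:
        for i in range(len(sentence) - n + 1):
            counts[tuple(sentence[i:i + n])] += 1
    return counts


def ngram_prob(sentences, ngram):
    """Maximum-likelihood P(last word | preceding words); 0.0 for an unseen history."""
    ngram = tuple(ngram)
    counts = ngram_counts(sentences, len(ngram))
    history = ngram[:-1]
    history_total = sum(c for g, c in counts.items() if g[:-1] == history)
    return counts[ngram] / history_total if history_total else 0.0


corpus = [
    ["i", "want", "to", "go"],
    ["i", "want", "two", "tickets"],
    ["i", "want", "to", "eat"],
]
p_unigram = ngram_prob(corpus, ["want"])              # P(want)
p_bigram = ngram_prob(corpus, ["want", "to"])         # P(to | want)
p_trigram = ngram_prob(corpus, ["want", "to", "go"])  # P(go | want to)
print(p_unigram, p_bigram, p_trigram)
```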

40. A spoken language translation system, comprising:
- a means for receiving (302) at least one speech input;
  
  a means for generating (402) a plurality of recognition hypotheses in response to the at least one speech input taking into consideration a potential variability in the at least one speech input;
  
  a means for receiving information indicative of (1802) a best hypothesis from the plurality of recognition hypotheses;
  
  a means for adapting (1810) hypothesis generation in response to the best hypothesis; and
  
  a means for providing (306) at least one output comprising the best hypothesis.
41. The system of claim 40, further comprising:
42. The system of claim 40, further comprising a means for presenting a plurality of translations corresponding with the plurality of recognition hypotheses.
43. The system of claim 40, wherein the means for generating the plurality of recognition hypotheses comprises:
- a means for assigning basic probabilities to at least one basic component of the at least one speech input using at least one language model; and
  
  a means for calculating an overall probability of each of the plurality of recognition hypotheses using the assigned basic probabilities, wherein the at least one language model comprises a probability selected from a group comprising an n-gram probability, wherein the n-gram probability comprises unigram probabilities, bigram probabilities, and trigram probabilities.
44. The system of claim 43, wherein the means for adapting hypothesis generation comprises:
- a means for analyzing the selected best hypothesis;
  
  a means for generating a list comprising the at least one basic component of the selected best hypothesis and the assigned basic probability;
  
  a means for assigning credit to the at least one basic component of the selected best hypothesis by raising the assigned basic probability; and
  
  a means for renormalizing the basic probabilities of the at least one language model.

Current Assignee: Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Original Assignee: Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Inventors: Franz, Alexander M.; Horiguchi, Keiko
Primary Examiner(s): Thomas, Joseph
Application Number: US09/239,643
Time in Patent Office: 935 Days
Field of Search: 704/3, 704/2, 704/4, 704/5, 704/6, 704/7, 704/9, 704/8, 704/10, 704/1, 704/277, 704/251, 704/255, 704/256, 704/257, 704/270, 707/536, 707/530, 345/171
US Class Current: 704/3
CPC Class Codes:
G06F 40/279   Recognition of textual enti...
G06F 40/47   Machine-assisted translatio...
G06F 40/56   Natural language generation
G10L 15/075   supervised, i.e. under mach...
