Method and apparatus for time-synchronized translation and synthesis of natural-language speech

US 6,556,972 B1
Filed: 03/16/2000
Issued: 04/29/2003
Est. Priority Date: 03/16/2000
Status: Expired due to Fees

3 Assignments

0 Petitions

Abstract

A multi-lingual time-synchronized translation system and method provide automatic, time-synchronized spoken translations of spoken phrases. The system includes a phrase-spotting mechanism, an optional language understanding mechanism, a translation mechanism, a speech output mechanism, and an event-measuring mechanism. The phrase-spotting mechanism identifies a spoken phrase from a restricted domain of phrases. The language understanding mechanism, if present, maps the identified phrase onto a small set of formal phrases. The translation mechanism maps the formal phrase onto a well-formed phrase in one or more target languages. The event-measuring mechanism measures the duration of key events in the source phrase, for example the overall duration of the input phrase, the duration of the phrase with inter-word silences omitted, or other relevant durational features. The speech output mechanism produces high-quality output speech, using the output of the event-measuring mechanism for time synchronization. The present invention recognizes that quality improvements can be achieved by restricting the task domain under consideration.
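The durational features named in the abstract can be computed from word-level timestamps. The Python sketch below is illustrative only and is not taken from the patent: it assumes a hypothetical recognizer that returns each spotted word with start and end times in seconds, and the names `WordSegment` and `measure_events` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordSegment:
    """One spotted word with start/end times in seconds (hypothetical recognizer output)."""
    word: str
    start: float
    end: float


def measure_events(segments: list[WordSegment]) -> dict[str, float]:
    """Return durational features of a spoken phrase.

    total  -- first word onset to last word offset
    speech -- summed word durations, i.e. inter-word silences omitted
    pauses -- summed inter-word silences
    """
    if not segments:
        raise ValueError("no words in phrase")
    ordered = sorted(segments, key=lambda s: s.start)
    total = max(s.end for s in ordered) - ordered[0].start
    speech = sum(s.end - s.start for s in ordered)
    # Gaps between consecutive words; overlapping words contribute no pause.
    pauses = sum(max(0.0, nxt.start - cur.end) for cur, nxt in zip(ordered, ordered[1:]))
    return {"total": total, "speech": speech, "pauses": pauses}


phrase = [WordSegment("gate", 0.10, 0.45), WordSegment("change", 0.60, 1.05)]
print({k: round(v, 2) for k, v in measure_events(phrase).items()})
# → {'total': 0.95, 'speech': 0.8, 'pauses': 0.15}
```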

57 Claims

1. A system for translating a source language into at least one target language, comprising:
- a phrase-spotting system for identifying a spoken phrase from a restricted domain of phrases;
  
  a set of prerecorded translations of said restricted domain of phrases; and
  
  a playback mechanism for reproducing said spoken phrase in said at least one target language, wherein a duration of said prerecorded translation is adjusted to approximately match a duration of said spoken phrase.
- Dependent claims:
  - 2. The system of claim 1, wherein said phrase-spotting system measures the duration of said spoken phrase.
  - 3. The system of claim 1, wherein said phrase-spotting system measures the duration of internal events in said spoken phrase.
  - 4. The system of claim 3, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase less the duration of said internal specific events in said spoken phrase.
  - 5. The system of claim 3, wherein said internal events are inter-word pauses.
  - 6. The system of claim 1, wherein said duration is measured using a speech recognition system.
  - 7. The system of claim 1, wherein said phrase-spotting system is embodied as a speech recognition system.
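Claims 1 through 5 require the duration of the prerecorded translation to approximately match the spoken phrase, optionally less internal events such as inter-word pauses. A minimal sketch, assuming the translation is held as a list of float samples at a known sample rate; `target_duration`, `resample_to_length`, and `fit_translation` are hypothetical names. Linear-interpolation resampling is used only because it is short and exact in output length; it also shifts pitch, so it stands in for a pitch-preserving time-scale modification method, which the claims do not specify.

```python
def target_duration(source_total: float, internal_events: float = 0.0) -> float:
    """Duration the translation should occupy: the whole spoken phrase, or the
    phrase less internal events such as inter-word pauses (cf. claims 4-5)."""
    duration = source_total - internal_events
    if duration <= 0:
        raise ValueError("target duration must be positive")
    return duration


def resample_to_length(samples: list[float], n_out: int) -> list[float]:
    """Stretch or compress a waveform to exactly n_out samples by linear interpolation.

    Stand-in only: this shifts pitch along with duration, unlike the
    pitch-preserving time-scale modification a real system would use.
    """
    n_in = len(samples)
    if n_in == 0 or n_out <= 0:
        raise ValueError("need a non-empty input and a positive output length")
    if n_in == 1 or n_out == 1:
        return [samples[0]] * n_out
    step = (n_in - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        j = min(int(pos), n_in - 2)  # left neighbour, kept inside the array
        frac = pos - j
        out.append(samples[j] * (1.0 - frac) + samples[j + 1] * frac)
    return out


def fit_translation(translation: list[float], sample_rate: int,
                    source_total: float, internal_events: float = 0.0) -> list[float]:
    """Adjust a prerecorded translation to approximately the spoken phrase's duration."""
    n_out = round(target_duration(source_total, internal_events) * sample_rate)
    return resample_to_length(translation, n_out)


clip = [0.0] * 16000                      # 1.0 s of (silent) audio at 16 kHz
fitted = fit_translation(clip, 16000, source_total=0.95, internal_events=0.15)
print(len(fitted) / 16000)                # → 0.8
```

Here a 1.0 s clip fitted to a 0.95 s phrase containing 0.15 s of pauses comes out at 12,800 samples, or 0.80 s.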

8. A system for translating a source language into at least one target language, comprising:
- a phrase-spotting system for identifying a spoken phrase from a restricted domain of phrases, said restricted domain of phrases having a static component and a dynamic component;
  
  a set of prerecorded translations of said static components and said dynamic components of said restricted domain of phrases; and
  
  a playback mechanism for reproducing said spoken phrase in said at least one target language using said prerecorded translations of said static components and said dynamic components, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase.
- Dependent claims:
  - 9. The system of claim 8, wherein said phrase-spotting system measures the duration of said spoken phrase.
  - 10. The system of claim 8, wherein said phrase-spotting system measures the duration of internal events in said spoken phrase.
  - 11. The system of claim 10, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase less the duration of said internal specific events in said spoken phrase.
  - 12. The system of claim 10, wherein said internal events are inter-word pauses.
  - 13. The system of claim 8, wherein said duration is measured using a speech recognition system.
  - 14. The system of claim 8, wherein said phrase-spotting system is embodied as a speech recognition system.
  - 15. The system of claim 8, wherein said playback mechanism employs a phrase-splicing based synthesis system.
  - 16. The system of claim 8, wherein a dynamic component of a recognized spoken phrase is converted to a dynamic component in said target language using a finite-state transducer.
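Claims 8, 15 and 16 split the restricted domain into static components, which have fixed prerecorded translations, and dynamic components such as numbers, which are converted by a finite-state transducer. The sketch below is an illustration under stated assumptions, not the patent's implementation: it uses a hypothetical airport-announcement template, hypothetical clip labels, and Spanish digit names, with a two-state deterministic transducer that reads a flight or gate number digit by digit and a splicing plan that interleaves static and dynamic clips.

```python
# Spanish digit names; the template and clip labels below are hypothetical.
DIGITS_ES = ["cero", "uno", "dos", "tres", "cuatro",
             "cinco", "seis", "siete", "ocho", "nueve"]


class FST:
    """Deterministic finite-state transducer: (state, input) -> (next state, output)."""

    def __init__(self, transitions: dict[tuple[str, str], tuple[str, str]],
                 start: str, finals: set[str]) -> None:
        self.transitions = transitions
        self.start = start
        self.finals = finals

    def transduce(self, symbols: str) -> list[str]:
        state, output = self.start, []
        for sym in symbols:
            if (state, sym) not in self.transitions:
                raise ValueError(f"no transition from {state!r} on {sym!r}")
            state, emitted = self.transitions[(state, sym)]
            output.append(emitted)
        if state not in self.finals:
            raise ValueError("input not accepted")
        return output


# Two states: q0 (nothing read yet) and q1 (at least one digit read, accepting).
DIGIT_FST = FST(
    {(q, str(d)): ("q1", name) for q in ("q0", "q1") for d, name in enumerate(DIGITS_ES)},
    start="q0",
    finals={"q1"},
)

# Static parts are prerecorded clips; "{slot}" parts are filled via the transducer.
TEMPLATES_ES = {
    "BOARDING": ["static:el_vuelo", "{flight}",
                 "static:esta_embarcando_por_la_puerta", "{gate}"],
}


def splice_plan(phrase_id: str, slots: dict[str, str]) -> list[str]:
    """Ordered list of clips to concatenate for phrase-splicing playback."""
    plan = []
    for part in TEMPLATES_ES[phrase_id]:
        if part.startswith("{") and part.endswith("}"):
            plan.extend(f"dyn:{w}" for w in DIGIT_FST.transduce(slots[part[1:-1]]))
        else:
            plan.append(part)
    return plan


print(splice_plan("BOARDING", {"flight": "42", "gate": "7"}))
```

For flight 42 at gate 7, the plan is `el_vuelo`, `cuatro`, `dos`, `esta_embarcando_por_la_puerta`, `siete`; an empty slot leaves the transducer in its non-accepting start state and is rejected.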

17. A system for translating a source language into at least one target language, comprising:
- a natural-language understanding system that infers a phrase in an underlying formal language from a spoken phrase;
  
  a text production mechanism in which a formal language phrase is converted to natural text in said at least one target language;
  
  a set of prerecorded translations in said at least one target language; and
  
  a playback mechanism driven by said natural text for reproducing said spoken phrase in said at least one target language using said prerecorded translations, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase.
- Dependent claims:
  - 18. The system of claim 17, wherein a speech recognition system measures the duration of said spoken phrase.
  - 19. The system of claim 17, wherein a speech recognition system measures the duration of internal events in said spoken phrase.
  - 20. The system of claim 19, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase less the duration of said internal specific events in said spoken phrase.
  - 21. The system of claim 19, wherein said internal events are inter-word pauses.
  - 22. The system of claim 17, wherein said playback mechanism employs a phrase-splicing based synthesis system.
  - 23. The system of claim 17, wherein a recognized spoken phrase is converted to said target language using a finite-state transducer.
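Claim 17 routes the spoken phrase through a natural-language understanding step that yields a formal phrase, followed by a text production step that renders the formal phrase as target-language text. In the sketch below, a handful of regular expressions stand in for the NLU system (an assumption; the claims do not say how the inference is made), the formal phrase is a hypothetical `(intent, arguments)` pair, and the target-language templates are invented examples. Several surface forms of the same request collapse onto one formal phrase, so one set of target-language renderings covers all of them.

```python
import re

# Hypothetical surface patterns standing in for an NLU system.
PATTERNS = [
    (re.compile(r"(?:please )?(?:proceed|go) to gate (?P<gate>\w+)"), "GO_TO_GATE"),
    (re.compile(r"your (?:new )?gate is (?P<gate>\w+)"), "GO_TO_GATE"),
    (re.compile(r"flight (?P<flight>\d+) (?:is|has been) delayed"), "DELAYED"),
]

# Hypothetical target-language templates keyed by (formal phrase, language).
GENERATORS = {
    ("GO_TO_GATE", "es"): "Diríjase a la puerta {gate}.",
    ("GO_TO_GATE", "fr"): "Veuillez vous rendre à la porte {gate}.",
    ("DELAYED", "es"): "El vuelo {flight} está retrasado.",
}


def understand(utterance: str) -> tuple[str, dict[str, str]]:
    """Infer a formal phrase (intent, arguments) from recognized text."""
    text = utterance.lower().strip(" .!?")
    for pattern, intent in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return intent, match.groupdict()
    raise ValueError(f"outside the restricted domain: {utterance!r}")


def produce_text(formal: tuple[str, dict[str, str]], lang: str) -> str:
    """Render a formal phrase as natural text in the target language."""
    intent, args = formal
    return GENERATORS[(intent, lang)].format(**args)


for said in ("Please proceed to gate 12.", "Your new gate is 12"):
    print(produce_text(understand(said), "es"))   # both print the same sentence
```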

24. A system for translating a source language into at least one target language, comprising:
- a phrase-spotting system for identifying a spoken phrase from a restricted domain of phrases;
  
  a set of prerecorded translations of said restricted domain of phrases; and
  
  a playback mechanism for reproducing said spoken phrase in said at least one target language, wherein the duration of said spoken phrase or said prerecorded translation is adjusted to synchronize said spoken phrase and said prerecorded translation.
- Dependent claims:
  - 25. The system of claim 24, wherein said adjustment of said duration of said spoken phrase or said prerecorded translation is performed such that the maximum duration modification performed on either said spoken phrase or said prerecorded translation is less than a pre-determined threshold.
  - 26. The system of claim 25, wherein said adjustment of said duration of said spoken phrase or said prerecorded translation first determines whether said spoken phrase or said prerecorded translation has the shorter duration, and then increases the duration of said phrase with the shorter duration.
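Claims 24 through 26 allow either the spoken phrase or the translation to be modified, keep each modification within a threshold, and lengthen whichever of the two is shorter. The sketch below expresses the threshold as a maximum duration ratio (hypothetically 1.25) and adds a fallback that the claims do not describe: when lengthening the shorter phrase alone would exceed the limit, both are modified so that they meet at the geometric mean of their durations. `plan_sync` is a hypothetical name, and its return values are multipliers on each phrase's duration.

```python
import math


def plan_sync(source_dur: float, translation_dur: float,
              max_ratio: float = 1.25) -> tuple[float, float]:
    """Return (source_scale, translation_scale): multipliers on each duration so
    that source_dur * source_scale matches translation_dur * translation_scale
    (up to rounding).

    No phrase is stretched or compressed by more than max_ratio (a hypothetical
    threshold). The shorter phrase is lengthened when that alone suffices.
    """
    if source_dur <= 0 or translation_dur <= 0 or max_ratio < 1:
        raise ValueError("durations must be positive and max_ratio >= 1")
    shorter, longer = sorted((source_dur, translation_dur))
    ratio = longer / shorter
    if ratio <= max_ratio:
        short_scale, long_scale = ratio, 1.0   # lengthen only the shorter phrase
    else:
        meet = math.sqrt(ratio)                # meet at the geometric mean
        if meet > max_ratio:
            raise ValueError("cannot synchronize within the threshold")
        short_scale, long_scale = meet, 1.0 / meet
    if source_dur <= translation_dur:
        return short_scale, long_scale
    return long_scale, short_scale


print(plan_sync(2.0, 2.4))   # translation 20% longer: only the source is stretched
print(plan_sync(2.0, 3.0))   # 50% longer: each is modified by a factor of about 1.22
```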

27. A method for translating a source language into at least one target language, comprising:
- identifying a spoken phrase from a restricted domain of phrases;
  
  obtaining a prerecorded translation of said spoken phrase; and
  
  reproducing said spoken phrase in said at least one target language, wherein a duration of said prerecorded translation is adjusted to approximately match a duration of said spoken phrase.
- Dependent claims:
  - 28. The method of claim 27, further comprising the step of measuring the duration of internal events in said spoken phrase.
  - 29. The method of claim 28, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase less the duration of said internal specific events in said spoken phrase.
  - 30. The method of claim 28, wherein said internal events are inter-word pauses.
  - 31. The method of claim 27, wherein said duration is measured using a speech recognition method.
  - 32. The method of claim 27, wherein said identifying step is performed by a speech recognition system.
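For illustration, the identifying step of claims 27 and 32 can be approximated by matching a recognizer's text hypothesis against the restricted domain and rejecting anything too far from every entry. The sketch assumes a hypothetical three-phrase domain and uses character-level similarity from Python's standard `difflib` module; a deployed phrase spotter might instead constrain the recognizer's grammar to the domain directly.

```python
import difflib
from typing import Optional

# Hypothetical restricted domain of phrases.
DOMAIN = [
    "the flight is delayed",
    "please proceed to the gate",
    "boarding will begin shortly",
]


def spot_phrase(hypothesis: str, domain: list[str] = DOMAIN,
                cutoff: float = 0.6) -> Optional[str]:
    """Return the closest in-domain phrase, or None if nothing scores >= cutoff."""
    matches = difflib.get_close_matches(hypothesis.lower().strip(), domain, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(spot_phrase("Please proceed to gate"))   # tolerates a dropped word
print(spot_phrase("xyz"))                      # out of domain: None
```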

33. A method for translating a source language into at least one target language, comprising:
- identifying a spoken phrase from a restricted domain of phrases, said restricted domain of phrases having a static component and a dynamic component;
  
  obtaining a prerecorded translation of said static components and said dynamic components of said spoken phrase; and
  
  reproducing said spoken phrase in said at least one target language using said prerecorded translations of said static components and said dynamic components, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase.
- Dependent claims:
  - 34. The method of claim 33, further comprising the step of measuring the duration of internal events in said spoken phrase.
  - 35. The method of claim 34, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase less the duration of said internal specific events in said spoken phrase.
  - 36. The method of claim 34, wherein said internal events are inter-word pauses.
  - 37. The method of claim 33, wherein said duration is measured using a speech recognition method.
  - 38. The method of claim 33, wherein said identifying step is performed by a speech recognition system.
  - 39. The method of claim 33, wherein said reproducing step employs a phrase-splicing based synthesis system.
  - 40. The method of claim 33, wherein a dynamic component of a recognized spoken phrase is converted to a dynamic component in said target language using a finite-state transducer.

41. A method for translating a source language into at least one target language, comprising:
- inferring a phrase in an underlying formal language from a spoken phrase using a natural-language understanding system;
  
  converting a formal language phrase to natural text in said at least one target language;
  
  obtaining a prerecorded translation in said at least one target language; and
  
  reproducing said spoken phrase in said at least one target language using said prerecorded translations and driven by said natural text, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase.
- Dependent claims:
  - 42. The method of claim 41, further comprising the step of measuring the duration of internal events in said spoken phrase.
  - 43. The method of claim 42, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase less the duration of said internal specific events in said spoken phrase.
  - 44. The method of claim 42, wherein said identifying step is performed by a speech recognition system.
  - 45. The method of claim 42, wherein said duration is measured using a speech recognition method.
  - 46. The method of claim 42, wherein a recognized spoken phrase is converted to said target language using a finite-state transducer.
  - 47. The method of claim 42, wherein said reproducing step employs a phrase-splicing based synthesis system.
  - 48. The method of claim 43, wherein said internal events are inter-word pauses.

49. A method for translating a source language into at least one target language, comprising:
- identifying a spoken phrase from a restricted domain of phrases;
  
  obtaining a prerecorded translation of said spoken phrase; and
  
  reproducing said spoken phrase in said at least one target language, wherein the duration of said spoken phrase or said prerecorded translation is adjusted to synchronize said spoken phrase and said prerecorded translation.
- Dependent claims:
  - 50. The method of claim 49, wherein said adjustment of said duration of said spoken phrase or said prerecorded translation is performed such that the maximum duration modification performed on either said spoken phrase or said prerecorded translation is less than a pre-determined threshold.
  - 51. The method of claim 49, wherein said adjustment of said duration of said spoken phrase or said prerecorded translation first determines whether said spoken phrase or said prerecorded translation has the shorter duration, and then increases the duration of said phrase with the shorter duration.

52. A system for translating a source language into at least one of a plurality of target languages, comprising:
- a memory that stores computer-readable code; and
  
  a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
  
  identify a spoken phrase from a restricted domain of phrases;
  
  obtain a prerecorded translation of said spoken phrase; and
  
  reproduce said spoken phrase in said at least one target language, wherein a duration of said prerecorded translation is adjusted to approximately match a duration of said spoken phrase.

53. An article of manufacture, comprising:
- a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising:
  
  a step to identify a spoken phrase from a restricted domain of phrases;
  
  a step to obtain a prerecorded translation of said spoken phrase; and
  
  a step to reproduce said spoken phrase in said at least one target language, wherein a duration of said prerecorded translation is adjusted to approximately match a duration of said spoken phrase.

54. A system for translating a source language into at least one of a plurality of target languages, comprising:
- a memory that stores computer-readable code; and
  
  a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
  
  identify a spoken phrase from a restricted domain of phrases, said restricted domain of phrases having a static component and a dynamic component;
  
  obtain a prerecorded translation of said static components and said dynamic components of said spoken phrase; and
  
  reproduce said spoken phrase in said at least one target language using said prerecorded translations of said static components and said dynamic components, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase.

55. An article of manufacture, comprising:
- a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising:
  
  a step to identify a spoken phrase from a restricted domain of phrases, said restricted domain of phrases having a static component and a dynamic component;
  
  a step to obtain a prerecorded translation of said static components and said dynamic components of said spoken phrase; and
  
  a step to reproduce said spoken phrase in said at least one target language using said prerecorded translations of said static components and said dynamic components, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase.

56. A system for translating a source language into at least one of a plurality of target languages, comprising:
- a memory that stores computer-readable code; and
  
  a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
  
  infer a phrase in an underlying formal language from a spoken phrase using a natural-language understanding system;
  
  convert a formal language phrase to natural text in said at least one target language;
  
  obtain a prerecorded translation in said at least one target language; and
  
  reproduce said spoken phrase in said at least one target language using said prerecorded translations and driven by said natural text, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase.

57. An article of manufacture, comprising:
- a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising:
  
  a step to infer a phrase in an underlying formal language from a spoken phrase using a natural-language understanding system;
  
  a step to convert a formal language phrase to natural text in said at least one target language;
  
  a step to obtain a prerecorded translation in said at least one target language; and
  
  a step to reproduce said spoken phrase in said at least one target language using said prerecorded translations and driven by said natural text, wherein the duration of said prerecorded translation is adjusted to approximately match the duration of said spoken phrase.

Current Assignee
International Business Machines Corporation, Oipenn Incorporated
Original Assignee
International Business Machines Corporation
Inventors
Novak, Miroslav, Bakis, Raimo, Meisel, William Stuart, Whitaker, Ridley M., Picheny, Michael, Epstein, Mark Edward
Primary Examiner(s)
MCFADDEN, SUSAN IRIS

Application Number

US09/526,986
Time in Patent Office

1,139 Days
Field of Search

704/2, 704/3, 704/4, 704/8, 704/9, 704/231, 704/235, 704/256, 704/266, 704/277, 704/270
US Class Current

704/277
CPC Class Codes

G06F 40/58   Use of machine translation,...

G10L 15/18   using natural language mode...

G10L 2015/088   Word spotting