Speech translation with back-channeling cues

US 9,070,363 B2
Filed: 01/18/2010
Issued: 06/30/2015
Est. Priority Date: 10/26/2007
Status: Active Grant

3 Assignments

0 Petitions

Abstract

A field maintainable class-based translation system and apparatus with components that ease use by linguistically untrained users is disclosed. The apparatus includes modules for recovering errors, extending and customizing language coverage and increasing the speed of effective communication.

25 Claims

1. A method of translating speech from a first language to a second language, the method comprising:
- recognizing speech by a speaker;
  
  identifying the speech by the speaker as being in the first language;
  
  initiating a translation of the speech in the first language, by a speech translation system, into the second language;
  
  recognizing, by the speech translation system, one or more prosodic cues in the speech in the first language, one or more of the prosodic cues being of a specific type of prosodic cue;
  
  responsive to recognizing the prosodic cues, producing a back-channel cue corresponding to the specific type of prosodic cue;
  
  providing, by the speech translation system, the produced back-channel cue to the speaker, the back-channel cue comprising an audible confirmation that initiation of the translation of the speech in the first language has occurred; and
  
  determining a translation result in the second language.

2. The method of claim 1, wherein the produced back-channel cue further confirms that the translation of speech in the first language is currently working and uninterrupted.

3. The method of claim 1, wherein the recognized one or more prosodic cues comprises a pause in the speech by the speaker, the produced back-channel cue confirming that the translation of speech is in progress.

4. The method of claim 3, wherein the recognizing by the speech translation system of the one or more prosodic cues comprising a pause in the speech by the speaker adjusts sensitivity for detection of a break point beginning the pause dependent on a speech setting for the speech by the speaker.

5. The method of claim 4, wherein the speech setting is adjustable based on input provided by the speaker.

6. The method of claim 1, wherein the one or more prosodic cues are selected from the group consisting of pauses, pitch contours, or intensity changes.

7. The method of claim 1, wherein the prosodic cues are selected from the group consisting of pauses and pitch contours.

8. A speech translation system, comprising:
- a processor;
  
  a speech recognition module that identifies sound comprising speech spoken in a first language by a speaker;
  
  a prosodic module that recognizes prosodic cues in the speech in the first language, one or more of the prosodic cues being of a specific type of prosodic cue;
  
  a speech synthesis module that produces, responsive to recognizing the prosodic cues, a back-channel cue corresponding to the specific type of prosodic cue and provides the produced back-channel cue to the speaker, the back-channel cue comprising an audible confirmation that initiation of the translation of the speech in the first language has occurred; and
  
  a translation module that translates and outputs, in a second language, the speech spoken in the first language by the speaker.

9. A computer program product comprising a non-transitory computer readable storage medium having instructions encoded thereon that, when executed by a processor, cause the processor to:
- recognize speech by a speaker;
  
  identify the speech by the speaker as being in the first language;
  
  initiate a translation of the speech in the first language, by a speech translation system, into the second language;
  
  recognize, by the speech translation system, one or more prosodic cues in the speech in the first language, one or more of the prosodic cues being of a specific type of prosodic cue;
  
  responsive to recognizing the prosodic cues, produce a back-channel cue corresponding to the specific type of prosodic cue;
  
  provide, by the speech translation system, the produced back-channel cue to the speaker, the back-channel cue comprising an audible confirmation that initiation of the translation of the speech in the first language has occurred; and
  
  determine a translation result in the second language.

10. The computer program product of claim 9, wherein the produced back-channel cue further confirms that the translation of speech in the first language is currently working and uninterrupted.

11. The computer program product of claim 9, wherein the recognized one or more prosodic cues comprises a pause in the speech by the speaker, the produced back-channel cue confirming that the translation of speech is in progress.

12. The computer program product of claim 11, wherein the recognizing by the speech translation system of the one or more prosodic cues comprising a pause in the speech by the speaker adjusts sensitivity for detection of a break point beginning the pause dependent on a speech setting for the speech by the speaker.

13. The computer program product of claim 9, wherein the one or more prosodic cues are selected from the group consisting of pauses, pitch contours, or intensity changes.

14. A method comprising:
- recognizing speech by a speaker;
  
  identifying the speech by the speaker as being in a first language;
  
  initiating a translation of the speech in the first language, by a speech translation system, into a second language;
  
  recognizing, by the speech translation system, one or more prosodic cues in the speech in the first language, one or more of the prosodic cues being of a specific type of prosodic cue;
  
  responsive to recognizing the prosodic cues, producing a back-channel cue corresponding to the specific type of prosodic cues;
  
  providing, responsive to recognizing a back-channel cue to the speaker, the back-channel cue comprising an audible confirmation that the speech translation system is ready to receive additional speech for translation;
  
  determining a translation result in the second language.

15. The method of claim 14, wherein the provided back-channel cue further confirms that the translation of speech in the first language is currently working and uninterrupted.

16. The method of claim 14, wherein the recognized one or more prosodic cues comprises a pause in the speech by the speaker, the provided back-channel cue confirming that the translation of speech is in progress.

17. The method of claim 16, wherein the recognizing by the speech translation system of the one or more prosodic cues comprising a pause in the speech by the speaker adjusts sensitivity for detection of a break point beginning the pause dependent on a speech setting for the speech by the speaker.

18. The method of claim 17, wherein the speech setting is adjustable based on input provided by the speaker.

19. The method of claim 14, wherein the one or more prosodic cues are selected from the group consisting of pauses, pitch contours, or intensity changes.

20. The method of claim 17, wherein the prosodic cues are selected from the group consisting of pauses and pitch contours.

21. A computer program product comprising a non-transitory computer readable storage medium having instructions encoded thereon that, when executed by a processor, cause the processor to:
- recognizing speech by a speaker;
  
  identifying the speech by the speaker as being in a first language;
  
  initiating a translation of the speech in the first language, by a speech translation system, into a second language;
  
  recognizing, by the speech translation system, one or more prosodic cues in the speech in the first language, one or more of the prosodic cues being of a specific type of prosodic cue;
  
  responsive to recognizing the prosodic cues, producing a back-channel cue corresponding to the specific type of prosodic cues;
  
  providing, responsive to recognizing a back-channel cue to the speaker, the back-channel cue comprising an audible confirmation that the speech translation system is ready to receive additional speech for translation;
  
  determining a translation result in the second language.

22. The computer program product of claim 21, wherein the provided back-channel cue further confirms that the translation of speech in the first language is currently working and uninterrupted.

23. The computer program product of claim 21, wherein the recognized one or more prosodic cues comprises a pause in the speech by the speaker, the provided back-channel cue confirming that the translation of speech is in progress.

24. The computer program product of claim 21, wherein the recognizing by the speech translation system of the one or more prosodic cues comprising a pause in the speech by the speaker adjusts sensitivity for detection of a break point beginning the pause dependent on a speech setting for the speech by the speaker.

25. The computer program product of claim 21, wherein the one or more prosodic cues are selected from the group consisting of pauses, pitch contours, or intensity changes.
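
The pause-related steps recited in the claims above (recognizing a pause as a prosodic cue, adjusting break-point sensitivity according to a speech setting, and producing a back-channel cue that corresponds to the cue type) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the setting names and millisecond values in `MIN_PAUSE_MS`, the energy-threshold silence test, the cue phrases in `BACK_CHANNEL`, and the function names. Only pause detection is implemented; pitch contours and intensity changes appear as cue types but are not detected.

```python
from enum import Enum


class ProsodicCue(Enum):
    """Cue types named in the claims; only PAUSE is detected in this sketch."""
    PAUSE = "pause"
    PITCH_CONTOUR = "pitch_contour"
    INTENSITY_CHANGE = "intensity_change"


# Hypothetical mapping from cue type to an audible back-channel cue.
BACK_CHANNEL = {
    ProsodicCue.PAUSE: "mm-hm",
    ProsodicCue.PITCH_CONTOUR: "uh-huh",
    ProsodicCue.INTENSITY_CHANGE: "I see",
}

# Hypothetical speech settings: minimum silence (in ms) that counts as a pause.
# A speaker could select one of these to adjust break-point sensitivity.
MIN_PAUSE_MS = {"fast": 200, "normal": 400, "slow": 700}


def detect_pauses(energies, frame_ms=10, setting="normal", silence_threshold=0.01):
    """Return the frame index at which each detected pause begins.

    `energies` is a sequence of per-frame energy values. A pause is a run of
    frames below `silence_threshold` lasting at least MIN_PAUSE_MS[setting].
    """
    min_frames = MIN_PAUSE_MS[setting] // frame_ms
    break_points = []
    run_start = None
    for i, energy in enumerate(energies):
        if energy < silence_threshold:
            if run_start is None:
                run_start = i
            # Report each pause once, as soon as the silent run is long enough.
            if i - run_start + 1 == min_frames:
                break_points.append(run_start)
        else:
            run_start = None
    return break_points


def back_channel_cues(energies, setting="normal"):
    """Pair each pause break point with the back-channel cue for pauses."""
    cue = BACK_CHANNEL[ProsodicCue.PAUSE]
    return [(start, cue) for start in detect_pauses(energies, setting=setting)]
```

With 10 ms frames, a 300 ms silence is long enough to trigger a cue under the hypothetical "fast" setting (200 ms minimum) but not under "normal" (400 ms minimum), which is one simple way sensitivity could depend on the speech setting.
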

Current Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors: Waibel, Alexander; Lane, Ian R.
Primary Examiner(s): ROBERTS, SHAUN A
Application Number: US12/689,042
Publication Number: US 20100217582A1
Time in Patent Office: 1,989 Days
Field of Search: 704/1-10, 704/277
US Class Current: 1/1

CPC Class Codes

G06F 40/42   Data-driven translation
G06F 40/44   Statistical methods, e.g. p...
G10L 13/00   Speech synthesis; Text to s...
G10L 15/06   Creation of reference templ...
G10L 15/063   Training
G10L 15/187   Phonemic context, e.g. pron...
G10L 15/22   Procedures used during a sp...
G10L 15/26   Speech to text systems G10L...
G10L 2015/0631   Creating reference template...