System and method for translating real-time speech using segmentation based on conjunction locations

US 9,734,820 B2
Filed: 11/14/2013
Issued: 08/15/2017
Est. Priority Date: 11/14/2013
Status: Active Grant

Abstract

A system, method and computer-readable storage device which balance latency and accuracy of machine translations by segmenting the speech upon locating a conjunction. The system, upon receiving speech, will buffer speech until a conjunction is detected. Upon detecting a conjunction, the speech received until that point is segmented. The system then continues performing speech recognition on the segment, searching for the next conjunction, while simultaneously initiating translation of the segment. Upon translating the segment, the system converts the translation to a speech output, allowing a user to hear an audible translation of the speech originally heard.
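The buffering-and-segmenting behavior described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: it assumes the recognizer already yields a stream of words, the function names `segment_at_conjunctions` and `translate_stream` are hypothetical, and the choice to let the conjunction open the next segment is an assumption, since the text does not say on which side of the boundary the conjunction belongs.

```python
from typing import Callable, Iterable, Iterator, List

# Assumed conjunction set; the dependent claims name "and" and "or".
CONJUNCTIONS = frozenset({"and", "or"})


def segment_at_conjunctions(
    words: Iterable[str], conjunctions: frozenset = CONJUNCTIONS
) -> Iterator[List[str]]:
    """Buffer recognized words and emit a segment whenever a conjunction appears."""
    buffer: List[str] = []
    for word in words:
        if word.lower() in conjunctions and buffer:
            # Everything received up to the conjunction becomes one segment.
            yield buffer
            # Assumption: the conjunction starts the following segment.
            buffer = [word]
        else:
            buffer.append(word)
    if buffer:
        # Flush whatever remains when the speech ends.
        yield buffer


def translate_stream(
    words: Iterable[str], translate: Callable[[str], str]
) -> Iterator[str]:
    """Translate each segment as soon as it is available (translate is a stand-in)."""
    for segment in segment_at_conjunctions(words):
        yield translate(" ".join(segment))
```

Because both functions are generators, a caller can start consuming translated segments before the full utterance has arrived, which is the latency benefit the abstract describes. A real system would run recognition and translation concurrently rather than in a single loop.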

17 Claims

1. A method comprising:
- receiving speech in a first language, the speech having no accompanying speech transcription, wherein the speech has a first portion, a second portion, and a conjunction which separates the first portion from the second portion;
  
  as the speech is being received, performing, via a processor, a speech recognition process on the first portion of the speech until the conjunction is recognized by the speech recognition process; and
  
  upon identifying the conjunction;
  
  (1) segmenting the speech by generating a speech segment, the speech segment comprising the first portion of the speech to the conjunction;
  
  (2) performing a translation of the speech segment from the first language to a second language, to yield a translated speech segment; and
  
  (3) receiving the second portion of the speech;
  
  generating translated speech using the translated speech segment, wherein the translated speech is generated with an accuracy;
  
  when the accuracy is below a threshold, increasing segment lengths by reducing conjunctions searched for, increasing a maximum number of words per segment, and identifying a second conjunction which defines a new first portion of the speech and a new second portion of the speech;
  
  upon identifying the second conjunction;
  
  (1) segmenting the speech by generating a new speech segment, the new speech segment comprising the new first portion of the speech to the second conjunction;
  
  (2) performing a translation of the new speech segment from the first language to the second language, to yield a new translated speech segment; and
  
  (3) receiving the new second portion of the speech; and
  
  outputting the new translated speech segment.
- 2. The method of claim 1, wherein the conjunction comprises one of “and” and “or”.
- 3. The method of claim 2, wherein the speech recognition process comprises identifying punctuation using a punctuation classifier.
- 4. The method of claim 1, further comprising repeating the segmenting, the performing, the generating, and the outputting with respect to the second portion of the speech.
- 5. The method of claim 1, wherein the processor searches for fewer conjunctions when an accuracy is above a threshold.
- 6. The method of claim 1, further comprising outputting additional translated speech immediately after the outputting of the translated speech such that an output break between the translated speech and the additional translated speech matches a pause in the speech corresponding to an input break between the first portion and the second portion of the speech.
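
The accuracy feedback step in claim 1 (when accuracy falls below a threshold, search for fewer conjunctions and raise the maximum number of words per segment) might be sketched as below. Everything here is an assumption made for illustration: the `SegmenterConfig` type, the threshold of 0.8, the step of 5 words, the extra conjunction "but", and the idea that the conjunction list is ordered from most to least preferred are all hypothetical and do not come from the claims.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SegmenterConfig:
    # Hypothetical defaults; ordered from most to least preferred split points.
    conjunctions: List[str] = field(default_factory=lambda: ["and", "or", "but"])
    max_words: int = 10


def adjust_for_accuracy(
    cfg: SegmenterConfig,
    accuracy: float,
    threshold: float = 0.8,  # assumed value
    word_step: int = 5,      # assumed value
) -> SegmenterConfig:
    """Return a config that produces longer segments when accuracy is too low."""
    if accuracy >= threshold:
        return cfg  # accuracy is acceptable, keep the current settings
    # Search for fewer conjunctions: drop the least preferred one, keep at least one.
    kept = cfg.conjunctions[:-1] if len(cfg.conjunctions) > 1 else cfg.conjunctions
    # Allow more words per segment so splits happen less often.
    return SegmenterConfig(conjunctions=list(kept), max_words=cfg.max_words + word_step)
```

Longer segments give the translator more context at the cost of added latency, so a policy like this trades responsiveness for accuracy only when the measured accuracy calls for it.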

7. A system comprising:
- a processor; and
  
  a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising;
  
  receiving speech in a first language, the speech having no accompanying speech transcription, wherein the speech has a first portion, a second portion, and a conjunction which separates the first portion from the second portion;
  
  as the speech is being received, performing, a speech recognition process on the first portion of the speech until the conjunction is recognized by the speech recognition process; and
  
  upon identifying the conjunction;
  
  (1) segmenting the speech by generating a speech segment, the speech segment comprising the first portion of the speech to the conjunction;
  
  (2) performing a translation of the speech segment from the first language to a second language, to yield a translated speech segment; and
  
  (3) receiving the second portion of the speech;
  
  generating translated speech using the translated speech segment, wherein the translated speech is generated with an accuracy;
  
  when the accuracy is below a threshold, increasing segment lengths by reducing conjunctions searched for, increasing a maximum number of words per segment, and identifying a second conjunction which defines a new first portion of the speech and a new second portion of the speech;
  
  upon identifying the second conjunction;
  
  (1) segmenting the speech by generating a new speech segment, the new speech segment comprising the new first portion of the speech to the second conjunction;
  
  (2) performing a translation of the new speech segment from the first language to the second language, to yield a new translated speech segment; and
  
  (3) receiving the new second portion of the speech; and
  
  outputting the new translated speech segment.
- 8. The system of claim 7, wherein the conjunction comprises one of “and” and “or”.
- 9. The system of claim 8, wherein the speech recognition process comprises identifying punctuation using a punctuation classifier.
- 10. The system of claim 7, further comprising repeating the segmenting, the performing, the generating, and the outputting with respect to the second portion of the speech.
- 11. The system of claim 7, wherein the processor searches for fewer conjunctions when an accuracy is above a threshold.
- 12. The system of claim 7, the computer-readable storage medium having instructions stored which result in the operations further comprising outputting additional translated speech immediately after the outputting of the translated speech such that an output break between the translated speech and the additional translated speech matches a pause in the speech corresponding to an input break between the first portion and the second portion of the speech.
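
The pause matching in claims 6 and 12 (the break between consecutive translated outputs matches the pause between the corresponding input portions) could be approximated as follows. This is a sketch under stated assumptions: it presumes the recognizer supplies start and end timestamps per portion and that the duration of each synthesized output is known in advance. `TimedSegment`, `input_pauses`, and `schedule_output` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TimedSegment:
    text: str
    start: float  # seconds from the beginning of the input speech
    end: float


def input_pauses(segments: Sequence[TimedSegment]) -> List[float]:
    """Pause between each input portion and the next, never negative."""
    return [max(0.0, nxt.start - cur.end) for cur, nxt in zip(segments, segments[1:])]


def schedule_output(
    output_durations: Sequence[float], pauses: Sequence[float], t0: float = 0.0
) -> List[Tuple[float, float]]:
    """Place outputs back to back, separated by the matching input pause."""
    schedule: List[Tuple[float, float]] = []
    t = t0
    for i, duration in enumerate(output_durations):
        schedule.append((t, t + duration))
        t += duration
        if i < len(pauses):
            t += pauses[i]  # reproduce the speaker's break in the output
    return schedule
```

For example, if the speaker paused for 0.5 seconds between two portions, the second translated output is scheduled 0.5 seconds after the first one ends, regardless of how long each translated output lasts.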

13. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- receiving speech in a first language, the speech having no accompanying speech transcription, wherein the speech has a first portion, a second portion, and a conjunction which separates the first portion from the second portion;
  
  as the speech is being received, performing, a speech recognition process on the first portion of the speech until the conjunction is recognized by the speech recognition process; and
  
  upon identifying the conjunction;
  
  (1) segmenting the speech by generating a speech segment, the speech segment comprising the first portion of the speech to the conjunction;
  
  (2) performing a translation of the speech segment from the first language to a second language, to yield a translated speech segment; and
  
  (3) receiving the second portion of the speech;
  
  generating translated speech using the translated speech segment, wherein the translated speech is generated with an accuracy;
  
  when the accuracy is below a threshold, increasing segment lengths by reducing conjunctions searched for, increasing a maximum number of words per segment, and identifying a second conjunction which defines a new first portion of the speech and a new second portion of the speech;
  
  upon identifying the second conjunction;
  
  (1) segmenting the speech by generating a new speech segment, the new speech segment comprising the new first portion of the speech to the second conjunction;
  
  (2) performing a translation of the new speech segment from the first language to the second language, to yield a new translated speech segment; and
  
  (3) receiving the new second portion of the speech; and
  
  outputting the new translated speech segment.
- 14. The computer-readable storage device of claim 13, wherein the conjunction comprises one of “and” and “or”.
- 15. The computer-readable storage device of claim 14, wherein the speech recognition process comprises identifying punctuation using a punctuation classifier.
- 16. The computer-readable storage device of claim 13, further comprising repeating the segmenting, the performing, the generating, and the outputting with respect to the second portion of the speech.
- 17. The computer-readable storage device of claim 13, wherein the computing device searches for fewer conjunctions when an accuracy is above a threshold.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Rangarajan Sridhar, Vivek Kumar; Chen, John; Bangalore, Srinivas
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Chavez, Rodrigo A
Application Number: US14/080,361
Publication Number: US 20150134320A1
Time in Patent Office: 1,370 Days
Field of Search: 704/2, 704/4, 704/5, 704/6, 704/9, 704/231, 704/258

CPC Class Codes

G06F 40/289   Phrasal analysis, e.g. fini...
G06F 40/58   Use of machine translation,...
G10L 13/00   Speech synthesis; Text to s...
G10L 15/005   Language recognition
G10L 15/04   Segmentation; Word boundary...
G10L 15/26   Speech to text systems G10L...
