System and method for translating real-time speech using segmentation based on conjunction locations
Abstract
A system, method, and computer-readable storage device balance the latency and accuracy of machine translation by segmenting incoming speech wherever a conjunction is located. Upon receiving speech, the system buffers it until a conjunction is detected, then segments the speech received up to that point. The system then continues performing speech recognition on the speech that follows the segment, searching for the next conjunction, while simultaneously initiating translation of the segment. Once the segment is translated, the system converts the translation into speech output, allowing a user to hear an audible translation of the original speech.
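The buffer, segment, and translate loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes recognized words arrive one at a time as strings and that translation is any caller-supplied `str -> str` function. The conjunction set, the `max_words` default, and the choice to keep the conjunction at the end of its segment are all hypothetical, because the text does not specify them. A thread pool stands in for "simultaneously initiating translation" while recognition continues.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, FrozenSet, Iterable, Iterator, List

# Hypothetical default conjunction set; the patent does not enumerate one.
DEFAULT_CONJUNCTIONS: FrozenSet[str] = frozenset({"and", "but", "or", "so", "because"})


def segment_on_conjunctions(
    words: Iterable[str],
    conjunctions: FrozenSet[str] = DEFAULT_CONJUNCTIONS,
    max_words: int = 20,
) -> Iterator[List[str]]:
    """Lazily yield word segments, closing each one at a conjunction.

    A segment is also closed once it reaches max_words, so a long run of
    speech without conjunctions cannot stall translation indefinitely.
    """
    buffer: List[str] = []
    for word in words:  # words arrive one at a time from the recognizer
        buffer.append(word)
        if word.lower() in conjunctions or len(buffer) >= max_words:
            yield buffer  # assumption: the conjunction ends its own segment
            buffer = []
    if buffer:
        yield buffer  # flush whatever speech remains when input ends


def translate_stream(
    words: Iterable[str],
    translate: Callable[[str], str],
    conjunctions: FrozenSet[str] = DEFAULT_CONJUNCTIONS,
    max_words: int = 20,
) -> List[str]:
    """Submit each segment for translation as soon as it is cut.

    Because segment_on_conjunctions is a generator, later words are still
    being consumed while earlier segments are translated on worker threads.
    Results are returned in segment order.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures: List[Future] = [
            pool.submit(translate, " ".join(segment))
            for segment in segment_on_conjunctions(words, conjunctions, max_words)
        ]
        return [future.result() for future in futures]


if __name__ == "__main__":
    words = "i went home and she stayed but we called".split()
    # str.upper is a stand-in for a real machine-translation call.
    print(translate_stream(words, str.upper))
    # ['I WENT HOME AND', 'SHE STAYED BUT', 'WE CALLED']
```

Collecting futures in submission order keeps the output aligned with the order of the spoken segments, even if a later segment happens to finish translating first.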
17 Claims
1. A method comprising:
receiving speech in a first language, the speech having no accompanying speech transcription, wherein the speech has a first portion, a second portion, and a conjunction which separates the first portion from the second portion;
as the speech is being received, performing, via a processor, a speech recognition process on the first portion of the speech until the conjunction is recognized by the speech recognition process; and
upon identifying the conjunction:
(1) segmenting the speech by generating a speech segment, the speech segment comprising the first portion of the speech to the conjunction;
(2) performing a translation of the speech segment from the first language to a second language, to yield a translated speech segment; and
(3) receiving the second portion of the speech;
generating translated speech using the translated speech segment, wherein the translated speech is generated with an accuracy;
when the accuracy is below a threshold, increasing segment lengths by reducing conjunctions searched for, increasing a maximum number of words per segment, and identifying a second conjunction which defines a new first portion of the speech and a new second portion of the speech;
upon identifying the second conjunction:
(1) segmenting the speech by generating a new speech segment, the new speech segment comprising the new first portion of the speech to the second conjunction;
(2) performing a translation of the new speech segment from the first language to the second language, to yield a new translated speech segment; and
(3) receiving the new second portion of the speech; and
outputting the new translated speech segment.
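The accuracy-driven adjustment in claim 1 (reducing the conjunctions searched for and increasing the maximum number of words per segment) can be sketched as a pure function over a segmentation configuration. This is a hedged illustration under several assumptions: accuracy is taken to be a score in [0, 1] supplied by some external quality estimate, which the claim does not define; the threshold of 0.8, the step of 5 words, and the rule of dropping the last conjunction in a priority-ordered tuple are all hypothetical choices. The `SegmentationConfig`, `adapt`, and `segment` names are likewise illustrative.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SegmentationConfig:
    # Ordered from most to least preferred split point (hypothetical ordering;
    # the claim does not say which conjunctions are dropped first).
    conjunctions: Tuple[str, ...] = ("and", "but", "or", "so", "because")
    max_words: int = 10


def adapt(
    config: SegmentationConfig,
    accuracy: float,
    threshold: float = 0.8,  # hypothetical threshold value
    word_step: int = 5,      # hypothetical increment for max_words
    min_conjunctions: int = 1,
) -> SegmentationConfig:
    """Lengthen segments when measured accuracy falls below the threshold."""
    if accuracy >= threshold:
        return config  # accuracy acceptable: keep the low-latency settings
    keep = max(min_conjunctions, len(config.conjunctions) - 1)
    return replace(
        config,
        conjunctions=config.conjunctions[:keep],  # search for fewer conjunctions
        max_words=config.max_words + word_step,   # allow longer segments
    )


def segment(words: Iterable[str], config: SegmentationConfig) -> List[List[str]]:
    """Split words at the configured conjunctions or at config.max_words."""
    split_points = set(config.conjunctions)
    segments: List[List[str]] = []
    buffer: List[str] = []
    for word in words:
        buffer.append(word)
        if word.lower() in split_points or len(buffer) >= config.max_words:
            segments.append(buffer)
            buffer = []
    if buffer:
        segments.append(buffer)
    return segments


if __name__ == "__main__":
    words = "we left early because it rained and the roads flooded".split()
    config = SegmentationConfig(conjunctions=("and", "because"))
    print(segment(words, config))
    # [['we', 'left', 'early', 'because'], ['it', 'rained', 'and'], ['the', 'roads', 'flooded']]
    config = adapt(config, accuracy=0.5)
    print(segment(words, config))
    # [['we', 'left', 'early', 'because', 'it', 'rained', 'and'], ['the', 'roads', 'flooded']]
```

After one low-accuracy adjustment, "because" is no longer a split point, so the next conjunction found ("and") closes a longer segment. This trades some latency for more context per translation, which matches the trade-off described in the abstract.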
7. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
receiving speech in a first language, the speech having no accompanying speech transcription, wherein the speech has a first portion, a second portion, and a conjunction which separates the first portion from the second portion;
as the speech is being received, performing a speech recognition process on the first portion of the speech until the conjunction is recognized by the speech recognition process; and
upon identifying the conjunction:
(1) segmenting the speech by generating a speech segment, the speech segment comprising the first portion of the speech to the conjunction;
(2) performing a translation of the speech segment from the first language to a second language, to yield a translated speech segment; and
(3) receiving the second portion of the speech;
generating translated speech using the translated speech segment, wherein the translated speech is generated with an accuracy;
when the accuracy is below a threshold, increasing segment lengths by reducing conjunctions searched for, increasing a maximum number of words per segment, and identifying a second conjunction which defines a new first portion of the speech and a new second portion of the speech;
upon identifying the second conjunction:
(1) segmenting the speech by generating a new speech segment, the new speech segment comprising the new first portion of the speech to the second conjunction;
(2) performing a translation of the new speech segment from the first language to the second language, to yield a new translated speech segment; and
(3) receiving the new second portion of the speech; and
outputting the new translated speech segment.
13. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
receiving speech in a first language, the speech having no accompanying speech transcription, wherein the speech has a first portion, a second portion, and a conjunction which separates the first portion from the second portion;
as the speech is being received, performing a speech recognition process on the first portion of the speech until the conjunction is recognized by the speech recognition process; and
upon identifying the conjunction:
(1) segmenting the speech by generating a speech segment, the speech segment comprising the first portion of the speech to the conjunction;
(2) performing a translation of the speech segment from the first language to a second language, to yield a translated speech segment; and
(3) receiving the second portion of the speech;
generating translated speech using the translated speech segment, wherein the translated speech is generated with an accuracy;
when the accuracy is below a threshold, increasing segment lengths by reducing conjunctions searched for, increasing a maximum number of words per segment, and identifying a second conjunction which defines a new first portion of the speech and a new second portion of the speech;
upon identifying the second conjunction:
(1) segmenting the speech by generating a new speech segment, the new speech segment comprising the new first portion of the speech to the second conjunction;
(2) performing a translation of the new speech segment from the first language to the second language, to yield a new translated speech segment; and
(3) receiving the new second portion of the speech; and
outputting the new translated speech segment.
Specification