Use of annotations in statistical machine translation
Abstract
A method, system, and computer readable medium for translating a document is provided. A statistical machine translation engine is trained using a translation memory comprising an annotation. A translation rule associated with the annotation is defined. A source document in a source language is received. The source document comprises an instance of the annotation and a string. The string is translated using the statistical machine translation engine. The instance of the annotation is processed according to the translation rule. A target document in a target language is generated based on the translated string and the processed annotation.
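The flow described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `<name>text</name>` annotation syntax, the `num` annotation and its separator-swapping rule, and the word-for-word `TOY_LEXICON` that stands in for a trained statistical engine. None of these come from the patent text.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical annotation syntax: <name>text</name>. The source does not specify one.
ANNOTATION = re.compile(r"<(\w+)>(.*?)</\1>")

# Toy stand-in for a trained statistical engine: a word-for-word lookup table.
TOY_LEXICON: Dict[str, str] = {"the": "el", "price": "precio", "is": "es"}


def translate_string(text: str) -> str:
    """Translate a plain string word by word; unknown words pass through."""
    return " ".join(TOY_LEXICON.get(word.lower(), word) for word in text.split())


def swap_separators(text: str) -> str:
    """Hypothetical rule for 'num' annotations: swap ',' and '.' in a number."""
    return text.translate(str.maketrans(",.", ".,"))


# Translation rules keyed by annotation name (illustrative only).
RULES: Dict[str, Callable[[str], str]] = {"num": swap_separators}


def identify(document: str) -> List[Tuple[Optional[str], str]]:
    """Split a document into (annotation_name, text) parts; name is None for strings."""
    parts: List[Tuple[Optional[str], str]] = []
    pos = 0
    for match in ANNOTATION.finditer(document):
        if match.start() > pos:
            parts.append((None, document[pos:match.start()]))
        parts.append((match.group(1), match.group(2)))
        pos = match.end()
    if pos < len(document):
        parts.append((None, document[pos:]))
    return parts


def translate_document(document: str) -> str:
    """Translate strings with the toy engine and annotated segments by rule."""
    out: List[str] = []
    for name, text in identify(document):
        if name is None:
            out.append(translate_string(text))
        else:
            # Fall back to plain translation when no rule exists for the annotation.
            out.append(RULES.get(name, translate_string)(text))
    return " ".join(piece for piece in out if piece)


print(translate_document("The price is <num>1,250.50</num>"))  # el precio es 1.250,50
```

The whole document goes through a single call, and the choice between rule processing and plain translation is made per identified part rather than by looking up whole segments in a translation memory.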
46 Claims
1. A method for translating a document, comprising:
defining translation rules associated with annotations occurring in a translation memory database, wherein the translation rules comprise an output context;
integrating the translation rules in the translation memory database into statistical machine translation used by a translation engine, the integration including training the translation engine to recognize annotations in the translation memory database while performing statistical machine translation;
training the statistical machine translation of the translation engine using both the translation memory database and parallel segments to modify statistical probability values;
storing the statistical probability values modified during the training, the stored values for use in the statistical machine translation by the translation engine;
receiving a source document in a source language, the source document comprising a string and an annotated segment of text, the annotated segment of text comprising text and an instance of an annotation;
translating the source document as a whole using the translation engine without breaking the source document into segments that are translated according to whether a match appears in the translation memory database, the translation engine configured for:
identifying annotations and associated annotated segments of the text in the source document to be processed using the translation rules, and strings in the source document to be translated using statistical machine translation,
translating the identified strings in the source document using the statistical machine translation, and
processing the identified instances of the annotations and the associated annotated segments of text according to the translation rules using the translation memory database, at least one of the translation rules being associated with the annotation; and
generating a target document in a target language based on the translated source document and the translated text associated with the annotation.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
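The training and storing steps of claim 1 can be pictured with a deliberately simplified sketch. It assumes that translation memory entries and parallel segments are both available as (source, target) sentence pairs, that words align one-to-one by position, and that the "statistical probability values" are relative-frequency word translation probabilities. These are illustrative assumptions only; the claim does not say how the values are computed, and the annotation handling is omitted here. The names `count_pairs`, `train`, `dumps_table`, and `loads_table` are hypothetical.

```python
import json
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

Pair = Tuple[str, str]
Table = Dict[str, Dict[str, float]]


def count_pairs(segments: Iterable[Pair], counts: DefaultDict[str, Counter]) -> None:
    """Accumulate word co-occurrence counts, assuming position-wise alignment."""
    for source, target in segments:
        s_words, t_words = source.split(), target.split()
        if len(s_words) != len(t_words):
            continue  # the toy alignment only handles equal-length pairs
        for s, t in zip(s_words, t_words):
            counts[s][t] += 1


def train(translation_memory: Iterable[Pair], parallel_segments: Iterable[Pair]) -> Table:
    """Pool counts from both sources and normalise them into p(target | source)."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    count_pairs(translation_memory, counts)
    count_pairs(parallel_segments, counts)
    return {
        s: {t: c / sum(t_counts.values()) for t, c in t_counts.items()}
        for s, t_counts in counts.items()
    }


def dumps_table(table: Table) -> str:
    """Serialise the probability table so it can be stored for later use."""
    return json.dumps(table, ensure_ascii=False, sort_keys=True)


def loads_table(text: str) -> Table:
    """Restore a stored probability table."""
    return json.loads(text)


tm = [("the house", "la casa")]
parallel = [("the car", "el coche"), ("the house", "la casa")]
table = train(tm, parallel)
stored = dumps_table(table)
print(loads_table(stored)["house"])  # {'casa': 1.0}
```

Because the counts are pooled before normalisation, an entry that appears in both the translation memory and the parallel segments shifts the stored probabilities toward its translation, which is one simple way both sources could influence the same values.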
19. A system for translating a document, comprising:
defined translation rules associated with annotations occurring in a translation memory database, wherein the translation rules comprise an output context;
a communications interface configured to receive a source document in a source language, the source document comprising a string and an annotated segment of text, the annotated segment of text comprising text and an instance of an annotation;
a statistical machine translation engine trained for statistical machine translation using both parallel segments, the annotation integrated into the parallel segments, and a translation memory comprising the annotation, to modify statistical probability values used in the statistical machine translation engine, the modified statistical probability values stored for use in the statistical machine translation by the statistical machine translation engine;
a translation memory database including translation rules integrated into the statistical machine translation used by the statistical machine translation engine, the integration including training the statistical translation engine to recognize the annotations in the translation memory while performing the statistical machine translation; and
a processor configured to translate the source document as a whole using the statistical machine translation engine without breaking the source document into segments that are translated according to whether a match appears in the translation memory database, the statistical machine translation engine configured to:
identify the annotations and the associated annotated segments of the text in the source document to be processed using the translation rules, and the strings in the source document to be translated using statistical machine translation,
translate the identified strings in the source document using statistical machine translation,
process the identified instances of the annotations and the associated annotated segments of text according to the translation rules in the translation memory database, at least one of the translation rules being associated with the annotation, and
generate a target document in a target language based on the translated string and the translated text associated with the annotation.
- View Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32)
33. A non-transitory computer readable storage medium having embodied thereon a computer program having instructions executable by a computer to perform a method for translating a document, the method comprising the steps:
defining translation rules associated with annotations occurring in a translation memory database, wherein the translation rules comprise an output context;
integrating the translation rules in the translation memory database into statistical machine translation used by a translation engine, the integration including training the translation engine to recognize annotations in the translation memory database while performing statistical machine translation;
training the statistical machine translation of the translation engine using both the translation memory and parallel segments to modify statistical probability values;
storing the statistical probability values modified during the training, the stored values for use in the statistical machine translation by the translation engine;
receiving a source document in a source language, the source document comprising a string and an annotated segment of text, the annotated segment of text comprising text and an instance of an annotation;
translating the source document as a whole using the translation engine without breaking the source document into segments that are translated according to whether a match appears in the translation memory database, the translation engine configured for:
identifying annotations and associated annotated segments of the text in the source document to be processed using the translation rules, and strings in the source document to be translated using the statistical machine translation,
translating the identified strings in the source document using the statistical machine translation, and
processing the identified instances of the annotations and the associated annotated segments of text according to the translation rules using the translation memory database, at least one of the translation rules being associated with the annotation; and
generating a target document in a target language based on the translated string and the translated text associated with the annotation.
- View Dependent Claims (34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46)
Specification