HYBRID ADAPTATION OF NAMED ENTITY RECOGNITION
First Claim
1. A machine translation method comprising:
receiving a source text string in a source language;
identifying named entities in the source text string;
optionally, processing the identified named entities to exclude at least one of common nouns and function words from the named entities;
extracting features from the optionally processed source text string relating to the identified named entities;
with a processor, for at least one of the named entities, based on the extracted features, selecting a protocol for translating the source text string, the protocol being selected from a plurality of translation protocols, a first of the translation protocols including:
forming a reduced source string from the source text string in which the named entity is replaced by a placeholder;
translating the reduced source string by machine translation to generate a translated reduced target string,
processing the named entity separately, and
incorporating the processed named entity into the translated reduced target string to produce a target text string in the target language;
a second of the translation protocols including:
translating the source text string by machine translation, without replacing the named entity with the placeholder, to produce a target text string in the target language; and
outputting the target text string produced by the selected protocol.
Abstract
A machine translation method includes receiving a source text string and identifying any named entities. The identified named entities may be processed to exclude common nouns and function words. Features are extracted from the source text string relating to the identified named entities. Based on the extracted features, a protocol is selected for translating the source text string. A first translation protocol includes forming a reduced source string from the source text string in which the named entity is replaced by a placeholder, translating the reduced source string by machine translation to generate a translated reduced target string, while processing the named entity separately to be incorporated into the translated reduced target string. A second translation protocol includes translating the source text string by machine translation, without replacing the named entity with the placeholder. The target text string produced by the selected protocol is output.
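The two protocols described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `__NE0__` placeholder format, the function `hybrid_translate`, the toy word-by-word translator `toy_mt`, and the entity dictionary. The selector `select_first_protocol` stands in for the feature-based prediction step.

```python
from typing import Callable, Dict, List

def hybrid_translate(
    source: str,
    entities: List[str],
    select_first_protocol: Callable[[str, str], bool],
    mt: Callable[[str], str],
    process_entity: Callable[[str], str],
) -> str:
    """Translate `source`, choosing a protocol for each named entity.

    First protocol: replace the entity with a placeholder, translate the
    reduced string, then re-insert the separately processed entity.
    Second protocol: leave the entity in place and translate it with the
    rest of the string.
    """
    reduced = source
    pending: Dict[str, str] = {}
    for i, entity in enumerate(entities):
        if entity in reduced and select_first_protocol(source, entity):
            token = f"__NE{i}__"  # hypothetical placeholder format
            reduced = reduced.replace(entity, token, 1)
            pending[token] = process_entity(entity)
    # If no entity was replaced, this is simply the second protocol.
    target = mt(reduced)
    for token, processed in pending.items():
        target = target.replace(token, processed, 1)
    return target

# Toy stand-ins (assumptions): a word-by-word "translator" that passes
# unknown tokens through, and a small entity dictionary.
TOY_LEXICON = {"welcome": "bienvenido", "to": "a", "new": "nuevo"}
ENTITY_DICTIONARY = {"New York": "Nueva York"}

def toy_mt(text: str) -> str:
    return " ".join(TOY_LEXICON.get(word.lower(), word) for word in text.split())

def toy_process_entity(entity: str) -> str:
    return ENTITY_DICTIONARY.get(entity, entity)

with_placeholder = hybrid_translate(
    "Welcome to New York", ["New York"], lambda s, e: True, toy_mt, toy_process_entity
)
without_placeholder = hybrid_translate(
    "Welcome to New York", ["New York"], lambda s, e: False, toy_mt, toy_process_entity
)
print(with_placeholder)
print(without_placeholder)
```

In this toy setup the second protocol mistranslates part of the entity ("New" is translated as a common adjective), while the first protocol shields it from the translator. In other sentences the opposite can hold, which is why the method selects a protocol per named entity instead of always using one.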
23 Claims
1. A machine translation method comprising:
receiving a source text string in a source language;
identifying named entities in the source text string;
optionally, processing the identified named entities to exclude at least one of common nouns and function words from the named entities;
extracting features from the optionally processed source text string relating to the identified named entities;
with a processor, for at least one of the named entities, based on the extracted features, selecting a protocol for translating the source text string, the protocol being selected from a plurality of translation protocols, a first of the translation protocols including:
forming a reduced source string from the source text string in which the named entity is replaced by a placeholder;
translating the reduced source string by machine translation to generate a translated reduced target string,
processing the named entity separately, and
incorporating the processed named entity into the translated reduced target string to produce a target text string in the target language;
a second of the translation protocols including:
translating the source text string by machine translation, without replacing the named entity with the placeholder, to produce a target text string in the target language; and
outputting the target text string produced by the selected protocol.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A machine translation system comprising:
a named entity recognition component for identifying named entities in an input source text string in a source language;
optionally, a rule applying component which applies rules for processing the identified named entities to exclude at least one of common nouns and function words from the named entities;
a feature extraction component for extracting features from the optionally processed source text string relating to the identified named entities;
a prediction component for selecting a translation protocol for translating the source string based on the extracted features, the translation protocol being selected from a set of translation protocols including a first translation protocol in which the named entity is replaced by a placeholder to form a reduced source string, the reduced source string being translated separately from the named entity, and a second translation protocol in which the source text string is translated without replacing the named entity with the placeholder, to produce a target text string in the target language; and
a machine translation component for performing the selected translation protocol; and
a processor for implementing at least one of the components.
Dependent claims: 15, 16
17. A method for forming a machine translation system comprising:
optionally, providing rules for processing named entities identified in a source text string to exclude at least one of common nouns and function words from the named entities;
with a processor, learning a prediction model for predicting a suitable translation protocol from a set of translation protocols for translating the optionally processed source text string, the learning comprising:
for each of a training set of optionally processed source text strings:
extracting features from the optionally processed source text strings relating to the identified named entities, and
for each of the translation protocols, computing a translation score for a target text string generated by the translation protocol; and
learning the prediction model based on the extracted features and translation scores;
providing a prediction component which applies the model to features extracted from the optionally processed source text string to select one of the translation protocols.
Dependent claims: 18, 19, 20, 21, 22, 23
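The learning step recited in claim 17 can be sketched as follows. This is only an illustration under stated assumptions: the claim does not name a learning algorithm, so a plain perceptron is swapped in here, and the names `build_training_set`, `train_perceptron`, and `predict_first_protocol` are hypothetical. Each training example is labeled by whichever of two protocols produced the higher translation score, and the model is then fit on the extracted features.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str, str]            # (source, named entity, reference translation)
Labeled = Tuple[Sequence[float], int]     # (features, +1 or -1)
Model = Tuple[List[float], float]         # (weights, bias)

def build_training_set(
    examples: List[Example],
    extract_features: Callable[[str, str], Sequence[float]],
    protocols: Sequence[Callable[[str, str], str]],
    score: Callable[[str, str], float],
) -> List[Labeled]:
    """Label each example +1 if the first protocol scores higher, else -1."""
    data: List[Labeled] = []
    for source, entity, reference in examples:
        features = extract_features(source, entity)
        # One translation score per protocol, computed against the reference.
        scores = [score(protocol(source, entity), reference) for protocol in protocols]
        data.append((features, 1 if scores[0] > scores[1] else -1))
    return data

def train_perceptron(data: List[Labeled], epochs: int = 20) -> Model:
    """Assumed learner: a basic perceptron with mistake-driven updates."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            activation = bias + sum(w * x for w, x in zip(weights, features))
            if label * activation <= 0:  # misclassified (or on the boundary)
                weights = [w + label * x for w, x in zip(weights, features)]
                bias += label
    return weights, bias

def predict_first_protocol(model: Model, features: Sequence[float]) -> bool:
    """Prediction component: True selects the first (placeholder) protocol."""
    weights, bias = model
    return bias + sum(w * x for w, x in zip(weights, features)) > 0

# Toy data (assumptions): two fake protocols, exact-match scoring, and
# hand-made two-dimensional features.
toy_protocols = [lambda s, e: "A", lambda s, e: "B"]
toy_score = lambda target, reference: 1.0 if target == reference else 0.0
toy_features = lambda s, e: [1.0, 0.0] if s == "x" else [0.0, 1.0]
toy_examples = [("x", "e", "A"), ("y", "e", "B")]

training_data = build_training_set(toy_examples, toy_features, toy_protocols, toy_score)
model = train_perceptron(training_data)
print(predict_first_protocol(model, [1.0, 0.0]))
print(predict_first_protocol(model, [0.0, 1.0]))
```

A real system would use an automatic translation quality metric for `score` and a richer feature set; the sketch only shows how features and per-protocol scores combine into training data for the prediction component.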
Specification