Domain adaptation for query translation
Abstract
A translation system and method suited to use in Cross Language Information Retrieval employ a retrieval-based scoring function for reranking candidate translations. The method includes translating an input source language query to generate a set of candidate translations in a target language. The candidate translations are scored with the scoring function, which allows them to be reranked and one or more optimal candidates to be selected for use in querying a domain-specific collection of documents in the target language. The scoring function applies weights to features extracted from the candidate translations. The weights have been learned on features extracted from translated queries, each generated by translation of an original query into the target language, and on a measure of information retrieval performance of each of the translated queries. At least one of the features used is a domain-specific feature which relies on a corpus of documents in the domain of interest.
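The reranking step described in the abstract can be illustrated with a minimal sketch, assuming a linear scoring function (a weighted sum of feature values). The names `DOMAIN_VOCAB`, `in_domain_ratio`, `token_count`, `score` and `rerank`, the toy vocabulary and the weight values are all hypothetical and chosen for illustration; in the described method the weights would be learned from retrieval performance rather than set by hand.

```python
from typing import Callable, Dict, List

# A feature maps a candidate translation to a number.
FeatureFn = Callable[[str], float]

# Hypothetical toy stand-in for the vocabulary of a domain-specific corpus.
DOMAIN_VOCAB = {"heart", "attack", "treatment"}


def in_domain_ratio(candidate: str) -> float:
    """Fraction of candidate tokens that occur in the domain vocabulary."""
    tokens = candidate.lower().split()
    if not tokens:
        return 0.0
    return sum(t in DOMAIN_VOCAB for t in tokens) / len(tokens)


def token_count(candidate: str) -> float:
    """A generic, non-domain feature: number of tokens in the candidate."""
    return float(len(candidate.split()))


def score(candidate: str,
          features: Dict[str, FeatureFn],
          weights: Dict[str, float]) -> float:
    """Weighted sum of feature values (assumed linear scoring function)."""
    return sum(weights[name] * fn(candidate) for name, fn in features.items())


def rerank(candidates: List[str],
           features: Dict[str, FeatureFn],
           weights: Dict[str, float]) -> List[str]:
    """Sort candidates from highest to lowest score."""
    return sorted(candidates,
                  key=lambda c: score(c, features, weights),
                  reverse=True)


FEATURES = {"in_domain_ratio": in_domain_ratio, "token_count": token_count}
# Illustrative weights only; not learned values.
WEIGHTS = {"in_domain_ratio": 1.0, "token_count": 0.0}

candidates = ["heart crisis treatment", "heart attack treatment"]
target_query = rerank(candidates, FEATURES, WEIGHTS)[0]
```

With these toy weights the candidate whose terms are all found in the domain vocabulary is ranked first and would be output as the target query.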
82 Citations
26 Claims
1. A translation method comprising:
- receiving an input query in a source language;
- translating the input query with a phrase-based statistical machine translation system to generate a set of candidate translations of the input query in a target language;
- extracting a set of features from each of the candidate translations in the set, the set of features including at least one domain specific feature which is based on a comparison of at least one term in the candidate translation with words in a domain-specific corpus of documents;
- scoring each of the candidate translations with a scoring function in which the extracted features are weighted with respective weights, the weights having been learned on:
  - features extracted from translated queries, each of the translated queries having been generated by translation of an original query from a set of original queries into the target language with a machine translation system, and
  - a measure of information retrieval performance of each of the translated queries, for each original query in the set of original queries, the information retrieval performance of each translated query being based on a relevance score, with respect to the respective original query, for documents in a set of documents that have been retrieved in response to the translated query; and
- outputting a target query in the target language based on the scores of the candidate translations.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A domain-specific translation method comprising:
- receiving an input query in a source language;
- with a machine translation system that is not adapted to a specific domain, translating the query to generate a set of candidate translations of the query in a target language;
- extracting a set of features from each of the candidate translations in the set, the set of features including at least one domain specific feature which is based on a comparison of at least one term in the candidate translation with words in an associated domain-specific corpus of documents in the target language;
- scoring each of the candidate translations with a scoring function in which the extracted features are weighted with respective weights, the weights having been learned on features extracted from translated queries, each generated by translation of an original query into the target language, and a measure of information retrieval performance of each of the translated queries for each original query in a set of original queries, the information retrieval performance being assessed on a domain-specific target document collection in which documents in the collection are annotated based on relevance to the original queries; and
- outputting a target query based on the scores of the candidate translations,
wherein the at least one domain-specific feature is selected from the group consisting of:
  a) a language model feature;
  b) an out of vocabulary word feature;
  c) a query performance predictor which is computed with an equation that correlates with the measure of information retrieval performance; and
  combinations thereof.
Dependent claims: 17, 18
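Two of the domain-specific features named in claim 16, the language model feature and the out of vocabulary word feature, can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the language model is assumed to be a unigram model with add-one smoothing, tokenization is assumed to be whitespace splitting, and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Set


def build_counts(corpus: Iterable[str]) -> Counter:
    """Count token occurrences over a domain-specific corpus of documents."""
    counts: Counter = Counter()
    for document in corpus:
        counts.update(document.lower().split())
    return counts


def oov_count(candidate: str, vocabulary: Set[str]) -> int:
    """Number of candidate tokens that never occur in the domain corpus."""
    return sum(1 for t in candidate.lower().split() if t not in vocabulary)


def unigram_log_prob(candidate: str, counts: Counter) -> float:
    """Log-probability of the candidate under a smoothed unigram model.

    Add-one smoothing with one extra slot reserved for unseen tokens is an
    assumption made for this sketch.
    """
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves mass for unseen tokens
    return sum(
        math.log((counts[t] + 1) / (total + vocab_size))
        for t in candidate.lower().split()
    )


# Hypothetical toy domain corpus.
domain_corpus = ["heart attack treatment", "heart surgery"]
counts = build_counts(domain_corpus)
vocabulary = set(counts)

in_domain = unigram_log_prob("heart attack", counts)
out_of_domain = unigram_log_prob("heart crisis", counts)
```

Here the candidate containing only corpus terms receives the higher log-probability and an out of vocabulary count of zero, so both features would favor it if their learned weights point in the expected direction.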
19. A translation method comprising:
- receiving an input query in a source language;
- with a machine translation system, translating the query to generate a set of candidate translations of the query in a target language;
- extracting a set of features from each of the candidate translations in the set, the set of features including at least one domain specific feature which is based on a comparison of at least one term in the candidate translation with words in a domain-specific corpus of documents;
- scoring each of the candidate translations with a scoring function in which the extracted features are weighted with respective weights, the weights having been learned on features extracted from translated queries, each generated by translation of an original query into the target language, and a measure of information retrieval performance of each of the translated queries, for each original query in a set of original queries; and
- outputting a target query based on the scores of the candidate translations,
wherein the at least one domain-specific feature comprises a query performance predictor which, for a candidate translation, is based on at least one of:
  a) Average Inverse Document frequency, computed according to the expression;
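The expression referred to in claim 19 is not reproduced in this text. Purely as an illustration, the sketch below computes a commonly used form of average inverse document frequency: the mean over query terms of the log of the collection size divided by the term's document frequency. The add-one smoothing, the function name `average_idf` and the toy document frequencies are assumptions, and the claimed expression may differ.

```python
import math
from typing import Dict, Sequence


def average_idf(query_terms: Sequence[str],
                doc_freq: Dict[str, int],
                num_docs: int) -> float:
    """Mean inverse document frequency of the terms of a candidate query.

    Assumed formulation: idf(t) = log((N + 1) / (df(t) + 1)), where N is the
    number of documents in the domain collection and df(t) is the number of
    documents containing t. The +1 terms avoid division by zero for unseen
    terms and are a choice made for this sketch.
    """
    if not query_terms:
        return 0.0
    total = sum(
        math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))
        for term in query_terms
    )
    return total / len(query_terms)


# Hypothetical document frequencies in a 9-document collection.
doc_freq = {"rare": 1, "common": 9}
value = average_idf(["rare", "common"], doc_freq, num_docs=9)
```

Under this formulation a candidate made of terms that are rare in the collection scores higher than one made of terms found in nearly every document, which is why the quantity is used as a pre-retrieval predictor of query performance.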
20. A computer program product comprising a non-transitory computer-readable recording medium which stores instructions for performing a translation method, comprising:
- receiving an input query in a source language;
- translating the input query with a phrase-based statistical machine translation system to generate a set of candidate translations of the input query in a target language;
- extracting a set of features from each of the candidate translations in the set, the set of features including at least one domain specific feature which is based on a comparison of at least one term in the candidate translation with words in a domain-specific corpus of documents;
- scoring each of the candidate translations with a scoring function in which the extracted features are weighted with respective weights, the weights having been learned on:
  - features extracted from translated queries, each of the translated queries having been generated by translation of an original query from a set of original queries into the target language with a machine translation system, and
  - a measure of information retrieval performance of each of the translated queries, for each original query in the set of original queries, the information retrieval performance of each translated query being based on a relevance score, with respect to the respective original query, for documents in a set of documents that have been retrieved in response to the translated query; and
- outputting a target query in the target language based on the scores of the candidate translations.
21. A translation system comprising non-transitory memory which stores instructions for translating an input source language query to generate a set of the candidate translations in a target language and a processor in communication with the memory for executing the instructions, comprising:
- receiving an input query in a source language;
- translating the input query with a phrase-based statistical machine translation system to generate a set of candidate translations of the input query in a target language;
- extracting a set of features from each of the candidate translations in the set, the set of features including at least one domain specific feature which is based on a comparison of at least one term in the candidate translation with words in a domain-specific corpus of documents;
- scoring each of the candidate translations with a scoring function in which the extracted features are weighted with respective weights, the weights having been learned on:
  - features extracted from translated queries, each of the translated queries having been generated by translation of an original query from a set of original queries into the target language with a machine translation system, and
  - a measure of information retrieval performance of each of the translated queries, for each original query in the set of original queries, the information retrieval performance of each translated query being based on a relevance score, with respect to the respective original query, for documents in a set of documents that have been retrieved in response to the translated query; and
- outputting a target query in the target language based on the scores of the candidate translations.
22. A query translation system comprising:
- a statistical machine translation system including a decoder which receives a source query in a source language and outputs a set of candidate translations in a target language using biphrases extracted from a biphrase library, each of the candidate translations being a translation of the same source query; and
- a reranking component which outputs a target query in the target language based on at least one of the candidate translations, the reranking component extracting features of each of the candidate translations and computing a function in which the extracted features are weighted by feature weights, the weights having been learned on features of each of a set of translated queries generated by translation of an original query into the target language and a measure of information retrieval performance of each of the translated queries from a collection of domain-specific documents in which documents in the collection are annotated based on relevance to original queries, for each original query in a set of original queries, at least one of the features comprising a domain-specific feature; and
- a processor which implements the reranking component.
Dependent claims: 23, 24, 25
26. A method for training a translation system for domain-adapted translation of queries, comprising:
- for each of a set of original queries in a source language:
  - translating the query to generate a set of translations in a target language;
  - for each translation in the set of translations, extracting values of features for each of a finite set of features, at least one of the features comprising a domain-specific feature which relies on a domain-specific corpus; and
  - obtaining a measure of retrieval performance for each translation based on annotations of documents retrieved from a domain-specific corpus with the translation, the document annotations being based on a relevance of each document to original queries in the set of original queries;
- learning feature weights for each of the features based on the extracted values of the features and the respective measure of retrieval performance of each translation; and
- storing the feature weights for use in translating a new query, different from each of the original queries, from the source language to the target language, whereby candidate translations of the new query are ranked based on their respective extracted values of features and the stored feature weights.
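The weight-learning step of claim 26 can be sketched as below. The claim does not name a learning algorithm, so this example swaps in a simple pairwise perceptron as a stand-in: whenever a translation with better retrieval performance is not scored above a worse one for the same original query, the weights are moved toward the better translation's features. The function names, feature vectors and performance values are hypothetical toy data.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# One training example: (feature values of a translation,
#                        its measured retrieval performance).
Example = Tuple[Sequence[float], float]


def dot(weights: Sequence[float], features: Sequence[float]) -> float:
    return sum(w * x for w, x in zip(weights, features))


def learn_weights(queries: List[List[Example]],
                  n_features: int,
                  epochs: int = 10,
                  learning_rate: float = 1.0) -> List[float]:
    """Pairwise perceptron over the translations of each original query.

    This is an assumed, illustrative learner, not the claimed one.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for translations in queries:
            for (xa, pa), (xb, pb) in combinations(translations, 2):
                if pa == pb:
                    continue  # no preference between equally good translations
                better, worse = (xa, xb) if pa > pb else (xb, xa)
                # Update only when the current weights misorder the pair.
                if dot(weights, better) <= dot(weights, worse):
                    weights = [
                        w + learning_rate * (b - c)
                        for w, b, c in zip(weights, better, worse)
                    ]
    return weights


# Toy data: two original queries, two translations each.
# Feature 0 tracks good retrieval performance; feature 1 does not.
training = [
    [([1.0, 0.0], 0.8), ([0.0, 1.0], 0.2)],
    [([0.9, 0.1], 0.7), ([0.2, 0.8], 0.1)],
]
stored_weights = learn_weights(training, n_features=2)
```

The stored weights can then be reused to rank candidate translations of a new query by the weighted sum of their feature values, as recited in the final step of the claim.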
Specification