Machine translation for query expansion
Abstract
Methods, systems and apparatus, including computer program products, for expanding search queries. One method includes receiving a search query, selecting a synonym of a term in the search query based on a context of occurrence of the term in the received search query, the synonym having been derived from statistical machine translation of the term, and expanding the received search query with the synonym and using the expanded search query to search a collection of documents. Alternatively, another method includes receiving a request to search a corpus of documents, the request specifying a search query, using statistical machine translation to translate the specified search query into an expanded search query, the specified search query and the expanded search query being in the same natural language, and in response to the request, using the expanded search query to search a collection of documents.
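The first method in the abstract (context-dependent synonym selection followed by query expansion) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration only: the `SYNONYMS` table is hand-written rather than derived from statistical machine translation, "context" is reduced to the immediately adjacent query terms, and the `OR` syntax for the expanded query is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym table keyed by (term, neighbouring context term).
# In the abstract these synonyms come from statistical machine translation;
# here they are hard-coded purely for illustration.
SYNONYMS: Dict[Tuple[str, str], List[str]] = {
    ("car", "rental"): ["auto", "vehicle"],
    ("car", "cable"): ["gondola"],
}


def expand_query(query: str, table: Dict[Tuple[str, str], List[str]]) -> str:
    """Expand each term with the synonyms licensed by its adjacent terms."""
    terms = query.lower().split()
    parts = []
    for i, term in enumerate(terms):
        # Assumed notion of context: the term just before and just after.
        context = terms[max(0, i - 1):i] + terms[i + 1:i + 2]
        alternatives = [term]
        for ctx in context:
            for synonym in table.get((term, ctx), []):
                if synonym not in alternatives:
                    alternatives.append(synonym)
        if len(alternatives) > 1:
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("car rental prices", SYNONYMS))
print(expand_query("cable car tickets", SYNONYMS))
```

The same term ("car") receives different synonyms in the two queries because the lookup is keyed on the neighbouring term, which is the point of selecting a synonym "based on a context of occurrence".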
17 Claims
1. A computer-implemented method comprising:
identifying a plurality of documents having one or more questions and, for each question, a corresponding answer;
generating a plurality of question-answer pairs from the questions and respective corresponding answers occurring in the plurality of documents;
training a statistical machine translation model using the plurality of question-answer pairs, including using each question of each question-answer pair as a source language input and a corresponding answer of the question-answer pair as a target language input, wherein each question and each corresponding answer are in the same natural language;
translating, using the statistical machine translation model trained on the plurality of question-answer pairs, a phrase into one or more corresponding translated phrases; and
determining one or more synonym pairs by comparing the phrase with the one or more corresponding translated phrases.
Dependent claims: 2, 3, 4, 5, 6, 7
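The steps of claim 1 can be sketched end to end on toy data. The claim does not name a particular translation model, so this sketch swaps in IBM Model 1 (a classic word-level statistical translation model, here without a NULL word) as a stand-in. The question-answer pairs are hypothetical and assumed to be already tokenised and stop-word filtered, "phrases" are reduced to single words, and the function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[Sequence[str], Sequence[str]]

# Hypothetical question-answer pairs in the same natural language.
QA_PAIRS: List[Pair] = [
    (["cheap", "car"], ["inexpensive", "auto"]),
    (["cheap", "hotel"], ["inexpensive", "hotel"]),
    (["car", "repair"], ["auto", "repair"]),
]


def train_ibm1(pairs: List[Pair], iterations: int = 10) -> Dict[str, Dict[str, float]]:
    """Estimate t(answer_word | question_word) with EM (IBM Model 1)."""
    target_vocab = {a for _, answer in pairs for a in answer}
    uniform = 1.0 / len(target_vocab)
    # prob[q][a] approximates t(a | q); start uniform for co-occurring words.
    prob: Dict[str, Dict[str, float]] = {}
    for question, answer in pairs:
        for q in question:
            row = prob.setdefault(q, {})
            for a in answer:
                row[a] = uniform
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        # E-step: distribute each answer word over the question words.
        for question, answer in pairs:
            for a in answer:
                norm = sum(prob[q][a] for q in question)
                for q in question:
                    frac = prob[q][a] / norm
                    count[q][a] += frac
                    total[q] += frac
        # M-step: renormalise the expected counts per question word.
        prob = {q: {a: c / total[q] for a, c in row.items()}
                for q, row in count.items()}
    return prob


def translate(model: Dict[str, Dict[str, float]], word: str, top_k: int = 3) -> List[str]:
    """Return the top_k most probable same-language 'translations' of word."""
    ranked = sorted(model.get(word, {}).items(), key=lambda kv: kv[1], reverse=True)
    return [a for a, _ in ranked[:top_k]]


def synonym_pairs(model: Dict[str, Dict[str, float]], phrase: str,
                  top_k: int = 3) -> List[Tuple[str, str]]:
    """Compare the phrase with its translations; keep those that differ."""
    return [(phrase, t) for t in translate(model, phrase, top_k) if t != phrase]


model = train_ibm1(QA_PAIRS)
print(synonym_pairs(model, "car", top_k=1))
```

Because source and target are in the same language, a word can translate to itself; the comparison step discards such identity translations and keeps only the differing ones as synonym candidates. A production system would more plausibly use a phrase-based model with a decoder rather than a word-level table.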
8. A computer-implemented method comprising:
identifying a plurality of queries in a query log, and for each query, one or more search results associated with the query, wherein each search result identifies a corresponding resource and comprises a search result snippet that includes text from the corresponding resource identified by the search result;
generating a plurality of query-snippet pairs, wherein each pair associates a respective query of the plurality of queries with a particular search result snippet from one of the search results associated with the query in the query log;
training a statistical machine translation model using the plurality of query-snippet pairs, including using each query of each query-snippet pair as a source language input, and a corresponding snippet of the query-snippet pair as a target language input, wherein each query and the search result snippet of each query-snippet pair are in the same natural language;
translating, using the statistical machine translation model trained on the plurality of query-snippet pairs and search result snippet pairs, a phrase into one or more corresponding translated phrases; and
determining one or more synonym pairs including comparing the phrase with the one or more corresponding translated phrases.
Dependent claims: 9, 10, 11, 12, 13
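The pair-generation step of claim 8 can be sketched as follows. The `LogEntry` and `SearchResult` types, the example URLs, and the regular-expression tokeniser are all assumptions made for illustration; the claim does not specify a log format or a tokenisation scheme.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SearchResult:
    url: str       # identifies the corresponding resource
    snippet: str   # text taken from that resource


@dataclass
class LogEntry:
    query: str
    results: List[SearchResult]


def tokenize(text: str) -> List[str]:
    """Assumed tokeniser: lower-case runs of word characters."""
    return re.findall(r"\w+", text.lower())


def query_snippet_pairs(log: List[LogEntry]) -> List[Tuple[List[str], List[str]]]:
    """Pair each logged query (source side) with each of its snippets (target side)."""
    pairs = []
    for entry in log:
        for result in entry.results:
            if result.snippet.strip():  # skip results without snippet text
                pairs.append((tokenize(entry.query), tokenize(result.snippet)))
    return pairs


# Hypothetical query log.
LOG = [
    LogEntry("cheap flights rome", [
        SearchResult("https://example.com/a", "Low-cost airfare to Rome."),
        SearchResult("https://example.com/b", ""),
    ]),
    LogEntry("fix flat tire", [
        SearchResult("https://example.com/c", "How to repair a punctured tyre"),
    ]),
]

for source, target in query_snippet_pairs(LOG):
    print(source, "->", target)
```

Each resulting pair has the query as the source-language side and the snippet as the target-language side, which is the shape the training step of the claim expects.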
14. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
identifying a plurality of documents having one or more questions and, for each question, a corresponding answer;
generating a plurality of question-answer pairs from the questions and respective corresponding answers occurring in the plurality of documents;
training a statistical machine translation model using the plurality of question-answer pairs, including using each question of each question-answer pair as a source language input and a corresponding answer of the question-answer pair as a target language input, wherein each question and each corresponding answer are in the same natural language;
translating, using the statistical machine translation model trained on the plurality of question-answer pairs, a phrase into one or more corresponding translated phrases; and
determining one or more synonym pairs by comparing the phrase with the one or more corresponding translated phrases.
Dependent claims: 15
16. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
identifying a plurality of queries in a query log, and for each query, one or more search results associated with the query, wherein each search result identifies a corresponding resource and comprises a search result snippet that includes text from the corresponding resource identified by the search result;
generating a plurality of query-snippet pairs, wherein each pair associates a respective query of the plurality of queries with a particular search result snippet from one of the search results associated with the query in the query log;
training a statistical machine translation model using the plurality of query-snippet pairs, including using each query of each query-snippet pair as a source language input, and a corresponding snippet of the query-snippet pair as a target language input, wherein each query and the search result snippet of each query-snippet pair are in the same natural language;
translating, using the statistical machine translation model trained on the plurality of query-snippet pairs, a phrase into one or more corresponding translated phrases; and
determining one or more synonym pairs including comparing the phrase with the one or more corresponding translated phrases.
Dependent claims: 17
Specification