Intra-language statistical machine translation
Abstract
Training data may be provided, the training data including pairs of source phrases and target phrases. The pairs may be used to train an intra-language statistical machine translation model, where the intra-language statistical machine translation model, when given an input phrase of text in the human language, can compute probabilities of semantic equivalence of the input phrase to possible translations of the input phrase in the human language. The statistical machine translation model may be used to translate between queries and listings. The queries may be text strings in the human language submitted to a search engine. The listing strings may be text strings of formal names of real world entities that are to be searched by the search engine to find matches for the query strings.
19 Claims
1. A computer implemented method for intra-language machine translation of phrases in a human language, the method performed by one or more computers comprised of one or more processors and memory, the method comprising:
receiving training data, the training data comprising a list of text queries in the human language submitted to a search engine and a list of text sentences in the human language returned by the search engine when the text queries were submitted to the search engine, and forming pairings of source phrases and target phrases by comparing the text queries and the text sentences to identify text queries that are similar to text sentences and pairing the identified text queries, as the source phrases, with the respective identified text sentences, as the target phrases, and storing the training data in the memory;
processing each pairing by the one or more processors, the processing comprising, for a current pairing being processed, parsing the source phrase of the current pairing into source words and parsing the target phrase of the current pairing into target words, and computing an alignment of the source words and the target words;
using, by the one or more processors, the pairs of training data and their respective alignments to train an n-gram based intra-language statistical machine translation model, where the intra-language statistical machine translation model, when given an input phrase of text in the human language, can identify possible translations of the input phrase in the human language and compute probabilities of semantic equivalence of the input phrase to the possible translations of the input phrase in the human language; and
using the statistical machine translation model to find translations of queries and use the translations to evaluate listings that match the queries, where the queries comprise text strings in the human language submitted to the search engine, where the listing strings comprise text strings of formal names of real world entities that are found by the search engine as matches for the query strings.
Dependent claims: 2, 3, 4, 5, 6, 7
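As an illustration only, and not part of the claim language, the pairing step of claim 1 could be sketched as follows. The claim does not specify how queries and sentences are compared, so the word-overlap (Jaccard) measure, the `threshold` parameter, and the function names `tokenize`, `jaccard`, and `form_pairs` are all assumptions introduced for this sketch.

```python
def tokenize(text):
    """Lowercase and split on whitespace (an assumed, minimal parser)."""
    return text.lower().split()


def jaccard(a, b):
    """Word-set overlap between two token lists, in [0, 1]."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def form_pairs(queries, sentences, threshold=0.5):
    """Pair each query (source phrase) with every sentence (target phrase)
    whose similarity to the query meets the threshold."""
    pairs = []
    for q in queries:
        q_words = tokenize(q)
        for s in sentences:
            if jaccard(q_words, tokenize(s)) >= threshold:
                pairs.append((q, s))
    return pairs


# Hypothetical toy data: user queries and sentences returned by a search engine.
queries = ["kung ho restaurant", "pizza near me"]
sentences = ["Kung Ho Cuisine of China", "Joe's Hardware"]
pairs = form_pairs(queries, sentences, threshold=0.3)
```

With this toy data the only pairing formed is the query "kung ho restaurant" with the sentence "Kung Ho Cuisine of China", since they share two of six distinct words. Any other similarity measure could be substituted without changing the overall shape of the step.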
8. One or more storage devices storing information to enable a computing device to perform a process for translating phrases of a human language to other phrases of the language, the process comprising:
accessing training pairs comprising pairs of phrases in the human language, the training pairs comprising text queries in the human language submitted to a search engine and corresponding text sentences in the human language returned by the search engine when the text queries were submitted to the search engine, the training pairs formed by comparing the text queries and the text sentences to identify text queries that are similar to text sentences and pairing the identified text queries with the respective identified text sentences;
training a statistical machine translation model with the training pairs by computing respective alignments of the training pairs, an alignment mapping words of a phrase with the words the phrase is paired to by inserting null words into the phrase or reordering words of the phrase, the statistical machine translation model being capable of computing probabilities that a target string in the human language is a valid translation of a given source string in the human language;
receiving a text phrase in the human language, decoding the text phrase to different candidate translations of the text phrase in the human language, and using the statistical machine translation model to compute probabilities that the candidate translations are translations of the text phrase; and
based on the probabilities, storing and/or displaying, by computer, one or more of the candidate translations.
Dependent claims: 9, 10, 11, 12, 13
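As an illustration only, word alignment with an inserted null word can be sketched with IBM Model 1 trained by expectation-maximization. The claim does not name an alignment algorithm, so IBM Model 1 is a technique swapped in here for concreteness; the names `NULL`, `train_model1`, and `align`, and the toy training pairs, are assumptions of this sketch.

```python
from collections import defaultdict

NULL = "<null>"  # null word inserted into every source phrase


def train_model1(pairs, iterations=10):
    """EM training of IBM Model 1 on (source_words, target_words) pairs.
    Returns t, where t[(tgt_word, src_word)] estimates P(tgt_word | src_word)."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts.
        for src, tgt in pairs:
            src_n = [NULL] + src
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src_n)
                for sw in src_n:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the counts per source word.
        t = defaultdict(float, {k: c / total[k[1]] for k, c in count.items()})
    return t


def align(src, tgt, t):
    """Map each target word to its most probable source word (or NULL)."""
    src_n = [NULL] + src
    return [(tw, max(src_n, key=lambda sw: t[(tw, sw)])) for tw in tgt]


# Hypothetical toy training pairs (query words, listing words).
train = [
    (["kung", "ho", "restaurant"], ["kung", "ho", "cuisine", "of", "china"]),
    (["joes", "pizza"], ["joes", "pizza", "restaurant"]),
    (["china", "restaurant"], ["china", "cuisine"]),
]
t = train_model1(train)
alignment = align(train[0][0], train[0][1], t)
```

Because Model 1 ignores word positions, reordering between the paired phrases needs no special handling in this sketch; target words with no good counterpart in the source can align to the null word.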
14. A method performed by one or more computers comprised of one or more processors and physical storage, the method comprising:
providing a statistical machine translation model, stored in the physical storage, and configured to allow the one or more processors to compute probabilities of translations of phrases, wherein the phrases are in a human language and the translations of the phrases are in the same human language, the statistical machine translation model having been trained with training pairs, the training pairs having been computed from a list of text queries in the human language submitted to a search engine and from a list of text sentences in the human language returned by the search engine when the text queries were submitted to the search engine, and by comparing the text queries to the text sentences to identify which of the text queries are similar to which of the text sentences, where text sentences identified as similar to text queries are respectively paired to form the training pairs; and
using, by the processor, the statistical machine translation model to translate between query forms and listing forms of organizations and/or businesses, where the query forms comprise phrases, in the human language, submitted to the search engine, and where the listing forms comprise formal names, in the human language, of organizations and/or businesses searchable by the search engine.
Dependent claims: 15, 16, 17, 18, 19
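As an illustration only, one way a trained model might be used to relate a query form to candidate listing forms is to score each listing by a word-level translation likelihood and rank the listings by that score. The table `T` below holds made-up probabilities rather than trained values, and the names `T`, `FLOOR`, `log_score`, and `rank_listings` are assumptions of this sketch, not terms from the claim.

```python
import math

NULL = "<null>"

# Hypothetical toy table: T[(listing_word, query_word)] ~ P(listing_word | query_word).
# The numbers are illustrative, not the output of any real training run.
T = {
    ("kung", "kung"): 0.9,
    ("ho", "ho"): 0.9,
    ("cuisine", "restaurant"): 0.4,
    ("restaurant", "restaurant"): 0.5,
    ("of", NULL): 0.3,
    ("china", NULL): 0.1,
    ("china", "restaurant"): 0.05,
}
FLOOR = 1e-6  # assumed smoothing floor for unseen word pairs


def log_score(query_words, listing_words, table=T):
    """Log-likelihood of the listing given the query, where each listing word
    is generated by a uniformly chosen query word (or the null word)."""
    src = [NULL] + query_words
    total = 0.0
    for lw in listing_words:
        p = sum(table.get((lw, qw), 0.0) for qw in src) / len(src)
        total += math.log(max(p, FLOOR))
    return total


def rank_listings(query, listings):
    """Order listings from most to least probable translation of the query."""
    q = query.lower().split()
    return sorted(listings,
                  key=lambda l: log_score(q, l.lower().split()),
                  reverse=True)


ranked = rank_listings("kung ho restaurant",
                       ["Joe's Hardware", "Kung Ho Cuisine of China"])
```

Note that this unnormalized score penalizes longer listings, since every additional word multiplies in another probability; a practical system would likely normalize for length or combine the score with other features.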
Specification