
Intra-language statistical machine translation

  • US 8,615,388 B2
  • Filed: 03/28/2008
  • Issued: 12/24/2013
  • Est. Priority Date: 03/28/2008
  • Status: Active Grant
First Claim

1. A computer implemented method for intra-language machine translation of phrases in a human language, the method performed by one or more computers comprised of one or more processors and memory, the method comprising:

    receiving training data, the training data comprising a list of text queries in the human language submitted to a search engine and a list of text sentences in the human language returned by the search engine when the text queries were submitted to the search engine, and forming pairings of source phrases and target phrases by comparing the text queries and the text sentences to identify text queries that are similar to text sentences and pairing the identified text queries, as the source phrases, with the respective identified text sentences, as the target phrases, and storing the training data in the memory;

    processing each pairing by the one or more processors, the processing comprising, for a current pairing being processed, parsing the source phrase of the current pairing into source words and parsing the target phrase of the current pairing into target words, and computing an alignment of the source words and the target words;

    using, by the one or more processors, the pairs of training data and their respective alignments to train an n-gram based intra-language statistical machine translation model, where the intra-language statistical machine translation model, when given an input phrase of text in the human language, can identify possible translations of the input phrase in the human language and compute probabilities of semantic equivalence of the input phrase to the possible translations of the input phrase in the human language; and

    using the statistical machine translation model to find translations of queries and use the translations to evaluate listings that match the queries, where the queries comprise text strings in the human language submitted to the search engine, where the listing strings comprise text strings of formal names of real world entities that are found by the search engine as matches for the query strings.
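
The pairing and alignment steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how similarity or alignment is computed, so token-set Jaccard overlap and IBM Model 1 style expectation-maximization are stand-ins chosen for brevity, and the model here is word-level rather than n-gram based. All function names (`tokenize`, `jaccard`, `form_pairings`, `train_model1`, `align`) and the `threshold` value are hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive parser)."""
    return text.lower().split()


def jaccard(a, b):
    """Token-set overlap, an assumed stand-in for the claim's 'similar'."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def form_pairings(queries, sentences, threshold=0.3):
    """Pair each query (source) with every sufficiently similar sentence (target)."""
    pairs = []
    for q in queries:
        for s in sentences:
            if jaccard(tokenize(q), tokenize(s)) >= threshold:
                pairs.append((q, s))
    return pairs


def train_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) with IBM Model 1 style EM."""
    corpus = [(tokenize(s), tokenize(t)) for s, t in pairs]
    target_vocab = {w for _, tgt in corpus for w in tgt}
    uniform = 1.0 / max(len(target_vocab), 1)
    t = defaultdict(lambda: uniform)  # keyed by (target_word, source_word)
    for _ in range(iterations):
        count, total = Counter(), Counter()
        # E-step: distribute each target word's mass over the source words.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts per source word.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def align(source, target, t):
    """Link each target word to its most probable source word."""
    src, tgt = tokenize(source), tokenize(target)
    return [(max(src, key=lambda sw: t[(tw, sw)]), tw) for tw in tgt]
```

In this sketch, `form_pairings` would be run over the query and result lists, `train_model1` over the resulting pairs, and `align` over any single pairing. The trained table `t` plays the role of a crude translation-probability estimate; a fuller system would extend it to multi-word n-grams and use it to score candidate listings against a query.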
