Mining translations of web queries from web click-through data
Abstract
Methods and technologies provide translations of web queries based on an analysis of user behavior in click-through data. These methods and technologies generate large-scale and timely query translation pairs guided by a small set of seed word pairs from a dictionary, without relying on additional knowledge or complex models.
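As a rough illustration of the inputs the abstract refers to, the sketch below models click-through data as (query, clicked URL) records and shows how a small seed dictionary could be used to collect URL pairs. The record layout, the sample data in `CLICKS` and `SEEDS`, and the function name `url_pairs_from_seeds` are assumptions made for illustration only; none of them come from the patent text.

```python
from collections import defaultdict
from itertools import product

# Hypothetical click-through log: (query, clicked URL) records.
CLICKS = [
    ("weather", "http://example.com/en/weather"),
    ("weather", "http://example.org/forecast"),
    ("天气", "http://example.com/zh/weather"),
    ("map", "http://example.com/en/map"),
    ("地图", "http://example.com/zh/map"),
]

# Hypothetical seed pairs: (source-language query, target-language query).
SEEDS = [("weather", "天气"), ("map", "地图")]


def url_pairs_from_seeds(clicks, seeds):
    """Pair every URL clicked for a seed's source query with every URL
    clicked for the same seed's target query (an assumed pairing rule)."""
    urls_by_query = defaultdict(set)
    for query, url in clicks:
        urls_by_query[query].add(url)
    pairs = set()
    for src_q, tgt_q in seeds:
        pairs.update(product(urls_by_query[src_q], urls_by_query[tgt_q]))
    return pairs
```

Calling `url_pairs_from_seeds(CLICKS, SEEDS)` yields both genuinely parallel pages and noisy pairs (here, the `example.org` forecast page paired with the Chinese weather page), which is why a later similarity threshold would be needed.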
20 Claims
1. A method performed on a computing device and for generating a query translation pair, the method comprising:
- establishing a plurality of seed query pairs that each comprise a source language query and a target language query that have the same meaning but are in different languages, where the source language query is different for each of the plurality of seed query pairs;
- finding a plurality of URL pairs from click-through data, the plurality of URL pairs based on the seed query pair;
- extracting URL pair patterns from ones of the plurality of URL pairs having a similarity score above a particular threshold;
- extracting from the click-through data bilingual URL pairs identified by the URL pair patterns;
- identifying in the click-through data candidate query pairs corresponding to the bilingual URL pairs; and
- filtering the candidate query pairs based on a confidence score resulting in a query translation pair where queries of the query translation pair are translations of each other.
Dependent claims: 2, 3, 4, 5, 6, 7
8. At least one computer storage device storing computer-executable instructions that, when executed by a computing device, cause the computing device to perform actions for generating a query translation pair, the actions comprising:
- establishing a plurality of seed query pairs that each comprise a source language query and a target language query that have the same meaning but are in different languages, where the source language query is different for each of the plurality of seed query pairs;
- finding a plurality of URL pairs from click-through data, the plurality of URL pairs based on the seed query pair;
- extracting URL pair patterns from ones of the plurality of URL pairs having a similarity score above a particular threshold;
- extracting from the click-through data bilingual URL pairs identified by the URL pair patterns;
- identifying in the click-through data candidate query pairs corresponding to the bilingual URL pairs; and
- filtering the candidate query pairs based on a confidence score resulting in a query translation pair where queries of the query translation pair are translations of each other.
Dependent claims: 9, 10, 11, 12, 13
14. A computing device and at least one software module together configured for generating a query translation pair based on:
- establishing a plurality of seed query pairs that each comprise a source language query and a target language query that have the same meaning but are in different languages, where the source language query is different for each of the plurality of seed query pairs;
- finding a plurality of URL pairs from click-through data, the plurality of URL pairs based on the seed query pair;
- extracting URL pair patterns from ones of the plurality of URL pairs having a similarity score above a particular threshold;
- extracting from the click-through data bilingual URL pairs identified by the URL pair patterns;
- identifying in the click-through data candidate query pairs corresponding to the bilingual URL pairs; and
- filtering the candidate query pairs based on a confidence score resulting in a query translation pair where queries of the query translation pair are translations of each other.
Dependent claims: 15, 16, 17, 18, 19, 20
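The independent claims leave the similarity score, the form of a URL pair pattern, and the confidence score unspecified. The following is a minimal sketch of the last four claimed steps under assumptions chosen purely for illustration: similarity is `difflib.SequenceMatcher.ratio()`, a pattern is a single differing path segment (such as `en` versus `zh`), and confidence is the number of bilingual URL pairs supporting a candidate. All function names, parameters, and defaults are hypothetical and are not taken from the patent.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def extract_patterns(url_pairs, threshold=0.8):
    """Return (src_segment, tgt_segment) patterns from URL pairs whose
    string similarity exceeds the threshold (assumed similarity measure)."""
    patterns = set()
    for src, tgt in url_pairs:
        if SequenceMatcher(None, src, tgt).ratio() <= threshold:
            continue
        a, b = src.split("/"), tgt.split("/")
        diff = [(x, y) for x, y in zip(a, b) if x != y]
        # Assumed pattern form: exactly one "/"-separated segment differs.
        if len(a) == len(b) and len(diff) == 1:
            patterns.add(diff[0])
    return patterns


def bilingual_url_pairs(clicked_urls, patterns):
    """Find clicked URLs whose segment-swapped counterpart was also clicked."""
    pairs = set()
    for url in clicked_urls:
        parts = url.split("/")
        for src_seg, tgt_seg in patterns:
            for i, part in enumerate(parts):
                if part == src_seg:
                    other = "/".join(parts[:i] + [tgt_seg] + parts[i + 1:])
                    if other in clicked_urls:
                        pairs.add((url, other))
    return pairs


def mine_translations(clicks, seed_url_pairs, threshold=0.8, min_support=1):
    """Return candidate query pairs supported by at least `min_support`
    bilingual URL pairs (assumed stand-in for the confidence score)."""
    queries_by_url = defaultdict(set)
    for query, url in clicks:
        queries_by_url[url].add(query)
    patterns = extract_patterns(seed_url_pairs, threshold)
    support = Counter()
    for src_url, tgt_url in bilingual_url_pairs(set(queries_by_url), patterns):
        for src_q in queries_by_url[src_url]:
            for tgt_q in queries_by_url[tgt_url]:
                support[(src_q, tgt_q)] += 1
    return {pair for pair, n in support.items() if n >= min_support}
```

In this sketch a pattern learned from one seed URL pair can surface query pairs that were never in the seed set, since any pair of clicked URLs matching the pattern contributes candidates. Raising `min_support` keeps only candidates backed by several URL pairs, which is one simple way a confidence filter could behave.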
Specification