Query language identification
1. A method comprising:
providing, by a system comprising one or more computers, a plurality of user interfaces through which search queries are received, wherein each user interface is in a respective interface language, and wherein each interface language is a natural language in which a respective user interface presents information;
maintaining a collection of query records, wherein the collection of query records includes distinct subsets of query records, wherein each distinct subset of query records is associated with a respective user interface of the plurality of user interfaces, wherein each query record associates a past query with one or more result documents, and wherein each result document has an associated natural language;
classifying each past query in the collection of query records based at least on:
(i) the interface language of the user interface through which the past query was received, and (ii) at least one of:
(a) a natural language of the one or more result documents associated with the past query, or (b) the natural language of one or more result documents that were selected;
generating an initial distribution of languages associated with the past queries for each user interface of the plurality of user interfaces based on the classifying, wherein the initial distribution indicates, for each user interface of the plurality of user interfaces and for each of multiple natural languages, what proportion of the past queries from the plurality of query records were in the language for the interface;
generating, based at least on the initial distribution of languages associated with the past queries for each user interface of the plurality of user interfaces, an interface language classifier that is trained to predict, for a given user interface and a given language, a proportion of queries that are received through the given user interface that are likely in the given language;
receiving, in the system, through a first user interface of the plurality of user interfaces, a search query comprising one or more query terms;
using the interface language classifier, that was generated based at least on the initial distribution of languages associated with the past queries for each user interface of the plurality of user interfaces, to determine a likelihood that the search query is in a particular natural language of the multiple natural languages, given that the first user interface is the user interface that received the query; and
providing one or more results responsive to the search query received through the first user interface, wherein the one or more results comprises results in a most likely natural language of the search query, which is automatically determined by the interface language classifier from the multiple natural languages.
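The data flow recited in claim 1 (classify past queries, tally a per-interface language distribution, then pick the most likely language for a new query) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record layout, the function names, and the voting rule used to label a past query are hypothetical and are not taken from the patent, and the "classifier" is reduced to a lookup in the tallied distribution.

```python
from collections import Counter, defaultdict


def classify_past_query(interface_lang, result_langs, selected_langs=()):
    """Label one past query with a language.

    Hypothetical rule: vote over the languages of the selected results if
    any were selected, otherwise over all returned results; the interface
    language breaks ties and is the fallback when there is no evidence.
    """
    votes = Counter(selected_langs or result_langs)
    if not votes:
        return interface_lang
    top = max(votes.values())
    tied = [lang for lang, n in votes.items() if n == top]
    return interface_lang if interface_lang in tied else sorted(tied)[0]


def initial_distribution(records):
    """For each interface, the proportion of past queries labeled per language."""
    counts = defaultdict(Counter)
    for rec in records:
        label = classify_past_query(
            rec["interface"], rec["result_langs"], rec.get("selected_langs", ())
        )
        counts[rec["interface"]][label] += 1
    return {
        iface: {lang: n / sum(c.values()) for lang, n in c.items()}
        for iface, c in counts.items()
    }


def most_likely_language(distribution, interface):
    """Most probable query language given only the receiving interface."""
    langs = distribution[interface]
    return max(langs, key=langs.get)


# Toy query records (invented data, for illustration only).
records = [
    {"interface": "de", "result_langs": ["de", "de", "en"]},
    {"interface": "de", "result_langs": ["en", "en"], "selected_langs": ["en"]},
    {"interface": "de", "result_langs": ["de"]},
    {"interface": "de", "result_langs": ["de", "fr"]},
    {"interface": "fr", "result_langs": ["fr"]},
]
dist = initial_distribution(records)
```

With these toy records, three of the four queries received through the German interface are labeled German, so `most_likely_language(dist, "de")` returns `"de"`. A trained classifier as claimed would presumably generalize beyond these raw proportions; the sketch stops at the initial distribution.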
Abstract
Methods, systems, and apparatus, including computer program products, for identifying the language of a search query. In one embodiment, the language of each term of a query is determined from the query terms and the language of the user interface a user used to enter the query. In another embodiment, an automatic interface language classifier is generated from a collection of past queries each submitted by a user. In some embodiments, a score is determined for each of multiple languages, each score indicating a likelihood that the query language is the corresponding one of the multiple languages.
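To make the per-language score mentioned in the abstract concrete, here is a small sketch that combines evidence from the query terms with the interface language. The combination rule is an assumption: a naive-Bayes-style product of an interface prior and per-term probabilities, which the abstract does not specify. The names `language_scores`, `prior`, `term_prob`, and `floor`, as well as all numbers, are hypothetical.

```python
import math


def language_scores(query_terms, interface, prior, term_prob, floor=1e-6):
    """Return a normalized score per language for one query.

    Assumed model: score(lang) is proportional to
    P(lang | interface) * product over terms of P(term | lang),
    with `floor` standing in for terms unseen in a language.
    """
    log_scores = {}
    for lang, p_lang in prior[interface].items():
        if p_lang <= 0:
            continue  # a language never seen on this interface gets no score
        log_p = math.log(p_lang)
        for term in query_terms:
            log_p += math.log(term_prob.get(lang, {}).get(term, floor))
        log_scores[lang] = log_p
    # Subtract the peak before exponentiating to avoid underflow.
    peak = max(log_scores.values())
    unnorm = {lang: math.exp(v - peak) for lang, v in log_scores.items()}
    total = sum(unnorm.values())
    return {lang: v / total for lang, v in unnorm.items()}


# Invented toy data: the German interface mostly receives German queries.
prior = {"de": {"de": 0.75, "en": 0.25}}
term_prob = {"de": {"haus": 0.01}, "en": {"house": 0.01}}

scores_house = language_scores(["house"], "de", prior, term_prob)
scores_haus = language_scores(["haus"], "de", prior, term_prob)
```

In this toy setup the term evidence outweighs the interface prior for the query "house", so English scores highest even though the query arrived through the German interface, while "haus" scores highest for German.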
18 Claims
7. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
providing a plurality of user interfaces through which search queries are received, wherein each user interface is in a respective interface language, and wherein each interface language is a natural language in which a respective user interface presents information;
maintaining a collection of query records, wherein the collection of query records includes distinct subsets of query records, wherein each distinct subset of query records is associated with a respective user interface of the plurality of user interfaces, wherein each query record associates a past query with one or more result documents, and wherein each result document has an associated natural language;
classifying each past query in the collection of query records based at least on:
(i) the interface language of the user interface through which the past query was received, and (ii) at least one of:
(a) a natural language of the one or more result documents associated with the past query, or (b) the natural language of one or more result documents that were selected;
generating an initial distribution of languages associated with the past queries for each user interface of the plurality of user interfaces based on the classifying, wherein the initial distribution indicates, for each user interface of the plurality of user interfaces and for each of multiple natural languages, what proportion of the past queries from the plurality of query records were in the language for the interface;
generating, based at least on the initial distribution of languages associated with the past queries for each user interface of the plurality of user interfaces, an interface language classifier that is trained to predict, for a given user interface and a given language, a proportion of queries that are received through the given user interface that are likely in the given language;
receiving, in the system, through a first user interface of the plurality of user interfaces, a search query comprising one or more query terms;
using the interface language classifier, that was generated based at least on the initial distribution of languages associated with the past queries for each user interface of the plurality of user interfaces, to determine a likelihood that the search query is in a particular natural language of the multiple natural languages, given that the first user interface is the user interface that received the query; and
providing one or more results responsive to the search query received through the first user interface, wherein the one or more results comprises results in a most likely natural language of the search query, which is automatically determined by the interface language classifier from the multiple natural languages.
13. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
providing a plurality of user interfaces through which search queries are received, wherein each user interface is in a respective interface language, and wherein each interface language is a natural language in which a respective user interface presents information;
maintaining a collection of query records, wherein the collection of query records includes distinct subsets of query records, wherein each distinct subset of query records is associated with a respective user interface of the plurality of user interfaces, wherein each query record associates a past query with one or more result documents, and wherein each result document has an associated natural language;
classifying each past query in the collection of query records based at least on:
(i) the interface language of the user interface through which the past query was received, and (ii) at least one of:
(a) a natural language of the one or more result documents associated with the past query, or (b) the natural language of one or more result documents that were selected;
generating an initial distribution of languages associated with the past queries for each user interface of the plurality of user interfaces based on the classifying, wherein the initial distribution indicates, for each user interface of the plurality of user interfaces and for each of multiple natural languages, what proportion of the past queries from the plurality of query records were in the language for the interface;
generating, based at least on the initial distribution of languages associated with the past queries for each user interface of the plurality of user interfaces, an interface language classifier that is trained to predict, for a given user interface and a given language, a proportion of queries that are received through the given user interface that are likely in the given language;
receiving, in the system, through a first user interface of the plurality of user interfaces, a search query comprising one or more query terms;
using the interface language classifier, that was generated based at least on the initial distribution of languages associated with the past queries for each user interface of the plurality of user interfaces, to determine a likelihood that the search query is in a particular natural language of the multiple natural languages, given that the first user interface is the user interface that received the query; and
providing one or more results responsive to the search query received through the first user interface, wherein the one or more results comprises results in a most likely natural language of the search query, which is automatically determined by the interface language classifier from the multiple natural languages.
Specification