Query language identification
Abstract
Methods, systems, and apparatus, including computer program products, for identifying the language of a search query. In one embodiment, the language of each term of a query is determined from the query terms and the language of the user interface a user used to enter the query. In another embodiment, an automatic interface language classifier is generated from a collection of past queries each submitted by a user. In some embodiments, a score is determined for each of multiple languages, each score indicating a likelihood that the query language is the corresponding one of the multiple languages.
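As an illustration of how a per-interface language prior might be derived from past queries, the following minimal sketch counts the languages of result documents stored in query records, grouped by the interface through which each past query arrived. The record layout (`interface`, `query`, `result_languages`), the function name `interface_language_prior`, and the add-one smoothing are assumptions made for this example; the patent text does not specify them.

```python
from collections import Counter, defaultdict


def interface_language_prior(query_records, smoothing=1.0):
    """Estimate P(query language | interface language) from query records.

    Each record is a hypothetical dict such as
    {"interface": "de", "query": "...", "result_languages": ["de", "en"]}.
    The estimate is a smoothed relative frequency of result-document
    languages within each interface's subset of records (an assumption).
    """
    counts = defaultdict(Counter)
    for record in query_records:
        for language in record["result_languages"]:
            counts[record["interface"]][language] += 1

    # Every language seen anywhere is a candidate for every interface.
    languages = sorted({lang for c in counts.values() for lang in c})

    prior = {}
    for interface, c in counts.items():
        total = sum(c.values()) + smoothing * len(languages)
        prior[interface] = {
            lang: (c[lang] + smoothing) / total for lang in languages
        }
    return prior


# Toy records, invented for illustration only.
records = [
    {"interface": "de", "query": "wetter berlin", "result_languages": ["de", "de"]},
    {"interface": "de", "query": "new york times", "result_languages": ["en"]},
    {"interface": "en", "query": "weather", "result_languages": ["en", "en", "en"]},
]
prior = interface_language_prior(records)
```

With these toy records, the German interface leans toward German but keeps some probability for English, which is the kind of signal a per-language score conditioned on the interface language could draw on.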
15 Claims
1. A computer implemented method, comprising:
providing, in a system comprising one or more computers, a plurality of user interfaces through which search queries are received, wherein each user interface is in a respective interface language, and wherein each interface language is a natural language in which a respective user interface presents information;
maintaining a collection of query records, wherein the collection of query records includes distinct subsets of query records, wherein each distinct subset of query records is associated with a respective user interface of the plurality of user interfaces, wherein each query record associates a past query with one or more result documents, and wherein each result document has an associated natural language;
receiving, in the system, through a first user interface of the plurality of user interfaces, a search query comprising one or more query terms; and
determining, by the system, a query language of the search query from the search query, the interface language of the first user interface, and the distinct subset of query records that is associated with the first user interface, the query language being a natural language;
wherein determining the query language of the search query further comprises:
for each of multiple languages,
  calculating a first score for each query term and the respective language, each first score indicating the likelihood that the respective query term is in the respective language, wherein the first score is calculated based on a plurality of documents, each document having an associated natural language,
  calculating a second score for the respective language, the second score indicating the likelihood that the search query is in the respective language given the interface language of the first user interface through which the search query was received, where the second score is calculated based on the plurality of query records, and
  calculating a third score for the respective language, the third score being a combination of the first score for the respective language and the second score for the respective language; and
determining the query language based on the third scores for the multiple languages.
Dependent claims: 2, 3, 4, 5
6. A computer program product, encoded on a non-transitory machine-readable storage device, operable to cause data processing apparatus to perform operations comprising:
providing, in a system comprising one or more computers, a plurality of user interfaces through which search queries are received, wherein each user interface is in a respective interface language, and wherein each interface language is a natural language in which a respective user interface presents information;
maintaining a collection of query records, wherein the collection of query records includes distinct subsets of query records, wherein each distinct subset of query records is associated with a respective user interface of the plurality of user interfaces, wherein each query record associates a past query with one or more result documents, and wherein each result document has an associated natural language;
receiving, in the system, through a first user interface of the plurality of user interfaces, a search query comprising one or more query terms; and
determining, by the system, a query language of the search query from the search query, the interface language of the first user interface, and the distinct subset of query records that is associated with the first user interface, the query language being a natural language;
wherein determining the query language of the search query further comprises:
for each of multiple languages,
  calculating a first score for each query term and the respective language, each first score indicating the likelihood that the respective query term is in the respective language, wherein the first score is calculated based on a plurality of documents, each document having an associated natural language,
  calculating a second score for the respective language, the second score indicating the likelihood that the search query is in the respective language given the interface language of the first user interface through which the search query was received, where the second score is calculated based on the plurality of query records, and
  calculating a third score for the respective language, the third score being a combination of the first score for the respective language and the second score for the respective language; and
determining the query language based on the third scores for the multiple languages.
Dependent claims: 7, 8, 9, 10
11. A system comprising:
a plurality of user interfaces through which search queries are received, wherein each user interface is in a respective interface language, and wherein each interface language is a natural language in which a respective user interface presents information;
a collection of query records, wherein the collection of query records includes distinct subsets of query records, wherein each distinct subset of query records is associated with a respective user interface of the plurality of user interfaces, wherein each query record associates a past query with one or more result documents, and wherein each result document has an associated natural language;
one or more computers configured to perform operations comprising:
receiving, through the first user interface, a search query comprising one or more query terms;
determining, by the system, a query language of the search query from the search query, the interface language of the first user interface, and the distinct subset of query records that is associated with the first user interface, the query language being a natural language;
wherein determining the query language of the search query further comprises:
for each of multiple languages,
  calculating a first score for each query term and the respective language, each first score indicating the likelihood that the respective query term is in the respective language, where the first score is calculated based on a plurality of documents, each document having an associated natural language,
  calculating a second score for the respective language, the second score indicating the likelihood that the search query is in the respective language given the interface language of the first user interface through which the search query was received, where the second score is calculated based on the plurality of query records, and
  calculating a third score for the respective language, the third score being a combination of the first score for the respective language and the second score for the respective language; and
determining the query language based on the third scores for the multiple languages.
Dependent claims: 12, 13, 14, 15
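The three-score procedure recited in the independent claims can be sketched roughly as follows. The claims only say that the third score is "a combination" of the first and second scores; multiplying them (summing logarithms) is an assumption of this sketch, as are the function name `identify_query_language`, the `floor` value for unseen terms, and all of the toy score tables.

```python
import math


def identify_query_language(query_terms, term_scores, interface_scores, floor=1e-6):
    """Pick a query language from per-term and per-interface scores.

    term_scores[language][term]  -> first score (hypothetical, document-derived)
    interface_scores[language]   -> second score for the interface in use
    The third score combines them by multiplication in log space, which is
    an assumption; the claims leave the combination unspecified.
    """
    third_scores = {}
    for language, second_score in interface_scores.items():
        # First scores: one per query term for this language.
        first_scores = [
            term_scores.get(language, {}).get(term, floor) for term in query_terms
        ]
        # Third score: log of (product of first scores * second score).
        third_scores[language] = (
            sum(math.log(score) for score in first_scores) + math.log(second_score)
        )
    best_language = max(third_scores, key=third_scores.get)
    return best_language, third_scores


# Toy first scores, invented for illustration only.
term_scores = {
    "en": {"hotel": 0.5, "paris": 0.4},
    "fr": {"hotel": 0.4, "paris": 0.5},
    "de": {"hotel": 0.5},
}
query = ["hotel", "paris"]

# Toy second scores for a French and an English interface.
via_french_ui, _ = identify_query_language(
    query, term_scores, {"fr": 0.7, "en": 0.2, "de": 0.1}
)
via_english_ui, _ = identify_query_language(
    query, term_scores, {"en": 0.8, "fr": 0.1, "de": 0.1}
)
```

In this toy setup the term scores alone cannot separate English from French for the query "hotel paris", so the interface-conditioned second score decides the outcome: the same query resolves to French through the French interface and to English through the English one.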
Specification