System for finding queries aiming at tail URLs
First Claim
1. A system for classifying a query in relation to one or more indexes to which the query is to be directed, comprising:
one or more computer memories that store data relating to at least one index; and
at least one processor coupled to the one or more computer memories, the at least one processor being configured to act as:
a feature generation component that generates one or more features corresponding to the query;
a model building component that builds a prediction model for respective queries by utilizing a machine learning algorithm and a set of training data; and
a predicting component that:
analyzes the one or more features corresponding to the query and the prediction model, and
predicts, based at least on the analyzing, whether the query is directed to a resource pre-designated as a commonly queried resource or a resource pre-designated as an uncommonly queried resource.
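The three components recited in claim 1 can be pictured with a minimal Python sketch. It assumes a toy feature generator (a bias term, word count, and mean word length), uses logistic regression fitted by batch gradient descent as the machine learning algorithm, and labels each training query 1 when it targets a resource pre-designated as uncommonly queried (tail) and 0 when it targets a commonly queried (head) one. All function and class names are hypothetical; the claim does not specify particular features or a particular learner.

```python
import math
from dataclasses import dataclass


def generate_features(query: str) -> list[float]:
    """Feature generation component (toy): bias, word count, mean word length."""
    words = query.split()
    mean_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return [1.0, float(len(words)), mean_len]


@dataclass
class PredictionModel:
    weights: list[float]

    def score(self, features: list[float]) -> float:
        """Logistic score: estimated probability of a tail (uncommonly queried) target."""
        z = sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))


def build_model(training_data: list[tuple[str, int]], epochs: int = 500,
                lr: float = 0.1) -> PredictionModel:
    """Model building component: logistic regression fitted by batch gradient descent.

    Each training item is (query, label) with label 1 = tail resource, 0 = head resource.
    """
    examples = [(generate_features(q), y) for q, y in training_data]
    model = PredictionModel([0.0] * len(examples[0][0]))
    for _ in range(epochs):
        grad = [0.0] * len(model.weights)
        for x, y in examples:
            err = model.score(x) - y  # derivative of the log loss with respect to z
            for i, xi in enumerate(x):
                grad[i] += err * xi
        # Average the gradient over the batch and take one descent step.
        model.weights = [w - lr * g / len(examples)
                         for w, g in zip(model.weights, grad)]
    return model


def predict(query: str, model: PredictionModel) -> str:
    """Predicting component: analyze the query's features together with the model."""
    return "tail" if model.score(generate_features(query)) >= 0.5 else "head"
```

With features this crude the sketch only illustrates the data flow between the components (features in, trained model, head/tail decision out), not realistic accuracy.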
2 Assignments
0 Petitions
Abstract
Systems and methodologies for improved query classification and processing are provided herein. As described herein, a query prediction model can be constructed from a set of training data (e.g., diagnostic data obtained from an automatic diagnostic system and/or other suitable data) using a machine learning-based technique. Subsequently, upon receiving a query, a set of features corresponding to the query, such as the length and/or frequency of the query, unigram probabilities of respective words and/or groups of words in the query, presence of pre-designated words or phrases in the query, or the like, can be generated. The generated features can then be analyzed in combination with the query prediction model to classify the query by predicting whether the query is aimed at a head Uniform Resource Locator (URL) or a tail URL. Based on this prediction, an appropriate index or combination of indexes can be assigned to answer the query.
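The feature families listed in the abstract (query length, query frequency, unigram probabilities of the query's words, and presence of pre-designated words or phrases) could be computed along the following lines. This is a sketch under stated assumptions: the query log and unigram counts are supplied as `collections.Counter` objects, unigram probabilities use add-one smoothing, and the function `extract_query_features` and its feature names are hypothetical rather than taken from the patent.

```python
import math
from collections import Counter


def extract_query_features(query: str, query_log: Counter, unigram_counts: Counter,
                           designated_terms: set[str]) -> dict:
    """Compute illustrative features for one query.

    query_log:        submission counts per normalized query string (assumed input).
    unigram_counts:   word counts from some reference corpus (assumed input).
    designated_terms: pre-designated words or phrases, lower-cased (assumed input).
    """
    # Normalize case and collapse repeated whitespace.
    normalized = " ".join(query.lower().split())
    words = normalized.split()
    # Add-one smoothing with one extra vocabulary slot for unseen words.
    denom = sum(unigram_counts.values()) + len(unigram_counts) + 1
    log_probs = [math.log((unigram_counts[w] + 1) / denom) for w in words]
    padded = f" {normalized} "
    return {
        "length_chars": len(normalized),
        "length_words": len(words),
        "query_frequency": query_log[normalized],  # Counter returns 0 if unseen
        "min_unigram_logprob": min(log_probs, default=0.0),
        "mean_unigram_logprob": sum(log_probs) / len(log_probs) if log_probs else 0.0,
        # Whole-word or whole-phrase match against the space-padded query.
        "has_designated_term": any(f" {t} " in padded for t in designated_terms),
    }
```

A low minimum unigram log-probability and a query frequency of zero are the kind of signals one might expect to point toward a tail URL, though the abstract leaves the weighting of features to the learned model.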
27 Citations
20 Claims
13. A method of classifying and answering a query, comprising:
employing one or more processors to perform the classifying and answering, the classifying and answering comprising:
creating a prediction model for respective queries based on a set of diagnostic data using one or more machine learning algorithms, the set of diagnostic data comprising respective pairs of one query and an identity of an indexed resource to which the one query is directed;
identifying the query;
generating one or more features corresponding to the query; and
predicting whether the query is directed to a commonly queried resource or an uncommonly queried resource by analyzing the one or more features corresponding to the query and the prediction model.
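Claim 13 trains on diagnostic pairs, each consisting of a query and the identity of the indexed resource it was directed to. One way to sketch this, assuming a pre-designated set of commonly queried (head) resources, is to label each pair by whether its resource belongs to that set and then fit a classifier. Multinomial naive Bayes over query words is used here purely as an illustrative stand-in for "one or more machine learning algorithms"; the helper names and example data are hypothetical.

```python
import math
from collections import Counter


def label_pairs(diagnostic_pairs, head_resources):
    """Map (query, resource_id) pairs to (query, label) training examples."""
    return [(q, "head" if r in head_resources else "tail") for q, r in diagnostic_pairs]


def train_naive_bayes(examples):
    """Fit multinomial naive Bayes over lower-cased query words."""
    class_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in class_counts}
    for query, label in examples:
        word_counts[label].update(query.lower().split())
    vocab = set().union(*word_counts.values())
    return class_counts, word_counts, vocab


def classify(query, model):
    """Return the label with the highest log posterior (add-one smoothed)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(count / total)  # log prior
        for word in query.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For instance, pairs such as `("webmail login", "mail.example.com")` with `mail.example.com` in the head set become head examples, while a pair like `("carburetor gasket diagram", "example.net/gasket")` becomes a tail example.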
20. A machine-readable storage medium having stored thereon instructions that, when executed by a machine, perform acts comprising:
obtaining a set of positive search data from an automatic diagnostic system, the set of positive search data comprising one or more queries aimed at Uniform Resource Locators (URLs) indexed by a large search index;
obtaining a set of negative search data from the automatic diagnostic system, the set of negative search data comprising one or more queries aimed at respective URLs indexed by a small search index;
constructing a prediction model from the positive search data and the negative search data using a machine learning algorithm;
receiving a newly-submitted query;
obtaining one or more features of the newly-submitted query;
predicting whether the newly-submitted query is aimed at a URL indexed by the small index or a URL indexed by the large index, the predicting being based on the features of the newly-submitted query and the prediction model; and
answering the newly-submitted query using the small index upon predicting that the newly-submitted query aims at a URL indexed by the small index, or using the large index upon predicting that the newly-submitted query aims at a URL indexed by the large index.
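The answering step of claim 20 amounts to routing: a query predicted to aim at a URL in the small index is answered from the small index, and otherwise from the large index. This minimal sketch assumes toy inverted indexes (term to set of URLs) and accepts the prediction as a `predicts_large` callable. `placeholder_predictor` only stands in for the model the claim builds from positive and negative search data (it flags any query containing a term absent from the small index) and is not the claimed prediction method. All names and URLs are hypothetical.

```python
from collections.abc import Callable

# Hypothetical toy inverted indexes: term -> set of URLs containing it.
SMALL_INDEX: dict[str, set[str]] = {
    "webmail": {"https://mail.example.com/"},
    "news": {"https://news.example.com/"},
}
# Assumption: the large index is a superset of the small one.
LARGE_INDEX: dict[str, set[str]] = {
    **SMALL_INDEX,
    "lawnmower": {"https://example.org/manuals/l87"},
    "carburetor": {"https://example.org/manuals/l87", "https://example.net/gasket"},
}


def lookup(index: dict[str, set[str]], query: str) -> list[str]:
    """Return, sorted, the URLs that contain every term of the query."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


def placeholder_predictor(query: str) -> bool:
    """Stand-in for the trained model: True means 'aimed at a large-index URL'."""
    return any(term not in SMALL_INDEX for term in query.lower().split())


def answer(query: str, predicts_large: Callable[[str], bool]) -> tuple[str, list[str]]:
    """Answer the query from whichever index the prediction points to."""
    if predicts_large(query):
        return "large", lookup(LARGE_INDEX, query)
    return "small", lookup(SMALL_INDEX, query)
```

Passing the predictor in as a parameter keeps routing separate from prediction, so the routing code would not need to change when the model is rebuilt from fresh positive and negative data.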
Specification