SYSTEM FOR FINDING QUERIES AIMING AT TAIL URLs
First Claim
1. A system for classifying a query in relation to one or more indexes to which the query is to be directed, comprising:
one or more computer memories that store data relating to at least one index; and
at least one processor coupled to the one or more computer memories, the at least one processor configured to act as:
a feature generation component that generates one or more features corresponding to a query;
a model building component that builds a prediction model for respective queries by utilizing a machine learning algorithm and an associated set of training data; and
a predicting component that analyzes the one or more features corresponding to the query and the prediction model to predict whether the query is directed to a resource pre-designated as a commonly queried resource or a resource pre-designated as an uncommonly queried resource.
Abstract
Systems and methodologies for improved query classification and processing are provided herein. As described herein, a query prediction model can be constructed from a set of training data (e.g., diagnostic data obtained from an automatic diagnostic system and/or other suitable data) using a machine learning-based technique. Subsequently, upon receiving a query, a set of features corresponding to the query can be generated, such as the length and/or frequency of the query, unigram probabilities of respective words and/or groups of words in the query, the presence of pre-designated words or phrases in the query, or the like. The generated features can then be analyzed in combination with the query prediction model to classify the query by predicting whether the query is aimed at a head Uniform Resource Locator (URL) or a tail URL. Based on this prediction, an appropriate index or combination of indexes can be assigned to answer the query.
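The feature types named in the abstract (query length, query frequency, unigram probabilities, presence of pre-designated words) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the function names, the `TRIGGER_WORDS` set, the add-one smoothing, and the use of a mean log probability.

```python
import math
from collections import Counter

# Hypothetical trigger words; the abstract only says "pre-designated words or phrases".
TRIGGER_WORDS = {"download", "manual", "pdf"}


def build_unigram_counts(query_log):
    """Count how often each word appears across a log of past queries."""
    counts = Counter(word for q in query_log for word in q.lower().split())
    return counts, sum(counts.values())


def query_features(query, query_counts, unigram_counts, total_words):
    """Return a small feature dictionary for one query."""
    words = query.lower().split()
    vocab_size = len(unigram_counts)
    # Add-one smoothing (an assumption) so unseen words keep a nonzero probability.
    log_probs = [
        math.log((unigram_counts[w] + 1) / (total_words + vocab_size + 1))
        for w in words
    ]
    return {
        "length_words": len(words),
        "query_frequency": query_counts[query.lower()],
        "mean_unigram_log_prob": sum(log_probs) / len(log_probs) if log_probs else 0.0,
        "has_trigger_word": int(any(w in TRIGGER_WORDS for w in words)),
    }
```

In this sketch a short, frequent query made of common words scores a higher mean unigram log probability than a long, rare one, which is the kind of signal a head/tail classifier could use.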
20 Claims
1. A system for classifying a query in relation to one or more indexes to which the query is to be directed, comprising:
one or more computer memories that store data relating to at least one index; and
at least one processor coupled to the one or more computer memories, the at least one processor configured to act as:
a feature generation component that generates one or more features corresponding to a query;
a model building component that builds a prediction model for respective queries by utilizing a machine learning algorithm and an associated set of training data; and
a predicting component that analyzes the one or more features corresponding to the query and the prediction model to predict whether the query is directed to a resource pre-designated as a commonly queried resource or a resource pre-designated as an uncommonly queried resource.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
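The three components recited in claim 1 can be sketched as three small functions. This is only an illustration under stated assumptions: the claim does not name a learning algorithm, so logistic regression trained by stochastic gradient descent is swapped in here, and the two scaled length features, the function names, and the toy training queries are all hypothetical.

```python
import math


def generate_features(query):
    """Feature generation component: two hypothetical, roughly unit-scaled features."""
    return [len(query.split()) / 10.0, len(query) / 100.0]


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_model(training_data, epochs=2000, lr=0.5):
    """Model building component: logistic regression fitted by per-example gradient steps.

    training_data holds (query, label) pairs, where label 1 marks an uncommonly
    queried resource and label 0 a commonly queried one.
    """
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for query, label in training_data:
            x = generate_features(query)
            p = _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - label
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predicts_uncommon(query, model):
    """Predicting component: True if the query is predicted to target an uncommon resource."""
    weights, bias = model
    x = generate_features(query)
    return _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) >= 0.5


# Toy training set, invented for illustration only.
TRAINING_DATA = [
    ("weather", 0), ("news", 0), ("maps", 0), ("email login", 0),
    ("replacement gasket for 1987 outboard motor", 1),
    ("minutes of the 2003 town water board meeting", 1),
    ("error code e47 on discontinued label printer", 1),
]
MODEL = build_model(TRAINING_DATA)
```

Keeping feature generation separate from model building mirrors the claim's structure: the same `generate_features` function is used both when the model is built and when a new query is classified.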
13. A method of classifying and answering a query, comprising:
employing one or more processors to perform the classifying and answering, the classifying and answering comprising:
creating a prediction model for respective queries based on a set of diagnostic data using one or more machine learning algorithms, the set of diagnostic data comprising respective pairs of a query and an identity of an indexed resource to which the query is directed;
identifying a query;
generating one or more features corresponding to the query; and
predicting whether the query is directed to a commonly queried resource or an uncommonly queried resource by analyzing the one or more features corresponding to the query and the prediction model.
Dependent claims: 14, 15, 16, 17, 18, 19
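Claim 13 builds the model from diagnostic data made of (query, resource identity) pairs. The sketch below shows one way such pairs could become labeled training examples and then a model. It rests on assumptions not found in the claim: the `HEAD_URLS` set, the example URLs, and the choice of a word-level naive Bayes classifier with add-one smoothing are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical designation of commonly queried ("head") resources.
HEAD_URLS = {"https://example.com/weather", "https://example.com/news"}


def label_diagnostic_pairs(pairs):
    """Turn (query, url) diagnostic pairs into (query, label) training examples."""
    return [(q, "head" if url in HEAD_URLS else "tail") for q, url in pairs]


def train_naive_bayes(examples):
    """Collect per-class word counts and class counts from labeled queries."""
    word_counts = {"head": Counter(), "tail": Counter()}
    class_counts = Counter()
    for query, label in examples:
        class_counts[label] += 1
        word_counts[label].update(query.lower().split())
    vocab = set(word_counts["head"]) | set(word_counts["tail"])
    return {
        "word_counts": word_counts,
        "class_counts": class_counts,
        "vocab_size": len(vocab),
    }


def predict(query, model):
    """Return "head" or "tail", whichever class has the higher log score."""
    total = sum(model["class_counts"].values())
    scores = {}
    for label in ("head", "tail"):
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + model["vocab_size"] + 1
        # Smoothed class prior plus smoothed per-word likelihoods.
        score = math.log((model["class_counts"][label] + 1) / (total + 2))
        for word in query.lower().split():
            score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)
```

A usage pass would call `label_diagnostic_pairs` on the diagnostic data, hand the result to `train_naive_bayes`, and then call `predict` on each newly identified query.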
20. A machine-readable medium having stored thereon instructions which, when executed by a machine, cause the machine to act as a query processing system comprising:
means for obtaining a set of positive search data from an automatic diagnostic system comprising one or more queries aimed at Uniform Resource Locators (URLs) indexed by a large search index;
means for obtaining a set of negative search data from the automatic diagnostic system comprising one or more queries aimed at respective URLs indexed by a small search index;
means for constructing a prediction model from the positive search data and the negative search data using a machine learning algorithm;
means for receiving a newly-submitted query;
means for obtaining one or more features of the newly-submitted query;
means for predicting whether the newly-submitted query is aimed at a URL indexed by the small index or a URL indexed by the large index based on the features of the newly-submitted query and the prediction model; and
means for answering the newly-submitted query using the small index upon predicting that the newly-submitted query aims at a URL indexed by the small index or using the small index and the large index upon predicting that the newly-submitted query aims at a URL indexed by the large index.
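The final element of claim 20 routes a query to the small index alone or to both indexes, depending on the prediction. A minimal sketch of that routing follows. The dictionary-based indexes, the function names, and the stand-in word-count predictor are hypothetical and take the place of a real index and a trained model.

```python
def answer_query(query, aims_at_large_index, small_index, large_index):
    """Answer from the small index, adding the large index only when predicted necessary.

    aims_at_large_index is any callable returning True when the query is
    predicted to target a URL held only in the large index.
    """
    results = list(small_index.get(query, []))
    if aims_at_large_index(query):
        # Predicted large-index query: consult both indexes, as the claim recites.
        results += large_index.get(query, [])
    return results


def toy_predictor(query):
    """Stand-in predictor (an assumption): long queries are treated as large-index queries."""
    return len(query.split()) > 3


# Toy indexes mapping a query string to result URLs, invented for illustration.
SMALL_INDEX = {"weather": ["https://example.com/weather"]}
LARGE_INDEX = {
    "town board minutes from 2003": ["https://example.com/archive/minutes-2003"],
    "weather": ["https://example.com/archive/old-weather"],
}
```

Under this routing, a query predicted to target the small index never touches the large index, which is where the expected saving in processing cost would come from.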
Specification