Representing queries and determining similarity based on an ARIMA model
Abstract
Representing queries and determining similarity of queries based on an autoregressive integrated moving average (“ARIMA”) model is provided. A query analysis system represents each query by its ARIMA coefficients. The query analysis system may estimate the frequency information for a desired past or future interval based on frequency information for some initial intervals. The query analysis system may also determine the similarity of a pair of queries based on the similarity of their ARIMA coefficients. The query analysis system may use various metrics, such as a correlation metric, to determine the similarity of the ARIMA coefficients.
13 Claims
1. A method in a computing device for determining similarity between queries, the method comprising:
- storing frequencies of the queries during intervals, each frequency of a query for an interval representing a number of times the query was submitted by users to a search engine;
- for each of the queries, generating autoregressive integrated moving average (“ARIMA”) coefficients for that query based on the stored frequencies for that query; and
- for a pair of queries, calculating a similarity score for the queries based on a correlation between the ARIMA coefficients of the queries, the calculating including aggregating products, for each ARIMA coefficients, of a first factor of a first query of the pair and a second factor of a second query of the pair, a factor for an ARIMA coefficient of a query being a difference between the ARIMA coefficient and a mean of the ARIMA coefficients divided by a standard deviation of the ARIMA coefficients.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
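The similarity calculation recited in claim 1 can be illustrated with a minimal sketch. It assumes that "aggregating products" means summing the products of the standardized factors and dividing by the number of coefficients, which yields the Pearson correlation when the population standard deviation is used. The claim itself does not fix that normalization, and the function names here are hypothetical.

```python
from statistics import fmean, pstdev


def standardized_factors(coeffs):
    """Return (c - mean) / std for each ARIMA coefficient of one query."""
    mu = fmean(coeffs)
    sigma = pstdev(coeffs)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("coefficients have zero variance; factors are undefined")
    return [(c - mu) / sigma for c in coeffs]


def similarity_score(coeffs_a, coeffs_b):
    """Aggregate the per-coefficient products of the two queries' factors."""
    if len(coeffs_a) != len(coeffs_b):
        raise ValueError("both queries need the same number of coefficients")
    factors_a = standardized_factors(coeffs_a)
    factors_b = standardized_factors(coeffs_b)
    # Dividing by the count is an assumption; it scales the sum into [-1, 1].
    return sum(a * b for a, b in zip(factors_a, factors_b)) / len(factors_a)
```

Under these assumptions, two queries whose coefficient vectors differ only by a positive scale factor score close to 1, and a vector paired with its negation scores close to -1.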
9. A computer-readable medium encoded with instructions for controlling a computing device to determine frequency of a query at an interval, by a method comprising:
- storing frequencies of the query during intervals;
- generating autoregressive integrated moving average (“ARIMA”) coefficients representing the query based on the stored frequencies;
- estimating the frequency of the query at the interval based on the ARIMA coefficients for the query; and
- determining similarity between a pair of queries based on the ARIMA coefficients for the queries, the determining including aggregating products, for each ARIMA coefficient, of a first factor of a first query of the pair and a second factor of a second query of the pair, a factor for an ARIMA coefficient of a query being a difference between the ARIMA coefficient and a mean of the ARIMA coefficients divided by a standard deviation of the ARIMA coefficients.
Dependent claims: 10, 11
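The estimating step of claim 9 can be sketched for one simple case. The example below assumes an ARIMA(p, 1, 0) model, that is, autoregressive coefficients applied to first differences of the frequencies with no moving-average terms, and it covers only future intervals. The claim does not specify the model order, and the function name and parameters are hypothetical.

```python
def forecast_frequency(history, ar_coeffs, steps, intercept=0.0):
    """Estimate the frequency `steps` intervals after the last stored interval.

    Assumes an ARIMA(p, 1, 0) model: `ar_coeffs` apply to first differences,
    with ar_coeffs[0] weighting the most recent difference.
    """
    p = len(ar_coeffs)
    if len(history) < p + 1:
        raise ValueError("need at least p + 1 stored frequencies")
    # First differences of the stored frequencies (the "integrated" part, d = 1).
    diffs = [later - earlier for earlier, later in zip(history, history[1:])]
    level = history[-1]
    for _ in range(steps):
        recent = diffs[-p:][::-1] if p else []  # most recent difference first
        next_diff = intercept + sum(phi * d for phi, d in zip(ar_coeffs, recent))
        diffs.append(next_diff)
        level += next_diff  # undo the differencing to get back to a frequency
    return level
```

For example, with stored frequencies `[10, 12, 15]` and a single coefficient `0.5`, the last difference is 3, so the next difference is estimated as 1.5 and the next frequency as 16.5.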
12. A computing device for representing a query comprising:
- a query frequency store having, for each of a plurality of intervals, a frequency of the query during the interval;
- a component that generates autoregressive integrated moving average (“ARIMA”) coefficients for the query based on the frequencies of the query during the intervals;
- an ARIMA coefficient store for storing the generated ARIMA coefficients representing the query; and
- a component that determines similarity between a pair of queries based on the ARIMA coefficients of the queries, the determining including aggregating products, for each ARIMA coefficient, of a first of a first query of the pair and a second factor of a second query of the pair, a factor for an ARIMA coefficient of a query being a difference between the ARIMA coefficient and a mean of the ARIMA coefficients divided by a standard deviation of the ARIMA coefficients.
Dependent claims: 13
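The frequency store, coefficient generator, and coefficient store of claim 12 might be arranged as in the following sketch. It assumes the simplest non-trivial model, ARIMA(1, 1, 0) without an intercept, fitted by least squares on first differences. The claim does not prescribe a fitting procedure, and the class and function names are hypothetical.

```python
def fit_arima_110(frequencies):
    """Fit one AR coefficient on first differences by least squares.

    Minimizes sum((d[t] - phi * d[t-1]) ** 2) over phi, which gives
    phi = sum(d[t] * d[t-1]) / sum(d[t-1] ** 2).
    """
    diffs = [later - earlier for earlier, later in zip(frequencies, frequencies[1:])]
    if len(diffs) < 2:
        raise ValueError("need frequencies for at least three intervals")
    numerator = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    denominator = sum(prev * prev for prev in diffs[:-1])
    if denominator == 0:
        raise ValueError("lagged differences are all zero; coefficient is undefined")
    return [numerator / denominator]


class QueryCoefficientStore:
    """Holds per-query interval frequencies and the coefficients fitted from them."""

    def __init__(self):
        self._frequencies = {}   # query -> list of per-interval frequencies
        self._coefficients = {}  # query -> list of fitted coefficients

    def record(self, query, frequencies):
        self._frequencies[query] = list(frequencies)

    def generate(self, query):
        coeffs = fit_arima_110(self._frequencies[query])
        self._coefficients[query] = coeffs
        return coeffs

    def coefficients(self, query):
        return self._coefficients[query]
```

Once the coefficients are stored, a query is represented by a short coefficient list rather than by its full frequency history.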
Specification