IDENTIFICATION OF SIMILAR QUERIES BASED ON OVERALL AND PARTIAL SIMILARITY OF TIME SERIES
Abstract
Techniques are provided for identifying similar queries based on the overall similarity and partial similarity of the time series of frequencies of the queries. To identify queries that are similar to a target query, a query analysis system generates, for each query, an overall similarity score for that query and the target query based on the time series of the query and of the target query. The query analysis system also generates, for each query, partial similarity scores for the query and the target query based on various time sub-series of the overall time series of the queries. The query analysis system then identifies queries as being similar to the target query based on the overall similarity scores and the partial similarity scores of the queries.
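The distinction between overall and partial similarity can be illustrated with a minimal sketch. The abstract does not name a similarity measure, so this sketch assumes Pearson correlation as the score and assumes consecutive, non-overlapping, fixed-length windows as the time sub-series. The function names `pearson`, `overall_similarity`, and `partial_similarities` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def overall_similarity(series, target):
    """Score the whole time series of a query against the target query (assumed measure)."""
    return pearson(series, target)


def partial_similarities(series, target, window):
    """Score each consecutive, non-overlapping sub-series of length `window`."""
    scores = {}
    for start in range(0, len(target) - window + 1, window):
        end = start + window
        scores[(start, end)] = pearson(series[start:end], target[start:end])
    return scores


# Hypothetical weekly frequencies: the two queries track each other only in the first four periods.
target = [1, 2, 3, 4, 5, 5, 6, 5]
query = [2, 4, 6, 8, 9, 1, 2, 7]
overall = overall_similarity(query, target)
partial = partial_similarities(query, target, window=4)
```

In this made-up data the overall score is low while the score for the first window is close to 1, which is the kind of query that a partial similarity score would surface and an overall score alone would miss.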
20 Claims
I/We claim:
1. A method in a computing device for identifying queries that are similar to a target query, the method comprising:
storing frequencies of the queries representing a time series for each query;
for each of a plurality of queries, calculating an overall similarity score between the query and the target query based on the frequencies of the time series;
for each of a plurality of time sub-series, calculating a partial similarity score between the query and the target query for the time sub-series; and
identifying queries as being similar to the target query based on the overall similarity scores and partial similarity scores of the queries.
10. A computer-readable medium encoded with instructions for controlling a computing device to identify queries that are similar to a target query, by a method comprising:
for each of a plurality of queries, calculating an overall similarity score between the query and the target query based on analysis of time series of frequencies for the queries; and
for each of a plurality of time sub-series of the time series, calculating a partial similarity score between the query and the target query based on analysis of frequencies for the time sub-series;
selecting queries with the highest overall similarity scores;
for each of the time sub-series, selecting queries with the highest partial similarity scores for that time sub-series; and
identifying the selected queries as being similar to the target query based on the overall similarity score and partial similarity scores of the selected queries.
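The selection steps recited in claim 10 can be sketched as follows. This is an illustration only: the claim does not say how many queries are selected or how the overall and partial scores are combined, so the cutoff `k` and the "take the best score" combination rule are assumptions, and the function name `select_similar` is hypothetical.

```python
import heapq


def select_similar(overall, partial, k):
    """Select top-k queries overall and per sub-series, then rank the selection.

    overall: {query: overall similarity score}
    partial: {sub_series_id: {query: partial similarity score}}
    """
    # Queries with the highest overall similarity scores.
    selected = set(heapq.nlargest(k, overall, key=overall.get))
    # For each time sub-series, queries with the highest partial scores.
    for scores in partial.values():
        selected.update(heapq.nlargest(k, scores, key=scores.get))
    # Assumed combination rule: a query's final score is its best score of any kind.
    combined = {
        q: max([overall.get(q, 0.0)] + [s.get(q, 0.0) for s in partial.values()])
        for q in selected
    }
    # Highest combined score first; ties broken by query text for determinism.
    return sorted(combined, key=lambda q: (-combined[q], q))


# Hypothetical scores for three candidate queries and two sub-series.
overall_scores = {"a": 0.9, "b": 0.2, "c": 0.1}
partial_scores = {
    "w1": {"a": 0.1, "b": 0.95, "c": 0.0},
    "w2": {"a": 0.3, "b": 0.1, "c": 0.4},
}
ranked = select_similar(overall_scores, partial_scores, k=1)
```

Here query "b" is selected only because of its score in sub-series "w1", even though its overall score is low. Other combination rules, such as a weighted sum, would fit the claim language equally well.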
18. A computing device for identifying queries that are similar to a target query, comprising:
a query log store having, for each query, a time series of frequencies for the query;
a preprocess query store having, for each query, a representation of the time series of the query with a reduced dimensionality and an indication of time sub-series of the query that have frequency peaks;
a preprocess queries component that generates the representations with reduced dimensionality and identifies the time sub-series with frequency peaks; and
an identify similar queries component that identifies queries similar to the target query based on overall similarity of the queries to the target query and partial similarity of the queries to the target query, the partial similarity being based on similarity during time sub-series.
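The preprocessing recited in claim 18 can be sketched under stated assumptions. The claim does not specify a dimensionality reduction technique or a peak criterion, so this sketch substitutes piecewise aggregate approximation (the mean of each equal-length segment) for the reduction and flags a window as having a frequency peak when its maximum exceeds a multiple of the series mean. The names `reduce_dimensionality`, `peak_sub_series`, and the `factor` parameter are hypothetical.

```python
import statistics


def reduce_dimensionality(series, segments):
    """Piecewise aggregate approximation: the mean of each of `segments` equal chunks.

    Assumes len(series) is divisible by `segments`.
    """
    size = len(series) // segments
    return [statistics.fmean(series[i * size:(i + 1) * size]) for i in range(segments)]


def peak_sub_series(series, window, factor=2.0):
    """Return (start, end) windows whose maximum exceeds `factor` times the series mean."""
    mean = statistics.fmean(series)
    peaks = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        if max(chunk) > factor * mean:
            peaks.append((start, start + window))
    return peaks


# Hypothetical frequencies with a single spike in the second half.
frequencies = [1, 1, 1, 1, 1, 9, 1, 1]
reduced = reduce_dimensionality(frequencies, segments=4)
peaks = peak_sub_series(frequencies, window=4)
```

Storing the reduced representation and the peak windows ahead of time would let the similarity component compare shorter vectors and restrict partial comparisons to windows where something happened, which is one plausible reading of why the claim separates preprocessing from identification.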
Specification