IDENTIFICATION OF SIMILAR QUERIES BASED ON OVERALL AND PARTIAL SIMILARITY OF TIME SERIES

US 20090006365A1
Filed: 06/28/2007
Published: 01/01/2009
Est. Priority Date: 06/28/2007
Status: Active Grant


2 Assignments


0 Petitions

Abstract

Techniques for identifying similar queries based on their overall similarity and partial similarity of time series of frequencies of the queries are provided. To identify queries that are similar to a target query, the query analysis system generates, for each query, an overall similarity score for that query and the target query based on the time series of the query and the target query. The query analysis system also generates, for each query, partial similarity scores for the query and the target query based on various time sub-series of the overall time series of the queries. The query analysis system then identifies queries as being similar to the target query based on the overall similarity scores and the partial similarity scores of the queries.
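As an illustration of the overall and partial scores described in the abstract, the following is a minimal Python sketch. It assumes cosine similarity (which claims 6 and 15 name as one possible basis), equal-length series, and fixed non-overlapping time sub-series. The function names, the `window` parameter, and the sample frequencies are hypothetical and do not come from the patent.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length frequency vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_scores(target, query, window):
    """Return (overall, partials): one overall score and one score per sub-series."""
    overall = cosine_similarity(target, query)
    partials = [
        cosine_similarity(target[i:i + window], query[i:i + window])
        for i in range(0, len(target), window)
    ]
    return overall, partials

# Hypothetical daily frequencies over 8 days, split into two 4-day sub-series.
target = [1, 1, 9, 8, 1, 1, 1, 1]
candidate = [5, 5, 9, 8, 5, 5, 0, 0]
overall, partials = similarity_scores(target, candidate, window=4)
```

Here the first sub-series, where both queries spike, scores higher than the whole series does, which is the kind of localized similarity the partial scores are meant to surface.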

104 Citations

20 Claims

I/We claim:

1. A method in a computing device for identifying queries that are similar to a target query, the method comprising:
- storing frequencies of the queries representing a time series for each query;
  
  for each of a plurality of queries, calculating an overall similarity score between the query and the target query based on the frequencies of the time series;
  
  for each of a plurality of time sub-series, calculating a partial similarity score between the query and the target query for the time sub-series; and
  
  identifying queries as being similar to the target query based on the overall similarity scores and partial similarity scores of the queries.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1 wherein identifying includes selecting queries with the highest overall similarity scores or the highest partial similarity scores as being similar.
  - 3. The method of claim 2 including normalizing the overall similarity scores and the partial similarity scores.
  - 4. The method of claim 1 wherein a partial similarity score for a query and the target query are calculated only for time sub-series in which both the query and the target query have a frequency peak.
  - 5. The method of claim 1 wherein the partial similarity score for a time sub-series in which either the query or the target query or both do not have a frequency peak is set to a minimum frequency score.
  - 6. The method of claim 1 wherein the similarity scores are based on a cosine similarity of the frequencies.
  - 7. The method of claim 1 including generating a representation with a reduced dimensionality of the frequencies of the queries and wherein the similarity scores are calculated from the generated representations.
  - 8. The method of claim 7 wherein the generating of a representation includes applying a Haar Wavelet Transform.
  - 9. The method of claim 1 wherein the overall similarity score and the partial similarity scores for a query are combined to generate a combined similarity score for the query.
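The reduced-dimensionality representation of claims 7 and 8 can be sketched as follows. This is one plausible reading, not the patent's implementation: it assumes an orthonormal Haar Wavelet Transform, a series whose length is a power of two, and reduction by keeping only the first (coarsest) coefficients. The names `haar_transform`, `reduce_dimensionality`, and `keep` are hypothetical.

```python
import math

def haar_transform(series):
    """Orthonormal Haar wavelet transform; coefficients are ordered coarsest first."""
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    coeffs = [float(x) for x in series]
    length = n
    while length > 1:
        half = length // 2
        # Scaled pairwise sums (approximation) and differences (detail).
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:length] = averages + details
        length = half
    return coeffs

def reduce_dimensionality(series, keep):
    """Assumed reduction: keep only the first `keep` (coarsest) Haar coefficients."""
    return haar_transform(series)[:keep]

# Hypothetical 8-point frequency series reduced to 4 coefficients.
reduced = reduce_dimensionality([1, 2, 3, 4, 5, 6, 7, 8], keep=4)
```

Because this variant of the transform is orthonormal, it preserves dot products and vector norms, so a cosine similarity computed on the full coefficient vectors equals the one computed on the raw frequencies. Truncating the coefficients makes it an approximation.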

10. A computer-readable medium encoded with instructions for controlling a computing device to identify queries that are similar to a target query, by a method comprising:
- for each of a plurality of queries, calculating an overall similarity score between the query and the target query based on analysis of time series of frequencies for the queries; and
  
  for each of a plurality of time sub-series of the time series, calculating a partial similarity score between the query and the target query based on analysis of frequencies for the time sub-series;
  
  selecting queries with the highest overall similarity scores;
  
  for each of the time sub-series, selecting queries with the highest partial similarity scores for that time sub-series; and
  
  identifying the selected queries as being similar to the target query based on the overall similarity score and partial similarity scores of the selected queries.
- Dependent claims (11, 12, 13, 14, 15, 16, 17)
  - 11. The computer-readable medium of claim 10 including normalizing the overall similarity scores and the partial similarity scores.
  - 12. The computer-readable medium of claim 10 wherein a partial similarity score for a query and the target query are calculated only for time sub-series in which both the query and the target query have a frequency peak.
  - 13. The computer-readable medium of claim 12 wherein a frequency peak occurs when a frequency during the time sub-series is larger than a mean frequency by one or more standard deviations.
  - 14. The computer-readable medium of claim 10 wherein the partial similarity score for a time sub-series in which either the query or the target query or both do not have a frequency peak is set to a minimum frequency score.
  - 15. The computer-readable medium of claim 10 wherein the similarity scores are based on a cosine similarity of the frequencies.
  - 16. The computer-readable medium of claim 10 including generating a representation with a reduced dimensionality of the frequencies of the queries and wherein the similarity scores are calculated from the generated representations.
  - 17. The computer-readable medium of claim 10 including selecting keywords for advertisement placement based on the identified similar queries.
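The peak test of claim 13 and the peak-gated partial scores of claims 12 and 14 might look like the sketch below. Several details are assumptions, because the claims leave them open: the mean and standard deviation are taken over the whole series using the population standard deviation, sub-series are fixed non-overlapping windows, and the minimum score is 0.0. The names `peak_windows`, `gated_partial_scores`, `num_std`, and `min_score` are hypothetical, and the similarity function is passed in by the caller.

```python
import statistics

def peak_windows(series, window, num_std=1.0):
    """Indices of windows holding a frequency above mean + num_std * stdev of the series."""
    mean = statistics.mean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return set()  # a flat series has no peaks
    threshold = mean + num_std * std
    return {
        start // window
        for start in range(0, len(series), window)
        if max(series[start:start + window]) > threshold
    }

def gated_partial_scores(target, query, window, similarity, min_score=0.0):
    """Score a window only if both series peak in it; otherwise use min_score."""
    both = peak_windows(target, window) & peak_windows(query, window)
    scores = []
    for index, start in enumerate(range(0, len(target), window)):
        if index in both:
            scores.append(similarity(target[start:start + window],
                                     query[start:start + window]))
        else:
            scores.append(min_score)
    return scores

# Hypothetical series: both spike in the second 4-day window.
target = [1, 1, 1, 1, 10, 1, 1, 1]
query = [1, 1, 1, 1, 1, 12, 1, 1]
scores = gated_partial_scores(target, query, 4, similarity=lambda a, b: 1.0)
```

The placeholder `similarity` lambda only marks which windows get scored. In practice a real measure such as cosine similarity would be supplied.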

18. A computing device for identifying queries that are similar to a target query, comprising:
- a query log store having, for each query, a time series of frequencies for the query;
  
  a preprocess query store having, for each query, a representation of the time series of the query with a reduced dimensionality and an indication of time sub-series of the query that have frequency peaks;
  
  a preprocess queries component that generates the representations with reduced dimensionality and identifies the time sub-series with frequency peaks; and
  
  an identify similar queries component that identifies queries similar to the target query based on overall similarity of the queries to the target query and partial similarity of the queries to the target query, the partial similarity being based on similarity during time sub-series.
- Dependent claims (19, 20)
  - 19. The computing device of claim 18 wherein the identify similar queries component selects as similar queries those queries with the highest overall or partial similarity scores.
  - 20. The computing device of claim 18 wherein the identify similar queries component calculates partial similarity scores only for time sub-series in which the query and the target query both have frequency peaks.
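The selection step of claims 2, 10, and 19 (taking the queries with the highest overall scores together with those with the highest partial scores per sub-series) could be sketched as below. The claims do not say how many queries are kept or how scores are combined, so the cutoff `k` and the use of the maximum as a combined score are assumptions, and `select_similar` and the sample data are hypothetical.

```python
import heapq

def select_similar(scores, k):
    """scores maps query -> (overall, [partial per sub-series]).

    Returns queries that are in the top k overall or in the top k for any
    sub-series, ordered by an assumed combined score (the maximum of all scores).
    """
    selected = set(heapq.nlargest(k, scores, key=lambda q: scores[q][0]))
    num_windows = len(next(iter(scores.values()))[1]) if scores else 0
    for w in range(num_windows):
        selected.update(heapq.nlargest(k, scores, key=lambda q: scores[q][1][w]))
    combined = {q: max([scores[q][0]] + list(scores[q][1])) for q in selected}
    return sorted(selected, key=lambda q: combined[q], reverse=True)

# Hypothetical precomputed scores for four candidate queries.
scores = {
    "a": (0.9, [0.1, 0.2]),
    "b": (0.3, [0.95, 0.1]),
    "c": (0.2, [0.1, 0.1]),
    "d": (0.5, [0.2, 0.8]),
}
similar = select_similar(scores, k=1)
```

With `k=1`, query "a" is selected for its overall score while "b" and "d" are selected only because of a single sub-series, and "c" is dropped.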

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Liu, Ning; Zhang, Benyu; Wang, Jian; Yan, Jun; Chen, Zheng

Granted Patent

US 8,290,921 B2
US Class Current

1/1
CPC Class Codes

G06F 16/3322 using system suggestions G0...

G06F 16/951 Indexing; Web crawling tech...
