Method and system for collecting related queries
Abstract
A method for collecting related queries, comprising the steps of obtaining a first query and a second query that have been submitted during a search for data, and determining whether the first query and the second query have a likelihood of being submitted by a class of searcher.
35 Claims
1. A method for collecting related queries, comprising the steps of:
(a) obtaining a first query and a second query that have been submitted during a search for data, wherein said first query and said second query are regarded as a query pair;
(b) determining a number of occurrences of said query pair that are submitted in a plurality of searches; and
(c) using said number of occurrences to determine whether said first query and said second query have a likelihood of being submitted by a class of searcher.
(1) discarding said first and second queries if a rate of query submission from said identifiable source exceeds a predetermined rate, and (2) discarding said first and second queries if said identifiable source comprises a predetermined undesired source.
11. The method of claim 8, wherein said identifiable source is identified by at least one of:
(1) an Internet Protocol (IP) address,
(2) data from a hypertext transfer protocol (HTTP) management mechanism,
(3) a cookie, and
(4) an identifier assigned by a search engine.
12. The method of claim 1, wherein said obtaining step comprises:
obtaining records, wherein each said record comprises a query and an identifier indicating a source from which said query was submitted; and
sorting said records using said identifier as a key, wherein said first and second queries are obtained from records from a selected source.
13. The method of claim 12, wherein each said record further comprises a time at which said query was submitted, wherein said sorting step further comprises using said time as a secondary key, wherein said method further comprises determining a set of queries from said selected source, and wherein a difference in time between successive queries in said set is less than a predetermined value.
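The sorting and grouping recited in claims 12 and 13 can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the record layout `(source_id, timestamp_seconds, query)`, the function name `group_into_sets`, and the 300-second value for the "predetermined value" are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Assumed stand-in for the claim's "predetermined value" (seconds).
MAX_GAP_SECONDS = 300


def group_into_sets(records, max_gap=MAX_GAP_SECONDS):
    """Sort records by source (primary key) and time (secondary key), then
    split each source's queries into sets in which successive queries are
    less than max_gap seconds apart.

    Each record is a hypothetical tuple: (source_id, timestamp_seconds, query).
    Returns a list of (source_id, [query, ...]) tuples.
    """
    ordered = sorted(records, key=itemgetter(0, 1))
    sets = []
    for source, group in groupby(ordered, key=itemgetter(0)):
        current = []
        last_time = None
        for _, timestamp, query in group:
            # Start a new set when the gap reaches the limit.
            if last_time is not None and timestamp - last_time >= max_gap:
                sets.append((source, current))
                current = []
            current.append(query)
            last_time = timestamp
        if current:
            sets.append((source, current))
    return sets


records = [
    ("10.0.0.2", 100, "cheap flights"),
    ("10.0.0.1", 50, "jaguar"),
    ("10.0.0.1", 90, "jaguar car"),
    ("10.0.0.1", 1000, "weather"),
]
query_sets = group_into_sets(records)
```

In this sample, the two "jaguar" queries from source 10.0.0.1 fall in one set because they are 40 seconds apart, while "weather" starts a new set for the same source.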
14. The method of claim 1, wherein said first query and said second query are obtained from a search engine access log.
15. The method of claim 1, wherein said first query and said second query are regarded as a query pair, and wherein said determining step comprises:
determining a number of occurrences of said query pair that are submitted in a plurality of searches; and
using said number of occurrences to determine whether said first query and said second query have said likelihood of being submitted by said class of searcher.
16. The method of claim 16 is replaced below.
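A minimal sketch of the counting and threshold test in claims 15 and 16 follows. The claims do not say how a query pair is formed within one search, so this sketch assumes an unordered pair of distinct queries counted at most once per search; the names `count_query_pairs`, `related_pairs`, and the threshold value are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumed stand-in for the claim's "predetermined number".
MIN_OCCURRENCES = 2


def count_query_pairs(query_sets):
    """Count, over many searches, how often each unordered pair of distinct
    queries appears together. Each element of query_sets is the list of
    queries from one search."""
    counts = Counter()
    for queries in query_sets:
        unique = sorted(set(queries))  # one count per pair per search
        counts.update(combinations(unique, 2))
    return counts


def related_pairs(counts, min_occurrences=MIN_OCCURRENCES):
    """Keep pairs whose count is greater than the threshold (claim 16)."""
    return {pair for pair, n in counts.items() if n > min_occurrences}


searches = [
    ["jaguar", "jaguar car"],
    ["jaguar", "jaguar car", "weather"],
    ["jaguar car", "jaguar"],
    ["weather", "jaguar"],
]
pair_counts = count_query_pairs(searches)
likely_related = related_pairs(pair_counts)
```

Here the pair ("jaguar", "jaguar car") occurs in three searches and is the only pair whose count is greater than the assumed threshold of 2.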
17. The method of claim 1, wherein said using step comprises evaluating a ratio between said number of occurrences and an expected number of occurrences.
18. The method of claim 17, wherein said evaluating step applies a technique selected from the group consisting of a mutual information analysis and a chi-squared test.
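One way to read the ratio in claims 17 and 18 is sketched below. It assumes the expected number of occurrences is the count predicted if the two queries were submitted independently, uses pointwise mutual information as one possible "mutual information analysis", and uses the standard 2x2 chi-squared statistic. The claims do not fix these formulas, and all function and parameter names here are hypothetical.

```python
import math


def expected_pair_count(count_a, count_b, total_searches):
    """Expected co-occurrences if queries a and b were independent."""
    return count_a * count_b / total_searches


def pointwise_mutual_information(pair_count, count_a, count_b, total_searches):
    """log2 of the ratio between observed and expected pair counts.
    Requires pair_count > 0."""
    expected = expected_pair_count(count_a, count_b, total_searches)
    return math.log2(pair_count / expected)


def chi_squared_2x2(pair_count, count_a, count_b, total_searches):
    """Chi-squared statistic for the 2x2 table of searches containing
    a and/or b. Assumes no row or column of the table is empty."""
    o11 = pair_count                                       # a and b
    o12 = count_a - pair_count                             # a without b
    o21 = count_b - pair_count                             # b without a
    o22 = total_searches - count_a - count_b + pair_count  # neither
    numerator = total_searches * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    return numerator / denominator


# Illustrative numbers only: 10,000 searches, a in 100, b in 60, both in 30.
pmi = pointwise_mutual_information(30, 100, 60, 10_000)
chi2 = chi_squared_2x2(30, 100, 60, 10_000)
```

With these illustrative numbers the expected count is 0.6, so the observed count of 30 is 50 times the expected count. A pair that occurs exactly as often as independence predicts gives a mutual information of 0 and a chi-squared statistic of 0.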
19. The method of claim 1, wherein said first query and said second query are regarded as a first unique query pair, wherein said first query is the first query in N unique query pairs that have been submitted in a plurality of searches, and wherein said step (c) comprises:
(c1) determining a number of occurrences of each of said N unique query pairs submitted in said plurality of searches; and
(c2) using a number of occurrences of said first unique query pair and said number of occurrences of each of said N unique query pairs to determine whether said first query and said second query have said likelihood of being submitted by said class of searcher.
20. The method of claim 19, wherein each of said N unique query pairs has a mutual information value, wherein each of said N unique query pairs has a rank ordered according to said mutual information value, wherein a greatest rank corresponds to a greatest mutual information value, and wherein said step (c2) comprises finding that said first query and said second query have said likelihood of being submitted by said class of searcher if said rank of said first unique query pair is greater than a predetermined rank.
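The rank test of claim 20 can be illustrated with the sketch below. It follows the claim's ordering, in which the greatest rank corresponds to the greatest mutual information value, so rank 1 is the lowest value and rank N the highest. The function names, the sample values, and the tie-breaking behavior (ties keep their input order) are assumptions.

```python
def rank_by_mutual_information(mi_by_pair):
    """Assign ranks 1..N to the N unique pairs that share a first query,
    where rank N goes to the pair with the greatest mutual information."""
    ordered = sorted(mi_by_pair, key=mi_by_pair.get)  # ascending by MI value
    return {pair: rank for rank, pair in enumerate(ordered, start=1)}


def pairs_above_rank(mi_by_pair, predetermined_rank):
    """Keep pairs whose rank is greater than the predetermined rank."""
    ranks = rank_by_mutual_information(mi_by_pair)
    return {pair for pair, rank in ranks.items() if rank > predetermined_rank}


# Hypothetical mutual information values for pairs whose first query is "jaguar".
mi_values = {
    ("jaguar", "jaguar car"): 5.6,
    ("jaguar", "weather"): 0.2,
    ("jaguar", "jaguar xk8"): 3.1,
}
ranks = rank_by_mutual_information(mi_values)
kept = pairs_above_rank(mi_values, predetermined_rank=1)
```

With a predetermined rank of 1, only the lowest-ranked pair, ("jaguar", "weather"), is dropped.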
21. The method of claim 1, wherein said method further comprises processing a query in accordance with at least one step selected from the group consisting of:
(1) converting a non-printable character to a space in said query,
(2) replacing consecutive spaces with a single space in said query,
(3) removing a quotation mark from said query,
(4) converting an alpha character to its uppercase representation in said query,
(5) discarding said query in a case where said query comprises a number of characters that is not within a predetermined range of numbers,
(6) discarding said query in a case where said query comprises a uniform resource locator, and
(7) discarding said query in a case where said query includes a term from the group consisting of a pornographic term, a violent term, a hateful term, an ethnically derogatory term, or a predetermined objectionable term.
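The processing steps of claim 21 might be combined as in the sketch below. The claim requires only at least one of the steps and does not fix their order, so the order here is a choice made for illustration. The length range, the URL pattern, the whitespace-based term matching, the trimming of leading and trailing spaces, and the empty default term list are all assumptions.

```python
import re

# Assumed stand-ins for the claim's "predetermined range of numbers".
MIN_LENGTH, MAX_LENGTH = 2, 100

# Simplified, hypothetical URL detector; real logs may need a stricter test.
URL_PATTERN = re.compile(r"(?i)\b(?:https?://|www\.)\S+")


def normalize_query(query, blocked_terms=frozenset()):
    """Return the cleaned query, or None if the query should be discarded.

    blocked_terms is a hypothetical set of uppercase objectionable terms.
    """
    # (1) Convert each non-printable character to a space.
    cleaned = "".join(ch if ch.isprintable() else " " for ch in query)
    # (3) Remove quotation marks.
    cleaned = cleaned.replace('"', "")
    # (2) Replace consecutive spaces with a single space.
    #     Trimming the ends is an extra assumption, not part of the claim.
    cleaned = re.sub(r" {2,}", " ", cleaned).strip()
    # (4) Convert alpha characters to uppercase.
    cleaned = cleaned.upper()
    # (5) Discard queries whose length is outside the allowed range.
    if not MIN_LENGTH <= len(cleaned) <= MAX_LENGTH:
        return None
    # (6) Discard queries that contain a URL.
    if URL_PATTERN.search(cleaned):
        return None
    # (7) Discard queries that include a blocked term.
    if any(term in blocked_terms for term in cleaned.split()):
        return None
    return cleaned
```

For example, `normalize_query('"jaguar\t\tcar"')` returns `'JAGUAR CAR'`, while a query such as `'www.example.com'` is discarded and yields `None`.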
22. The method of claim 1, wherein said first query is not identical to said second query.
23. The method of claim 1, wherein said first query is not a plural of said second query.
24. The method of claim 1, further comprising:
receiving a communication indicating that a searcher has submitted one of said first query and said second query; and
sending the other of said first query and said second query to said searcher.
25. The method of claim 1, further comprising using either said first query and/or said second query to aid in a selection of an advertisement.
26. The method of claim 1, further comprising using said first query to enhance a search related to said second query, and/or vice versa.
27. The method of claim 1, further comprising:
presenting said second query to a searcher that has submitted said first query;
determining whether said searcher thereafter submits said second query; and
determining whether said searcher thereafter utilizes information presented to said searcher if said searcher submits said second query.
28. The method of claim 27, wherein said information comprises at least one of a search result or an advertisement.
29. The method of claim 27, wherein said searcher is one of a plurality of searchers that have submitted said first query and thereafter submitted said second query, and wherein said method further comprises:
determining a usage level, by said plurality of searchers, of information that is thereafter presented to said plurality of searchers.
30. The method of claim 29, wherein said step of determining said usage level comprises determining a ratio between a number of times said plurality of searchers further pursue said information and a number of times said second query is presented to said plurality of searchers.
31. The method of claim 30, wherein said number of times said plurality of searchers further pursue said information is found from a number of times said plurality of searchers select a link to additional information.
32. The method of claim 29, wherein said second query is a candidate for an alternative query to said first query, and wherein said method further comprises determining whether to retain said second query as said candidate based on said usage level.
33. The method of claim 32, further comprising eliminating said second query as said candidate if said usage level is less than a predetermined level.
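The usage-level ratio of claim 30 and the elimination rule of claim 33 could look like the sketch below. The data layout (a mapping from each candidate query to a pair of counts), the function names, the 0.05 threshold, and the choice to treat a never-presented candidate as having a usage level of 0 are assumptions made for illustration.

```python
# Assumed stand-in for the claim's "predetermined level".
MIN_USAGE_LEVEL = 0.05


def usage_level(times_pursued, times_presented):
    """Ratio between the number of times searchers further pursued the
    presented information (for example, selected a link) and the number
    of times the second query was presented."""
    if times_presented == 0:
        return 0.0  # assumption: never presented counts as unused
    return times_pursued / times_presented


def retain_candidates(stats, min_level=MIN_USAGE_LEVEL):
    """Eliminate candidates whose usage level is less than min_level.

    stats maps a candidate query to (times_pursued, times_presented).
    Returns the retained candidates with their usage levels.
    """
    retained = {}
    for candidate, (pursued, presented) in stats.items():
        level = usage_level(pursued, presented)
        if level >= min_level:
            retained[candidate] = level
    return retained


# Hypothetical counts for two candidate alternatives to one first query.
candidate_stats = {
    "jaguar car": (12, 100),
    "jaguar weather": (1, 100),
}
retained = retain_candidates(candidate_stats)
```

In this sample, "jaguar car" has a usage level of 0.12 and is retained, while "jaguar weather" has a usage level of 0.01 and is eliminated.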
34. A system for collecting related queries, comprising:
means for obtaining a first query and a second query that have been submitted during a search for data, wherein said first query and said second query are regarded as a query pair;
means for determining a number of occurrences of said query pair that are submitted in a plurality of searches; and
means for using said number of occurrences to determine whether said first query and said second query have a likelihood of being submitted by a class of searcher.
35. A storage media including instructions for controlling a processor that, in turn, collects related queries, said storage media comprising:
means for controlling said processor to obtain a first query and a second query that have been submitted during a search for data, wherein said first query and said second query are regarded as a query pair;
means for controlling said processor to determine a number of occurrences of said query pair that are submitted in a plurality of searches; and
means for controlling said processor to use said number of occurrences to determine whether said first query and said second query have a likelihood of being submitted by a class of searcher.
Specification