Detecting query-specific duplicate documents
2 Assignments
0 Petitions
Abstract
An improved duplicate detection technique that uses query-relevant information to limit the portion(s) of documents to be compared for similarity is described. Before comparing two documents for similarity, the content of these documents may be condensed based on the query. In one embodiment, query-relevant information or text (also referred to as “snippets”) is extracted from the documents and only the extracted snippets, rather than the entire documents, are compared for purposes of determining similarity.
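The abstract describes condensing each document to query-relevant snippets and comparing only those. The following is a minimal Python sketch of that idea. It assumes that a snippet is a fixed window of tokens around each keyword occurrence and that similarity is the Jaccard overlap of word 3-grams (shingles). The window size, shingle length, 0.8 threshold, and all function names are illustrative assumptions and are not taken from the patent.

```python
import re

# Illustrative sketch only: window size, shingle length and threshold are
# assumed values, not parameters stated in the patent.


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def extract_snippets(text: str, keywords: list[str], window: int = 5) -> list[list[str]]:
    """Return a token window of +/- `window` tokens around each keyword hit."""
    tokens = tokenize(text)
    terms = {k.lower() for k in keywords}
    return [
        tokens[max(0, i - window): i + window + 1]
        for i, tok in enumerate(tokens)
        if tok in terms
    ]


def shingles(snippets: list[list[str]], k: int = 3) -> set[tuple[str, ...]]:
    """Collect the word k-grams that occur inside any snippet."""
    return {
        tuple(snip[i:i + k])
        for snip in snippets
        for i in range(len(snip) - k + 1)
    }


def snippet_similarity(doc_a: str, doc_b: str, keywords: list[str],
                       window: int = 5, k: int = 3) -> float:
    """Jaccard similarity computed over the query-relevant snippets only."""
    a = shingles(extract_snippets(doc_a, keywords, window), k)
    b = shingles(extract_snippets(doc_b, keywords, window), k)
    if not a or not b:
        # A document with no query-relevant text is never treated as a duplicate.
        return 0.0
    return len(a & b) / len(a | b)


def is_query_specific_duplicate(doc_a: str, doc_b: str, keywords: list[str],
                                threshold: float = 0.8, window: int = 5) -> bool:
    """Hypothetical duplicate test: snippet similarity at or above `threshold`."""
    return snippet_similarity(doc_a, doc_b, keywords, window) >= threshold


doc_a = "Sports scores. Many wildlife experts agree that the jaguar is a large cat found mostly in the Americas. Buy tickets now."
doc_b = "Car reviews. Many wildlife experts agree that the jaguar is a large cat found mostly in the Americas. Read more below."
print(is_query_specific_duplicate(doc_a, doc_b, ["jaguar"]))   # True
print(is_query_specific_duplicate(doc_a, doc_b, ["tickets"]))  # False
```

The two pages differ as whole documents, but their text around "jaguar" is identical, so they are treated as duplicates for that query and not for "tickets". This is the sense in which the duplicate relation is query-specific.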
20 Citations
41 Claims
1. A computer-implemented method, comprising:
receiving a plurality of search results responsive to a query, wherein the query includes one or more search keywords, and wherein the plurality of search results have an associated order, where the particular order is determined using a ranking criteria;
processing each search result in the plurality of search results according to the order for the plurality of search results to generate a final group of search results, the final group of search results including a plurality of final search results from the plurality of search results, the processing including,
adding a first search result in the plurality of search results to the final group of search results, wherein the first search result is first in the order for the plurality of search results, and
for each other search result of the plurality of search results;
determining whether a first document corresponding to the search result is a query-specific duplicate of a second document corresponding to any of the search results in the final group of search results, and
if the first document corresponding to the search result is not a query-specific duplicate of the second document corresponding to any of the remaining search results in the final group of search results, adding the search result to the final set of search results before processing any other search result following the search result in the order, and otherwise not adding the search result to the final set of search results; and
providing the final group of search results.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 39.
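The processing step of claim 1 amounts to a single greedy pass over the ranked results. The following is a minimal Python sketch of that pass. It assumes the query-specific duplicate test is supplied as a callable. The function name `filter_query_specific_duplicates` and the snippet-equality predicate in the usage are hypothetical stand-ins, not the patent's implementation.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def filter_query_specific_duplicates(
    ranked_results: Sequence[T],
    is_duplicate: Callable[[T, T], bool],
) -> list[T]:
    """Greedy, rank-ordered filtering in the manner of claim 1.

    Results are visited in their ranked order. A result joins the final group
    only if it is not a duplicate of any result already in the final group;
    skipped results are never used for later comparisons.
    """
    final: list[T] = []
    for result in ranked_results:
        # For the first (top-ranked) result `final` is empty, so any() is False
        # and that result is always kept.
        if not any(is_duplicate(result, kept) for kept in final):
            final.append(result)
    return final


# Hypothetical usage: results are (url, query-relevant snippet) pairs, and two
# results count as duplicates when their snippets are identical.
results = [
    ("a.example", "jaguar cat habitat"),
    ("b.example", "jaguar car review"),
    ("c.example", "jaguar cat habitat"),
    ("d.example", "jaguar car review"),
]
final = filter_query_specific_duplicates(results, lambda x, y: x[1] == y[1])
print([url for url, _ in final])  # ['a.example', 'b.example']
```

Each candidate is compared only against the final group, not against every earlier result. As a consequence, when the duplicate test is not transitive, a result that resembles only a discarded result is still kept. The higher-ranked member of each duplicate cluster is always the one that survives.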
11. A system, comprising:
one or more computers configured to perform operations comprising:
receiving a plurality of search results responsive to a query, wherein the query includes one or more search keywords, and wherein the plurality of search results have an associated order, where the particular order is determined using a ranking criteria;
processing each search result in the plurality of search results according to the order for the plurality of search results to generate a final group of search results, the final group of search results including a plurality of final search results from the plurality of search results, the processing including,
adding a first search result in the plurality of search results to the final group of search results, wherein the first search result is first in the order for the plurality of search results, and
for each other search result of the plurality of search results;
determining whether a first document corresponding to the search result is a query-specific duplicate of a second document corresponding to any of the search results in the final group of search results, and
if the first document corresponding to the search result is not a query-specific duplicate of the second document corresponding to any of the remaining search results in the final group of search results, adding the search result to the final set of search results before processing any other search result following the search result in the order, and otherwise not adding the search result to the final set of search results; and
providing the final group of search results.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 40.
21. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations comprising:
receiving a plurality of search results responsive to a query, wherein the query includes one or more search keywords, and wherein the plurality of search results have an associated order, where the particular order is determined using a ranking criteria;
processing each search result in the plurality of search results according to the order for the plurality of search results to generate a final group of search results, the final group of search results including a plurality of final search results from the plurality of search results, the processing including,
adding a first search result in the plurality of search results to the final group of search results, wherein the first search result is first in the order for the plurality of search results, and
for each other search result of the plurality of search results;
determining whether a first document corresponding to the search result is a query-specific duplicate of a second document corresponding to any of the search results in the final group of search results, and
if the first document corresponding to the search result is not a query-specific duplicate of the second document corresponding to any of the remaining search results in the final group of search results, adding the search result to the final set of search results before processing any other search result following the search result in the order, and otherwise not adding the search result to the final set of search results; and
providing the final group of search results.
Dependent claims: 36, 37, 38, 41.
22. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations comprising:
receiving search results in response to a query, the query including one or more keywords, the search results including a first search result and a second search result;
generating a set of final search results from the received search results with one or more processors, including:
adding the first search result to the set of final search results;
determining that a first document corresponding to the first search result and a second document corresponding to the second search result are query-specific duplicate documents from a comparison of one or more first query-relevant parts of the first document and one or more second query-relevant parts of the second document, where each query-relevant part includes at least one of the one or more keywords; and
in response to the determination, not adding the second search result to the set of final search results; and
providing the set of final search results.
Dependent claims: 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35.
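Claim 22 restricts the comparison to query-relevant parts that each contain at least one keyword. The following is a minimal Python sketch under several assumptions:

- A "part" is a sentence containing a keyword.
- Two documents are duplicates when at least half of the smaller set of parts is shared (the overlap coefficient).
- The non-duplicate branch, which adds the second result, is included only for completeness.

The naive sentence splitter, the 0.5 threshold, and all function names are also illustrative assumptions.

```python
import re


def query_relevant_parts(document: str, keywords: list[str]) -> set[str]:
    """Normalized sentences of `document` that contain at least one keyword."""
    terms = {k.lower() for k in keywords}
    parts = set()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", document):
        words = re.findall(r"\w+", sentence.lower())
        if terms.intersection(words):
            parts.add(" ".join(words))
    return parts


def are_query_specific_duplicates(doc1: str, doc2: str, keywords: list[str],
                                  threshold: float = 0.5) -> bool:
    """Compare only the query-relevant parts, using the overlap coefficient."""
    p1 = query_relevant_parts(doc1, keywords)
    p2 = query_relevant_parts(doc2, keywords)
    if not p1 or not p2:
        return False
    return len(p1 & p2) / min(len(p1), len(p2)) >= threshold


def final_results(first: tuple[str, str], second: tuple[str, str],
                  keywords: list[str]) -> list[tuple[str, str]]:
    """Two-result case; each result is a hypothetical (result_id, document_text) pair."""
    final = [first]  # the first search result is always added
    if not are_query_specific_duplicates(first[1], second[1], keywords):
        final.append(second)
    return final


doc1 = "Jaguar habitat. The jaguar lives in rainforests. Our store sells shoes."
doc2 = "Welcome! The jaguar lives in rainforests. Jaguar habitat. Sign up today."
print([rid for rid, _ in final_results(("r1", doc1), ("r2", doc2), ["jaguar"])])  # ['r1']
print([rid for rid, _ in final_results(("r1", doc1), ("r2", doc2), ["shoes"])])   # ['r1', 'r2']
```

For the query "jaguar", both documents reduce to the same two keyword-bearing sentences, so the second result is dropped. For "shoes", the second document has no query-relevant part, so both results are kept.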
Specification