Detecting query-specific duplicate documents
Abstract
An improved duplicate detection technique that uses query-relevant information to limit the portion(s) of documents to be compared for similarity is described. Before comparing two documents for similarity, the content of these documents may be condensed based on the query. In one embodiment, query-relevant information or text (also referred to as “snippets”) is extracted from the documents and only the extracted snippets, rather than the entire documents, are compared for purposes of determining similarity.
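The idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the fixed word window around each keyword, the Jaccard word-overlap measure, and the 0.8 threshold are all assumptions chosen for illustration. The source says only that query-relevant snippets are extracted and compared instead of the entire documents.

```python
import re


def extract_snippets(text, keywords, window=5):
    """Return query-relevant parts of `text`.

    Assumption: a "snippet" is the keyword plus up to `window` words on
    each side. The source does not specify how snippets are delimited.
    """
    words = re.findall(r"\w+", text.lower())
    keys = {k.lower() for k in keywords}
    snippets = []
    for i, word in enumerate(words):
        if word in keys:
            start = max(0, i - window)
            snippets.append(" ".join(words[start:i + window + 1]))
    return snippets


def snippet_similarity(snippets_a, snippets_b):
    """Jaccard similarity over the word sets of two snippet lists.

    Assumption: Jaccard overlap stands in for whatever similarity
    measure an actual system would use.
    """
    words_a = set(" ".join(snippets_a).split())
    words_b = set(" ".join(snippets_b).split())
    if not words_a or not words_b:
        return 0.0  # no query-relevant text on one side: not duplicates
    return len(words_a & words_b) / len(words_a | words_b)


def are_query_specific_duplicates(doc_a, doc_b, keywords,
                                  window=5, threshold=0.8):
    """Compare only the extracted snippets, never the full documents."""
    score = snippet_similarity(
        extract_snippets(doc_a, keywords, window),
        extract_snippets(doc_b, keywords, window),
    )
    return score >= threshold
```

Under these assumptions, two pages that share a passage but differ in their surrounding headers and footers can count as duplicates for a query that hits the shared passage, and as distinct documents for a query that hits the differing text.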
20 Claims
1. A method comprising:
receiving search results in response to a query, the query including one or more keywords, the search results including a first search result and a second search result;
generating a set of final search results from the received search results with one or more processors, including:
adding the first search result to the set of final search results;
determining that a first document corresponding to the first search result and a second document corresponding to the second search result are query-specific duplicate documents from a comparison of one or more first query-relevant parts of the first document and one or more second query-relevant parts of the second document, where each query-relevant part includes at least one of the one or more keywords; and
in response to the determination, not adding the second search result to the set of final search results; and
presenting the set of final search results.
Dependent claims: 2, 6, 9, 11, 12, 13, 14.
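
The generating step of claim 1 can be sketched as a simple filter over the received results. This is an illustrative reading, not the claimed implementation. The claim recites only a first and a second search result, so comparing each candidate against every result already kept is an assumed generalization. The dictionary layout of a result and the toy `same_snippet` predicate are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Result = Dict[str, str]  # hypothetical shape: {"url": ..., "snippet": ...}


def build_final_results(
    results: Sequence[Result],
    is_duplicate: Callable[[Result, Result], bool],
) -> List[Result]:
    """Keep a result only if it duplicates none of the results kept so far."""
    final: List[Result] = []
    for candidate in results:
        if any(is_duplicate(kept, candidate) for kept in final):
            continue  # query-specific duplicate: do not add it
        final.append(candidate)
    return final


def same_snippet(a: Result, b: Result) -> bool:
    """Toy predicate (assumption): duplicates have identical snippets
    after lowercasing and collapsing whitespace."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().split())
    return normalize(a["snippet"]) == normalize(b["snippet"])
```

The first result is always added because the final set is empty when it is examined, which matches the "adding the first search result" step. Any pairwise test on query-relevant parts can be passed in as `is_duplicate`.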
3. An apparatus comprising:
at least one processor; and
at least one storage device storing processor executable instructions which, when executed by the at least one processor, processes search results by:
receiving search results in response to a query, the query including one or more keywords, the search results including a first search result and a second search result;
generating a set of final search results from the received search results, including:
adding the first search result to the set of final search results;
determining that a first document corresponding to the first search result and a second document corresponding to the second search result are query-specific duplicate documents from a comparison of one or more of the first query-relevant parts of the first document and one or more second query-relevant parts of the second document, where each query-relevant part includes at least one of the one or more keywords; and
in response to the determination, not adding the second search result to the set of final search results; and
presenting the set of final search results.
Dependent claims: 4, 7.
5. An apparatus for processing search results, the apparatus comprising:
a storage device for storing search results generated in response to a query, the query including one or more keywords, and for storing at least one of the one or more keywords, where the search results include a first search result and a second search result;
a final results generator for generating a set of final search results from the search results stored in the storage facility, the generating including:
adding the first search result to the set of final search results;
determining whether a first document corresponding to the first search result and a second document corresponding to the second search result are query-specific duplicate documents from a comparison of one or more first query-relevant parts of the first document and one or more second query-relevant parts of the second document, where each of the query-relevant parts includes at least one of the one or more keywords stored in the storage facility; and
adding the second search result to the set of final search results when the similarity determination facility determines that the first document and the second document are not query-specific duplicate documents and not adding the second search result to the set of final search results when the similarity determination facility determines the first document and the second document are query-specific duplicate documents; and
a final results presenter for presenting the set of final search results.
Dependent claims: 8, 10, 15, 16, 17, 18.
19. A method comprising:
receiving search results that have been generated in response to a query, the query including one or more keywords, the search results including a first search result and a second search result;
identifying, with one or more processors, a first document corresponding to the first search result and a second document corresponding to the second search result as query-specific duplicate documents based on a comparison of one or more first query-relevant parts of the first document and one or more second query-relevant parts of the second document, where each query-relevant part includes at least one of the one or more keywords;
generating a set of final search results from the received set of search results, where the set of final search results includes the first search result but not the second search result according to the identification of the first and second documents as query-specific duplicate documents; and
presenting the set of final search results.
20. An apparatus comprising:
at least one processor; and
at least one storage device storing processor executable instructions which, when executed by the at least one processor, causes the at least one processor to perform operations comprising:
receiving search results that have been generated in response to a query, the query including one or more keywords, the search results including a first search result and a second search result;
identifying a first document corresponding to the first search result and a second document corresponding to the second search result as query-specific duplicate documents based on a comparison of one or more first query-relevant parts of the first document and one or more second query-relevant parts of the second document, where each query-relevant part includes at least one of the one or more keywords;
generating a set of final search results from the received set of search results, where the set of final search results includes the first search result but not the second search result according to the identification of the first and second documents as query-specific duplicate documents; and
presenting the set of final search results.
Specification