Detecting query-specific duplicate documents

  • US 8,452,766 B1
  • Filed: 07/02/2012
  • Issued: 05/28/2013
  • Est. Priority Date: 02/22/2000
  • Status: Expired due to Term
First Claim

1. A computer-implemented method comprising:

    receiving a plurality of search results responsive to a query, wherein the query includes one or more words, wherein each search result identifies a respective document that comprises a plurality of segments, wherein each segment is a distinct sequence of consecutive characters in the respective document;

    identifying the plurality of segments for each document by sliding a fixed-length window over content of the document, wherein the content of the document encompassed by the window starting at a particular position in the document defines a particular segment in the document, wherein sliding the fixed-length window over the content of the document comprises skipping space characters and characters that would result in a last character of the window splitting a word;

    for each of the plurality of segments in each of the documents, determining a respective count of occurrences of the one or more words of the query that occur in the segment;

    for each of the documents, ranking the segments of the document based on the respective counts of the segments; and

    identifying one or more highest ranked segments for each of the documents as a query-relevant part of the document.
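
The steps recited in claim 1 can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the patented implementation: the function names (`segments`, `count_query_words`, `query_relevant_parts`), the `width` and `top_k` parameters, the case-insensitive whole-word matching, and the tie-breaking by earliest position are all hypothetical choices made for this example. "Splitting a word" is read here as the window ending on a non-space character that is immediately followed by another non-space character.

```python
import re
from typing import List, Tuple


def segments(text: str, width: int) -> List[Tuple[int, str]]:
    """Slide a fixed-length window over text and return (start, segment) pairs.

    Assumption: a start position is skipped when it falls on a space, or when
    the window's last character would split a word (the window ends on a
    non-space character and the next character is also non-space).
    """
    out = []
    for start in range(0, len(text) - width + 1):
        end = start + width  # exclusive end index of the window
        if text[start].isspace():
            continue  # skip space characters as start positions
        if end < len(text) and not text[end - 1].isspace() and not text[end].isspace():
            continue  # the last character of the window would split a word
        out.append((start, text[start:end]))
    return out


def count_query_words(segment: str, query_words: List[str]) -> int:
    """Count occurrences of the query words in a segment (whole words, case-insensitive)."""
    wanted = {w.lower() for w in query_words}
    tokens = re.findall(r"\w+", segment.lower())
    return sum(1 for t in tokens if t in wanted)


def query_relevant_parts(
    text: str, query_words: List[str], width: int, top_k: int = 1
) -> List[Tuple[int, int, str]]:
    """Rank segments by query-word count and return the top_k as (count, start, segment)."""
    scored = [
        (count_query_words(seg, query_words), start, seg)
        for start, seg in segments(text, width)
    ]
    # Highest count first; ties broken by earliest start (an assumption of this sketch).
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_k]


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog"
    print(query_relevant_parts(doc, ["fox", "jumps"], width=15))
```

In this sketch, each document in the search results would be passed through `query_relevant_parts` separately, so the segments that come back depend on the query as well as on the document.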
