Representative Document Selection for a Set of Duplicate Documents

  • US 20150026170A1
  • Filed: 10/09/2014
  • Published: 01/22/2015
  • Est. Priority Date: 07/03/2003
  • Status: Abandoned Application
First Claim

1. A method, comprising:

  • at a computing device having one or more processors and memory:

    obtaining a plurality of documents, wherein a respective document in the plurality of documents is associated with a score and wherein each document in the plurality of documents is from a different data structure in a plurality of data structures, each data structure in the plurality of data structures representing a different portion of a document address space;

    selecting a first document in the plurality of documents in accordance with the score associated with the first document, wherein the first document has a fingerprint that indicates that the first document has substantially identical content to every other document in the plurality of documents;

    indexing, in accordance with the score, the first document thereby producing an indexed first document; and

    with respect to the plurality of documents, including the indexed first document in a document index as representative of each document in the plurality of documents.
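
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the implementation described in the application; it only mirrors the claim's structure under several assumptions. The names Document, fingerprint, select_representative and index_duplicate_set are hypothetical. "In accordance with the score" is assumed here to mean "pick the highest score". The fingerprint is assumed to be a SHA-256 hash of whitespace- and case-normalized content, which detects only exact matches after normalization and is a simplification of "substantially identical". Each key of the partitions mapping stands in for one data structure covering a different portion of the document address space.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Document:
    address: str   # hypothetical: a URL or other address in the address space
    content: str
    score: float   # the claim does not say how the score is computed


def fingerprint(content: str) -> str:
    """Assumed fingerprint: hash of lowercased, whitespace-collapsed content."""
    normalized = " ".join(content.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def select_representative(partitions: Dict[str, Document]) -> Document:
    """Pick one document from a set of duplicates, one document per partition.

    Assumption: the highest-scoring document is the representative.
    """
    docs = list(partitions.values())
    if not docs:
        raise ValueError("no documents supplied")
    # All documents must share one fingerprint to count as duplicates here.
    if len({fingerprint(d.content) for d in docs}) != 1:
        raise ValueError("documents are not duplicates of one another")
    return max(docs, key=lambda d: d.score)


def index_duplicate_set(partitions: Dict[str, Document], index: dict) -> dict:
    """Index only the representative, recording every address it stands for."""
    rep = select_representative(partitions)
    entry = {
        "address": rep.address,
        "score": rep.score,
        "represents": sorted(d.address for d in partitions.values()),
    }
    index[fingerprint(rep.content)] = entry
    return entry


# Usage: three copies of the same page, each held in a different partition.
partitions = {
    "partition-0": Document("http://a.example/page", "Hello  World", 0.2),
    "partition-1": Document("http://b.example/page", "hello world", 0.9),
    "partition-2": Document("http://c.example/page", "Hello world ", 0.5),
}
index: dict = {}
entry = index_duplicate_set(partitions, index)
```

In this sketch the index holds a single entry for the whole duplicate set, keyed by the shared fingerprint, so a lookup returns the representative rather than each copy separately.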

  • 1 Assignment