Information retrieval systems with duplicate document detection and presentation functions

  • US 7,809,695 B2
  • Filed: 05/05/2005
  • Issued: 10/05/2010
  • Est. Priority Date: 08/23/2004
  • Status: Active Grant
First Claim

1. An information-retrieval system comprising:

    a plurality of databases; and

    one or more servers for facilitating client access to the plurality of databases over a network, with the one or more servers collectively comprising:

    signature-generation means for generating a plurality of document signatures, with each document signature based on a plurality of features from and their respective positions in a corresponding document in one or more of the databases, the signature-generation means comprising means for forming a document signature based on one or more of the group consisting of a document hash value and a document feature vector, wherein the hash value is based on features and positions of the features within a document;

    query-definition means for defining a query and directing identification of search-result documents that include content duplicative of one or more other search-result documents;

    duplicate-determination means for determining, based on a subset of the document signatures, whether one or more documents within results of the query include content duplicative of content in one or more other documents within the results;

    means for controlling display of results of the query with at least one of the displayed results indicated as including content duplicative of content in one or more other documents within the results; and

    means for controlling output of results of the query to a printer or email transmission device, based on user selected options related to output of documents that include content duplicative of content of one or more other documents within the results.
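
As an illustration only, the signature-generation, duplicate-determination, and output-control elements recited above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: features are assumed to be lowercase word tokens, positions are assumed to be token ordinals, the signature is an exact-match SHA-256 hash (the claim also permits a feature vector, which is not shown), and every function name here (`features_with_positions`, `document_hash`, `mark_duplicates`, `select_for_output`) is hypothetical.

```python
import hashlib
import re


def features_with_positions(text):
    """Return (position, token) pairs for a document.

    Assumption: a "feature" is a lowercase alphanumeric word token and its
    "position" is its ordinal index in the token sequence.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return list(enumerate(tokens))


def document_hash(text):
    """Compute a hash that depends on both features and their positions."""
    h = hashlib.sha256()
    for pos, tok in features_with_positions(text):
        h.update(f"{pos}:{tok}|".encode("utf-8"))
    return h.hexdigest()


def mark_duplicates(results):
    """Flag search results whose signature matches an earlier result.

    `results` is a list of (doc_id, text) pairs in ranked order. Each
    returned record carries `duplicate_of`, which is None for the first
    document with a given signature and the earlier doc_id otherwise.
    """
    first_seen = {}
    marked = []
    for doc_id, text in results:
        sig = document_hash(text)
        original = first_seen.setdefault(sig, doc_id)
        marked.append({
            "doc_id": doc_id,
            "signature": sig,
            "duplicate_of": None if original == doc_id else original,
        })
    return marked


def select_for_output(marked, include_duplicates):
    """Apply a user-selected option before printing or emailing results."""
    return [
        m["doc_id"]
        for m in marked
        if include_duplicates or m["duplicate_of"] is None
    ]
```

Because the hash folds in each token's position, two documents containing the same words in a different order receive different signatures in this sketch. An exact hash only catches documents that are identical after tokenization; detecting partially duplicative content would need a different comparison, such as one over feature vectors.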

  • 5 Assignments