Method and apparatus for detecting and summarizing document similarity within large document sets

  • US 6,240,409 B1
  • Filed: 07/31/1998
  • Issued: 05/29/2001
  • Est. Priority Date: 07/31/1998
  • Status: Expired due to Term
First Claim

1. A method of comparing a query file to one or more stored files, the method comprising:

  • receiving a query file having a plurality of query file substrings;

  • selecting a first query file substring from the plurality of query file substrings;

  • preprocessing the first query file substring thereby making the substring more suitable for searching in the storage area;

  • searching a storage area storing a plurality of ordered file substrings for the first query file substring;

  • storing match data relating to a match between the first query file substring and a first ordered file substring; and

  • joining the first ordered file substring and a second ordered file substring if the first ordered file substring and the second ordered file substring are in a particular sequence and joining the first query file substring and a second query file substring if the first query file substring and the second query file substring are in the same particular sequences wherein the second ordered file substring and the second query file substring match, thereby forming a third coalesced ordered file substring and a third coalesced query file substring that can be used to format output comparison data.
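
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the word-based substrings, the window size K, the in-memory sorted list standing in for the storage area, and every function name (preprocess, shingles, build_index, search, compare) are assumptions made only for illustration.

```python
from bisect import bisect_left

K = 3  # words per substring; an illustrative choice, not taken from the patent


def preprocess(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def shingles(text, k=K):
    """Split preprocessed text into overlapping k-word substrings."""
    words = preprocess(text).split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def build_index(files):
    """Storage area: a sorted list of (substring, file_id, position) entries."""
    return sorted(
        (sub, fid, pos)
        for fid, text in files.items()
        for pos, sub in enumerate(shingles(text))
    )


def search(index, sub):
    """Binary-search the ordered entries; return every (file_id, position) holding sub."""
    i = bisect_left(index, (sub,))
    hits = []
    while i < len(index) and index[i][0] == sub:
        hits.append(index[i][1:])
        i += 1
    return hits


def compare(query_text, index):
    """Return coalesced runs as (file_id, query_pos, stored_pos, length) tuples."""
    # Store match data for every query substring found in the index.
    matches = set()
    for qpos, sub in enumerate(shingles(query_text)):
        for fid, spos in search(index, sub):
            matches.add((fid, qpos, spos))

    # Join matches that are consecutive in both the query and the stored file.
    runs = []
    for fid, qpos, spos in sorted(matches):
        if (fid, qpos - 1, spos - 1) in matches:
            continue  # already covered by a run that started earlier
        length = 1
        while (fid, qpos + length, spos + length) in matches:
            length += 1
        runs.append((fid, qpos, spos, length))
    return runs


index = build_index({"a": "the quick brown fox jumps over the lazy dog"})
print(compare("A  Quick brown fox jumps high", index))
```

In this sketch two adjacent matching substrings ("quick brown fox" and "brown fox jumps") are joined into a single run of length 2, which is the kind of coalesced pair the claim describes as the basis for formatting output comparison data.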
