
Determining a document similarity metric

  • US 7,565,348 B1
  • Filed: 03/24/2006
  • Issued: 07/21/2009
  • Est. Priority Date: 03/24/2005
  • Status: Active Grant
First Claim

1. A method for determining similarity of a source document to a pattern file by a computer, comprising:

  • creating a plurality of tables storing data associated with a plurality of patterns included in a pattern file;

  • determining which of the plurality of patterns in the pattern file exists in the source document, by analyzing the source document with reference to the plurality of tables;

  • determining a coverage metric, a count metric, a clustering metric and a uniqueness metric responsive to determining which of the patterns exist in the source document, the coverage metric indicative of the frequency of patterns in the pattern file appearing in the source document, the count metric indicative of a count of the patterns in the pattern file existing in the source document, the clustering metric indicative of the degree of proximity between the patterns of the pattern file in the source document, the uniqueness metric indicative of the frequency of a pattern in the pattern file appearing in other pattern files; and

  • determining a document similarity metric for each pattern file based on the coverage metric, the count metric, the clustering metric and the uniqueness metric, the document similarity metric indicative of the degree of similarity between the source document and the pattern file.
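
The steps of the claim can be illustrated with a minimal Python sketch. The claim does not give formulas, so everything concrete here is an assumption made for illustration: the two-table layout, the function and class names (`build_tables`, `find_patterns`, `compute_metrics`, `similarity`, `Metrics`), the definition of each metric, and the weights used to combine them. Patterns are assumed to be unique, non-empty strings matched literally.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Metrics:
    coverage: float    # fraction of the pattern file's patterns found (assumed definition)
    count: int         # number of distinct patterns found
    clustering: float  # higher when matched patterns sit close together (assumed definition)
    uniqueness: float  # higher when matched patterns are rare in other pattern files (assumed)


def build_tables(pattern_file, other_pattern_files):
    """Build two lookup tables (a hypothetical layout, not the patent's)."""
    # Table 1: patterns indexed by their first character, to narrow the search.
    by_first_char = defaultdict(list)
    for pattern in pattern_file:
        if pattern:
            by_first_char[pattern[0]].append(pattern)
    # Table 2: for each pattern, how many other pattern files also contain it.
    other_sets = [set(other) for other in other_pattern_files]
    shared_in = {p: sum(p in s for s in other_sets) for p in pattern_file}
    return by_first_char, shared_in


def find_patterns(document, by_first_char):
    """Map each pattern present in the document to its first offset."""
    first_offset = {}
    for i, ch in enumerate(document):
        for pattern in by_first_char.get(ch, ()):
            if pattern not in first_offset and document.startswith(pattern, i):
                first_offset[pattern] = i
    return first_offset


def compute_metrics(document, pattern_file, other_pattern_files):
    by_first_char, shared_in = build_tables(pattern_file, other_pattern_files)
    first_offset = find_patterns(document, by_first_char)

    count = len(first_offset)
    coverage = count / len(pattern_file) if pattern_file else 0.0

    # Clustering: inverse of the mean gap between consecutive first occurrences.
    offsets = sorted(first_offset.values())
    gaps = [b - a for a, b in zip(offsets, offsets[1:])]
    clustering = 1.0 / (1.0 + sum(gaps) / len(gaps)) if gaps else 0.0

    # Uniqueness: mean of 1 / (1 + number of other files sharing the pattern).
    uniqueness = (
        sum(1.0 / (1 + shared_in[p]) for p in first_offset) / count if count else 0.0
    )
    return Metrics(coverage, count, clustering, uniqueness)


def similarity(m, weights=(0.4, 0.2, 0.2, 0.2)):
    """Combine the four metrics with an arbitrary, illustrative weighting."""
    count_score = m.count / (1.0 + m.count)  # squash the raw count into [0, 1)
    parts = (m.coverage, count_score, m.clustering, m.uniqueness)
    return sum(w * p for w, p in zip(weights, parts))
```

For example, `compute_metrics("alpha beta gamma", ["alpha", "beta", "zeta"], [["alpha", "omega"]])` finds two of the three patterns, and passing the result to `similarity` yields a score between 0 and 1. Running this once per pattern file gives one score per file, so the pattern files can be ranked against the same source document.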

  • 9 Assignments