Method and system for document similarity analysis based on common denominator similarity

  • US 10,248,626 B1
  • Filed: 09/29/2016
  • Issued: 04/02/2019
  • Est. Priority Date: 09/29/2016
  • Status: Active Grant
First Claim

1. A method for document similarity analysis, the method comprising:

  obtaining a document to be archived;

  identifying a document category similar to the document to be archived, based on indexing terms and corresponding term frequencies, comprising:

    identifying a document category that includes a plurality of indexing terms that are identical to indexing terms identified in the document to be archived;

    obtaining a term frequency vector for the identical indexing terms in the document to be archived;

    generating a normalized term frequency vector, from the term frequency vector for the document to be archived;

    obtaining a term frequency vector for the identical indexing terms in the identified document category;

    generating a normalized term frequency vector, from the term frequency vector for the identified document category;

    calculating a common denominator similarity based on the normalized term frequency vector for the document to be archived, the normalized term frequency vector for the identified document category, and a common denominator;

    making a determination that the document category is similar to the document to be archived based on the common denominator similarity; and

  registering the document to be archived in the document category.
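
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not define how the vectors are normalized, what the common denominator is, or how the similarity determination is made. The sketch therefore assumes L2 normalization, takes the common denominator as a caller-supplied number, uses a dot product divided by that number as a stand-in formula, and applies a hypothetical `threshold`. All function and parameter names are illustrative.

```python
import math
from collections import Counter


def normalize(vector):
    """L2-normalize a vector (assumption: the claim does not name a norm)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else list(vector)


def common_denominator_similarity(doc_tf, cat_tf, common_denominator):
    """Stand-in formula: dot product of the normalized vectors over the
    identical indexing terms, divided by a caller-supplied common denominator.
    The actual formula is not given in the claim."""
    # Indexing terms present in both the document and the category.
    shared = sorted(set(doc_tf) & set(cat_tf))
    # The claim requires a plurality (two or more) of identical terms.
    if len(shared) < 2 or common_denominator <= 0:
        return 0.0
    doc_vec = normalize([doc_tf[t] for t in shared])
    cat_vec = normalize([cat_tf[t] for t in shared])
    dot = sum(d * c for d, c in zip(doc_vec, cat_vec))
    return dot / common_denominator


def register(document_terms, categories, common_denominator=1.0, threshold=0.5):
    """Return the name of the most similar category, or None if no category
    reaches the (hypothetical) similarity threshold."""
    doc_tf = Counter(document_terms)
    best_name, best_score = None, threshold
    for name, cat_tf in categories.items():
        score = common_denominator_similarity(doc_tf, cat_tf, common_denominator)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


categories = {
    "finance": Counter({"invoice": 4, "payment": 2}),
    "travel": Counter({"flight": 3, "hotel": 1}),
}
document_terms = ["invoice", "invoice", "payment", "memo"]
chosen = register(document_terms, categories)
```

Here `document_terms` shares two indexing terms with the hypothetical "finance" category and none with "travel", so only "finance" is a candidate for registration. With a common denominator of 1.0 the stand-in formula reduces to cosine similarity over the shared terms.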
