
Document classification and characterization using human judgment, tiered similarity analysis and language/concept analysis

  • US 10,467,252 B1
  • Filed: 01/30/2013
  • Issued: 11/05/2019
  • Est. Priority Date: 01/30/2012
  • Status: Active Grant
First Claim

1. A method comprising:

  • receiving a corpus of documents;

    characterizing similarities among the corpus of documents using at least three similarity algorithms having different similarity criteria, the characterizing comprising:

    obtaining contextual characteristics for each of the corpus of documents and associating the contextual characteristics with the corresponding document, the contextual characteristics selected from a group consisting of: similarity score, type of similarity algorithm used to characterize the document, document family, document type, and metadata describing properties of the document;

    first removing a first portion of the corpus of documents based on applying a first similarity algorithm to the corpus of documents;

    second removing, after the first removing, a second portion of the corpus of documents based on applying a second similarity algorithm to the corpus of documents; and

    third removing, after the first removing and the second removing, a third portion of the corpus of documents based on applying a third similarity algorithm to the corpus of documents, the third similarity algorithm based on a criteria other than that implemented by the first similarity algorithm and the second similarity algorithm, wherein the third similarity algorithm identifies conceptually similar documents in the corpus of documents based on content of each respective document, and wherein the conceptually similar documents are neither exact duplicates nor substantial duplicates;

    defining stacks of documents based on pre-defined grouping criteria as applied to the characterized similarities among the corpus of documents, the characterized similarities based on the first removing, the second removing, or the third removing;

    identifying, within each stack, a prime document; and

    initiating provision of each prime document to at least one human reviewer via a computer-implemented document review and characterization system.
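
The tiered removal and stacking steps recited above can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not name the three algorithms, so the choices here (a hash of whitespace-normalized text for the first tier, token-set Jaccard overlap for the second, bag-of-words cosine similarity for the third), the thresholds, and all function names are assumptions made for illustration. The first surviving document of each group is treated as the prime document, and each removed document keeps its similarity score and algorithm type as contextual characteristics.

```python
import hashlib
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased alphanumeric tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_score(a, b):
    """Tier 1 (assumed): 1.0 if the whitespace-normalized texts hash equal."""
    def key(t):
        return hashlib.sha256(" ".join(t.split()).encode("utf-8")).hexdigest()
    return 1.0 if key(a) == key(b) else 0.0


def jaccard(a, b):
    """Tier 2 (assumed): token-set overlap as a near-duplicate measure."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Tier 3 (assumed): bag-of-words cosine as a stand-in for concept analysis."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
        math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def tier_pass(order, stacks, corpus, score_fn, threshold, label):
    """Remove documents that match an earlier kept document; return survivors."""
    kept = []
    for doc_id in order:
        for prime in kept:
            score = score_fn(corpus[prime], corpus[doc_id])
            if score >= threshold:
                # Record the score and algorithm type, then fold the removed
                # document's own stack into the matching prime's stack.
                stacks[prime].append((doc_id, label, score))
                stacks[prime].extend(stacks.pop(doc_id))
                break
        else:
            kept.append(doc_id)
    return kept


def build_stacks(corpus, near_threshold=0.8, concept_threshold=0.5):
    """Return {prime_id: [(member_id, tier_label, score), ...]}."""
    stacks = {doc_id: [] for doc_id in corpus}
    order = list(corpus)
    order = tier_pass(order, stacks, corpus, exact_score, 1.0, "exact")
    order = tier_pass(order, stacks, corpus, jaccard, near_threshold, "near")
    tier_pass(order, stacks, corpus, cosine, concept_threshold, "concept")
    return stacks


if __name__ == "__main__":
    sample = {
        "a": "The quarterly revenue report is attached.",
        "b": "The quarterly   revenue report is attached.",
        "c": "The quarterly revenue report is attached here.",
        "d": "Revenue report: revenue for the quarterly period, revenue attached",
        "e": "Lunch menu for Friday",
    }
    for prime, members in build_stacks(sample).items():
        print(prime, members)
```

In this sketch only the keys of the returned dictionary (the prime documents) would be routed to a human reviewer. A production system would likely replace the pairwise comparisons with indexing or clustering, since each pass here is quadratic in the number of surviving documents.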
