
Method and system for creating subgroups of documents using optical character recognition data

  • US 9,069,768 B1
  • Filed: 04/03/2013
  • Issued: 06/30/2015
  • Est. Priority Date: 03/28/2012
  • Status: Active Grant
First Claim

1. A system for creating subgroups of documents using optical character recognition data, the system comprising:

  • one or more processors; and

    a non-transitory computer readable medium storing a plurality of instructions, which when executed, cause the one or more processors to:

    create a matrix for words included in documents, wherein each column-row combination in the matrix indicates whether a corresponding word that is associated with the column-row combination is included in a corresponding document that is associated with the column-row combination;

    identify distances between pairs of the words in the matrix, wherein each distance is based on a number of the documents that differ in including a corresponding pair of the words;

    create word clusters, wherein each word cluster comprises pairs of words associated with a corresponding distance less than a distance threshold;

    create sets of word clusters, wherein a set of word clusters comprises word clusters that are not associated with any of the documents associated with other word clusters in the set of word clusters; and

    create subgroups of the digitized documents based on a set of word clusters corresponding to a high word score relative to at least one other word score corresponding to at least one other set of word clusters.
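
The first three steps recited in the claim (the word-document matrix, the pairwise word distances, and the thresholded word clusters) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, whitespace tokenization, the merging of linked pairs into clusters with union-find, and the rule that a cluster is "associated" with the documents containing all of its words are assumptions made for illustration. The word score is not defined in the claim text shown here, so it is left out.

```python
from itertools import combinations


def build_matrix(docs):
    """Return (words, matrix); matrix[word][d] is 1 if word occurs in docs[d]."""
    # Assumption: lowercase whitespace tokenization stands in for OCR output.
    tokenized = [set(text.lower().split()) for text in docs]
    words = sorted(set().union(*tokenized))
    matrix = {w: [1 if w in toks else 0 for toks in tokenized] for w in words}
    return words, matrix


def word_distance(row_a, row_b):
    """Count the documents that contain exactly one of the two words."""
    return sum(a != b for a, b in zip(row_a, row_b))


def cluster_words(words, matrix, threshold):
    """Merge word pairs whose distance is below threshold into clusters."""
    parent = {w: w for w in words}

    def find(w):
        # Union-find root lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    linked = set()
    for a, b in combinations(words, 2):
        if word_distance(matrix[a], matrix[b]) < threshold:
            parent[find(a)] = find(b)
            linked.update((a, b))

    clusters = {}
    for w in linked:
        clusters.setdefault(find(w), set()).add(w)
    return sorted(sorted(c) for c in clusters.values())


def cluster_documents(cluster, matrix):
    """Assumption: a cluster is associated with documents holding all its words."""
    n_docs = len(next(iter(matrix.values())))
    return {d for d in range(n_docs) if all(matrix[w][d] for w in cluster)}


# Hypothetical sample documents, not taken from the patent.
docs = [
    "invoice total due",
    "invoice total paid",
    "resume skills education",
    "resume skills experience",
]
words, matrix = build_matrix(docs)
clusters = cluster_words(words, matrix, threshold=1)
doc_sets = [cluster_documents(c, matrix) for c in clusters]
# Clusters whose document sets are disjoint could form one "set of word clusters".
disjoint = all(a.isdisjoint(b) for a, b in combinations(doc_sets, 2))
```

With these sample documents and a threshold of 1, only words that occur in exactly the same documents are paired, so the two resulting clusters map to non-overlapping document sets and could serve as candidate subgroups.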
