
Methods and apparatus for automated image classification

  • US 8,671,112 B2
  • Filed: 06/12/2008
  • Issued: 03/11/2014
  • Est. Priority Date: 06/12/2008
  • Status: Active Grant
First Claim

1. A method of processing an unclassified electronic document image comprising information associated with a healthcare entity, the method comprising:

    converting the image to a textual representation;

    identifying at least one term in the textual representation, wherein the at least one term is represented in training data indicating a degree of association between the at least one term and a plurality of document classifications, wherein the training data includes information about a plurality of terms extracted from a plurality of documents during training;

    determining, with at least one computer processor, for a first document classification of the plurality of document classifications, a first probability that the unclassified electronic document image belongs to the first document classification, wherein determining the first probability comprises for each term of the at least one term, multiplying a value from the training data indicating the degree of association between the term and the first document classification by an initial probability representing a percentage of historical documents that were classified using the first document classification, wherein the initial probability is scaled by a number of times the term appears in the textual representation;

    assigning to the unclassified electronic document image, a document classification to produce a classified electronic document image, wherein the document classification is assigned based, at least in part, on the determined first probability;

    associating a confidence score with the classified electronic document image, wherein the confidence score is determined based, at least in part, on a plurality of classification type probability values each indicating a likelihood that the classified electronic document image is associated with one of the plurality of document classifications; and

    determining that the classified electronic document image was accurately classified if the confidence score exceeds a predetermined threshold value.
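The probability, assignment, and confidence steps recited above can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. The names `classify`, `association`, and `priors`, the sample values, and the 0.6 threshold are all hypothetical. The sketch also assumes that the image has already been converted to text, that the per-term products are summed into a score for each classification, and that the confidence score is the highest normalized score; the claim does not specify how per-term values are combined or how the confidence score is derived from the classification type probability values.

```python
from collections import Counter


def classify(text, association, priors, threshold=0.6):
    """Return (classification, confidence, accepted) for one document's text.

    association: term -> {classification: degree of association} (training data)
    priors: classification -> fraction of historical documents with that label
    """
    # Keep only terms that are represented in the training data.
    counts = Counter(t for t in text.lower().split() if t in association)

    scores = {}
    for label, prior in priors.items():
        # Per term: association value times the prior scaled by the term count.
        # Summing the per-term products is an assumption of this sketch.
        scores[label] = sum(
            association[term].get(label, 0.0) * (prior * n)
            for term, n in counts.items()
        )

    total = sum(scores.values())
    if total == 0:
        # No known terms: nothing to assign.
        return None, 0.0, False

    # Normalize so each classification gets a probability-like value.
    probs = {label: s / total for label, s in scores.items()}
    best = max(probs, key=probs.get)
    confidence = probs[best]  # assumed: confidence is the top normalized value
    return best, confidence, confidence > threshold


# Hypothetical training data for two classifications.
association = {
    "invoice": {"billing": 0.9, "lab_report": 0.1},
    "glucose": {"billing": 0.05, "lab_report": 0.95},
}
priors = {"billing": 0.5, "lab_report": 0.5}

result = classify("Invoice for glucose panel invoice copy", association, priors)
```

In this sketch, a document whose confidence does not exceed the threshold is still assigned a classification but is flagged as not accepted, which mirrors the separation between the assigning step and the final accuracy determination.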
