
System and method for electronic document classification

  • US 8,428,367 B2
  • Filed: 09/30/2008
  • Issued: 04/23/2013
  • Est. Priority Date: 10/26/2007
  • Status: Expired due to Fees
First Claim

1. A method of classifying electronic documents, comprising:

    converting a hypertext markup language (HTML) candidate electronic document comprising character data to a single candidate image, the converting including extracting a body section of the HTML candidate electronic document and converting the entire body section of the HTML candidate electronic document into the single candidate image;

    scaling the entire single candidate image to a size substantially smaller than an original size of the candidate image to provide a single scaled candidate image;

    obtaining a representation of a degree of visual similarity of the entire single scaled candidate image to a reference image by performing a single comparison of the entire single scaled candidate image to the entire reference image, the reference image having been obtained by identifying a reference electronic document containing character data representative of a specified classification;

    automatically classifying the candidate electronic document under the specified classification when the degree of visual similarity exceeds a predetermined threshold and, in response to the degree of visual similarity exceeding the predetermined threshold, converting the reference electronic document to a reference image; and

    determining an efficiency of the classifying by comparing a number of candidate electronic documents that are automatically classified under the specified classification to a number of candidate electronic documents that a user classifies under the specified classification.
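
The scaling, comparison, thresholding, and efficiency steps recited in the claim can be illustrated with a minimal sketch. This is not the patented implementation: the claim does not specify a scaling algorithm, a similarity metric, or an efficiency formula, so the box-average downscaling, the mean-absolute-difference similarity score, and the ratio-based efficiency below are all assumptions chosen for illustration. Rendering the HTML body to an image is left out; images are assumed to already be available as rows of grayscale values in the range 0 to 255, and every function name is hypothetical.

```python
from typing import List

# Assumption: an image is a list of rows of grayscale pixel values in [0, 255].
Image = List[List[float]]


def scale_down(image: Image, out_w: int, out_h: int) -> Image:
    """Scale an image to out_w x out_h by averaging pixel blocks (assumed method)."""
    in_h, in_w = len(image), len(image[0])
    scaled: Image = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max(y0 + 1, (oy + 1) * in_h // out_h)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max(x0 + 1, (ox + 1) * in_w // out_w)
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        scaled.append(row)
    return scaled


def visual_similarity(a: Image, b: Image) -> float:
    """Single whole-image comparison: 1.0 means identical, 0.0 maximally different.

    Assumed metric: one minus the normalized mean absolute pixel difference.
    Both images must have the same dimensions.
    """
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    pixel_count = len(a) * len(a[0])
    return 1.0 - total / (255.0 * pixel_count)


def classify(candidate: Image, scaled_reference: Image, threshold: float) -> bool:
    """Return True when the scaled candidate's similarity exceeds the threshold."""
    ref_h, ref_w = len(scaled_reference), len(scaled_reference[0])
    scaled_candidate = scale_down(candidate, ref_w, ref_h)
    return visual_similarity(scaled_candidate, scaled_reference) > threshold


def classification_efficiency(auto_count: int, user_count: int) -> float:
    """Assumed efficiency measure: automatic classifications per user classification."""
    return auto_count / user_count if user_count else 0.0
```

In this sketch a candidate is compared to the reference exactly once, at the reduced size, which keeps the cost of each comparison proportional to the small scaled dimensions rather than to the full rendered page.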
