Page classifier engine

  • US 8,392,816 B2
  • Filed: 12/03/2007
  • Issued: 03/05/2013
  • Est. Priority Date: 12/03/2007
  • Status: Active Grant
First Claim

1. A method for determining a page type of a first portion of an electronic document utilizing one or more second portions of the electronic document, comprising:

  • receiving an OCR file associated with the electronic document, wherein the OCR file includes semantic information about text in the electronic document;

    analyzing the semantic information about the text in a first portion of the electronic document by applying one or more features to the semantic information, the one or more features comprising assigning a weight to the following:

    key phrases, a size of a word or phrase, font, a location of a word or phrase, one or more page numbers, and a repetition of a word or phrase in the first portion of the electronic document;

    using the semantic information about the text in the first portion of the electronic document to automatically reference a second portion of the electronic document;

    extracting semantic information about the text in the second portion of the electronic document;

    analyzing the semantic information about the text in the second portion of the electronic document by applying one or more features to the semantic information, the one or more features comprising determining that the second portion contains a page-type title, figure, or text similar to a page-type title, figure, or text in the first portion of the electronic document;

    determining the page type of the first portion of the electronic document based at least upon the application of the one or more features to the semantic information in the first portion of the electronic document and the application of the one or more features to the semantic information in the second portion of the electronic document; and

    storing an indication of the page type.
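
The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Portion` structure, the key phrases, the weights, the dotted-leader pattern for finding a referenced page, and the substring test for title similarity are all hypothetical choices made for illustration, and only a subset of the claimed features (key phrases, size, and location) is weighted.

```python
import re
from dataclasses import dataclass

@dataclass
class Portion:
    """Hypothetical stand-in for the OCR output of one page."""
    page_number: int
    lines: list  # (text, font_size, y_position) tuples; y runs 0.0 (top) to 1.0

# Assumed key phrases mapped to (page type, base weight); values are illustrative.
KEY_PHRASES = {
    "table of contents": ("toc", 3.0),
    "index": ("index", 2.0),
}

# Assumed layout for an entry that points at another page: "Title .... 5"
ENTRY = re.compile(r"^(?P<title>.+?)\s*\.{2,}\s*(?P<page>\d+)$")

def score_first_portion(portion):
    """Weight key phrases by size and location within the first portion."""
    scores = {}
    for text, font_size, y in portion.lines:
        lowered = text.lower()
        for phrase, (page_type, weight) in KEY_PHRASES.items():
            if phrase in lowered:
                if font_size >= 14:   # larger text counts for more
                    weight += 1.0
                if y <= 0.2:          # text near the top counts for more
                    weight += 1.0
                scores[page_type] = scores.get(page_type, 0.0) + weight
    return scores

def referenced_entries(portion):
    """Use text in the first portion to find references to second portions."""
    entries = []
    for text, _, _ in portion.lines:
        match = ENTRY.match(text.strip())
        if match:
            entries.append((match.group("title").lower(), int(match.group("page"))))
    return entries

def second_portion_hits(entries, pages):
    """Count referenced pages whose text contains the referencing title."""
    hits = 0
    for title, page_no in entries:
        target = pages.get(page_no)
        if target and any(title in text.lower() for text, _, _ in target.lines):
            hits += 1
    return hits

def classify(first, pages, store):
    """Combine both analyses, pick a page type, and store the indication."""
    scores = score_first_portion(first)
    hits = second_portion_hits(referenced_entries(first), pages)
    if hits:
        scores["toc"] = scores.get("toc", 0.0) + 1.5 * hits
    page_type = max(scores, key=scores.get) if scores else "unknown"
    store[first.page_number] = page_type
    return page_type

# Made-up sample data: a contents page that points at an introduction page.
toc = Portion(2, [("Table of Contents", 18, 0.05),
                  ("Introduction .... 5", 10, 0.30),
                  ("Index .... 40", 10, 0.40)])
intro = Portion(5, [("Introduction", 16, 0.10), ("Body text here.", 10, 0.50)])
pages = {2: toc, 5: intro}
store = {}
result = classify(toc, pages, store)
```

In this sketch the contents page is scored first on its own text, and the score is then raised because the page it references carries a matching title, which mirrors the claim's use of a second portion to determine the page type of the first.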

  • 2 Assignments