
Method and apparatus for converting bitmap image documents to editable coded data using a standard notation to record document recognition ambiguities

  • US 5,359,673 A
  • Filed: 12/27/1991
  • Issued: 10/25/1994
  • Est. Priority Date: 12/27/1991
  • Status: Expired due to Term
First Claim

1. A method of transforming a document represented as a bitmap image into an editable coded data stream defined using a machine readable document description language that records coded information resulting from the document transformation process and information regarding uncertainties in the document transformation process, comprising:

  • performing a first transformation operation on at least a text portion of said bitmap image using a character recognition apparatus, to transform at least said text portion of said bitmap image into coded information recognized with a level of confidence;

  • outputting and recording said coded information into one or more elements that are defined using said document description language, each element having a machine readable element type identifier that indicates the type of said coded information regarding the recognized bitmap image recorded in said element so that the type of coded information contained in each element can be known without examining the coded information contained in each element, the element type identifier for each element having been defined based on the type of coded information recorded in the element and the level of confidence with which said bitmap image represented by said coded information was recognized so that each of said elements is selectively identified, each element having coded information of a single type recorded therein; and

  • when said character recognition apparatus determines that the recognized bitmap image contained in an element has not been recognized with at least a predetermined level of confidence, recording in said element uncertainty information determined by said first recognition apparatus regarding said recognized bitmap image contained in said element;

    wherein said element type identifier is a character-string-element or a questionable-character-element, each character-string-element containing a string of consecutive characters recognized by said character recognition apparatus with at least said predetermined level of confidence, each questionable-character-element containing said uncertainty information determined by said character recognition apparatus for a character which was not recognized with at least said predetermined level of confidence by said character recognition apparatus.
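The element-building step of the claim can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not fix the document description language, the OCR output format, or the confidence scale. Here each recognized character is assumed to arrive as a best guess with a confidence in the range 0.0 to 1.0 plus optional alternative candidates. The class names, the `build_elements`/`to_markup` functions, the threshold value, and the SGML/XML-style tags `<cs>`, `<qc>` and `<alt>` are all hypothetical. Consecutive characters at or above the threshold are merged into one character-string element; each character below it becomes its own questionable-character element that records all candidates with their confidences as the uncertainty information.

```python
from dataclasses import dataclass, field
from typing import ClassVar, List, Tuple, Union
from xml.sax.saxutils import escape, quoteattr


@dataclass
class RecognizedChar:
    """One character hypothesis from a hypothetical OCR engine."""
    best: str                      # top-ranked character
    confidence: float              # assumed scale: 0.0 .. 1.0
    alternatives: List[Tuple[str, float]] = field(default_factory=list)


@dataclass
class CharacterString:
    """Run of consecutive characters recognized with enough confidence."""
    type_id: ClassVar[str] = "character-string"
    text: str


@dataclass
class QuestionableCharacter:
    """Single low-confidence character plus its uncertainty information."""
    type_id: ClassVar[str] = "questionable-character"
    candidates: List[Tuple[str, float]]


Element = Union[CharacterString, QuestionableCharacter]


def build_elements(chars: List[RecognizedChar], threshold: float) -> List[Element]:
    """Group OCR output into single-type elements keyed by confidence."""
    elements: List[Element] = []
    run: List[str] = []
    for ch in chars:
        if ch.confidence >= threshold:  # "at least" the predetermined level
            run.append(ch.best)
            continue
        if run:  # close the current confident run before the questionable char
            elements.append(CharacterString("".join(run)))
            run = []
        # Record every candidate, best guess first, as the uncertainty info.
        candidates = [(ch.best, ch.confidence)] + list(ch.alternatives)
        elements.append(QuestionableCharacter(candidates))
    if run:
        elements.append(CharacterString("".join(run)))
    return elements


def to_markup(elements: List[Element]) -> str:
    """Serialize elements; the tag name alone reveals each element's type."""
    parts = []
    for el in elements:
        if isinstance(el, CharacterString):
            parts.append(f"<cs>{escape(el.text)}</cs>")
        else:
            alts = "".join(
                f"<alt conf={quoteattr(f'{conf:.2f}')}>{escape(char)}</alt>"
                for char, conf in el.candidates
            )
            parts.append(f"<qc>{alts}</qc>")
    return "".join(parts)


if __name__ == "__main__":
    # "file" where the OCR engine is unsure whether the third glyph is l, 1 or I.
    page = [
        RecognizedChar("f", 0.98),
        RecognizedChar("i", 0.97),
        RecognizedChar("l", 0.55, [("1", 0.40), ("I", 0.05)]),
        RecognizedChar("e", 0.99),
    ]
    print(to_markup(build_elements(page, threshold=0.90)))
```

With a threshold of 0.90 the demo yields three elements: a character-string element holding `fi`, a questionable-character element listing `l`, `1` and `I` with their confidences, and a character-string element holding `e`. Because the type lives in the tag name, a downstream editor could, for example, jump to every `<qc>` element for proofreading without parsing the contents of the `<cs>` elements.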

  • 4 Assignments