Display of document image optimized for reading

  • US 8,254,681 B1
  • Filed: 06/24/2009
  • Issued: 08/28/2012
  • Est. Priority Date: 02/05/2009
  • Status: Active Grant
First Claim

1. A computer-implemented method for identifying semantically meaningful segments of an image of a document, the image of the document having a plurality of pages, the method comprising:

  • applying an optical character recognition algorithm to the image of the document to obtain a plurality of document segments, each document segment corresponding to a region of the image of the document and having associated recognized text;

  • calculating a text quality score for at least one document segment of the plurality of document segments, the text quality score characterizing a quality of optical character recognition for the document segment;

  • identifying a semantic component of the document comprised of one or more of the document segments;

  • creating a document representation comprising the document segments and identified semantic components;

  • storing the document representation in association with an identifier of the image of the document; and

  • for the document segment, determining, based at least in part on the text quality score associated with the document segment, whether to display the document segment as the associated recognized text of the document segment, or as a portion of the image of the document corresponding to the document segment.
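The claimed steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how the text quality score is computed, how semantic components are found, what threshold is used, or how the representation is stored. The OCR step itself is not performed here; the `Segment` objects stand in for its output. The vocabulary-based score, the "short all-caps line starts a section" rule, `QUALITY_THRESHOLD`, and the in-memory `STORE` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical cutoff: segments scoring below it are shown as image crops.
QUALITY_THRESHOLD = 0.8


@dataclass
class Segment:
    page: int
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in page pixels
    text: str                        # text recognized by OCR for this region
    quality: float = 0.0             # text quality score in [0, 1]


@dataclass
class DocumentRepresentation:
    segments: List[Segment]
    # Semantic component label -> indices into `segments`.
    components: Dict[str, List[int]] = field(default_factory=dict)


def text_quality_score(text: str, vocabulary: Set[str]) -> float:
    """Hypothetical score: share of alphabetic tokens found in a vocabulary."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    words = [t for t in tokens if t.isalpha()]
    if not words:
        return 0.0
    return sum(w in vocabulary for w in words) / len(words)


def find_sections(segments: List[Segment]) -> Dict[str, List[int]]:
    """Hypothetical rule: a short all-caps segment starts a new section."""
    components: Dict[str, List[int]] = {}
    current: Optional[str] = None
    for i, seg in enumerate(segments):
        stripped = seg.text.strip()
        if stripped.isupper() and len(stripped.split()) <= 6:
            current = f"section:{stripped}"
            components[current] = []
        if current is not None:
            components[current].append(i)
    return components


# In-memory stand-in for persistent storage, keyed by document image ID.
STORE: Dict[str, DocumentRepresentation] = {}


def build_and_store(doc_id: str, segments: List[Segment],
                    vocabulary: Set[str]) -> DocumentRepresentation:
    """Score each segment, group segments into components, and store the result."""
    for seg in segments:
        seg.quality = text_quality_score(seg.text, vocabulary)
    rep = DocumentRepresentation(segments, find_sections(segments))
    STORE[doc_id] = rep
    return rep


def display_mode(seg: Segment, threshold: float = QUALITY_THRESHOLD) -> str:
    """Show recognized text if OCR looks reliable, else the image region."""
    return "text" if seg.quality >= threshold else "image"


# Usage: the third segment has garbled OCR, so it falls back to the image.
vocab = {"chapter", "one", "the", "quick", "brown", "fox"}
segs = [
    Segment(1, (0, 0, 400, 40), "CHAPTER ONE"),
    Segment(1, (0, 50, 400, 90), "The quick brown fox."),
    Segment(1, (0, 100, 400, 140), "Tbe qu1ck brwn f0x"),
]
rep = build_and_store("book-123", segs, vocab)
print([display_mode(s) for s in rep.segments])  # ['text', 'text', 'image']
```

Deciding per segment, rather than per page, means that one badly recognized region (a smudged line or a figure caption) falls back to its image crop while the rest of the page still reflows as text.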

  • 3 Assignments