METHODS AND SYSTEMS THAT USE HIERARCHICALLY ORGANIZED DATA STRUCTURE CONTAINING STANDARD FEATURE SYMBOLS IN ORDER TO CONVERT DOCUMENT IMAGES TO ELECTRONIC DOCUMENTS

  • US 20160267323A1
  • Filed: 06/18/2013
  • Published: 09/15/2016
  • Est. Priority Date: 06/18/2013
  • Status: Active Grant
First Claim

1. A system that transforms a document image into an electronic document, the system comprising:

    one or more processors;

    one or more electronic memories; and

    a hierarchically organized data structure, stored in one or more of the one or more electronic memories, each entry of which corresponds to one or more natural-language entities selected from among one or more morphemes, words, or phrases encoded as sequences of standard feature symbols; and

    computer instructions, digitally encoded and stored in one or more of the one or more electronic memories and executed on the one or more processors, that

    receive an image of a block of text of an Arabic-like language,

    identify images of lines of text within the received image of the block of text;

    identify subimages within the image of the line of text corresponding to one or more of words and morphemes,

    for each identified subimage,

    identify sets of characters that represent candidate character-sequence representations of the subimage; and

    use the candidate character-sequence representations of the subimages as hypotheses regarding the lexical identities of the subimages;

    reconstruct a portion of an electronic document corresponding to the received image of the block of text using the hypotheses regarding the lexical identities of the subimages; and

    store the reconstructed portion of the electronic document in one or more of the one or more electronic memories.
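
For illustration only, the hierarchically organized data structure recited above could be sketched as a trie whose edges are labeled with feature symbols and whose nodes hold the natural-language entities encoded by the path leading to them. The claim does not prescribe this layout; the names `FeatureSymbolTrie`, `hypotheses_for_line`, and `reconstruct_line`, the symbol labels such as `"loop"` and `"tail"`, and the transliterated entries are all hypothetical. Image processing is out of scope here: each subimage is assumed to have already been reduced to a sequence of feature symbols.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One node of the hierarchy; each outgoing edge is a feature symbol."""
    children: dict = field(default_factory=dict)
    entries: list = field(default_factory=list)  # entities ending at this node


class FeatureSymbolTrie:
    """Hypothetical store of morphemes, words, or phrases keyed by symbol sequences."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, symbols, entity):
        """Register an entity under its feature-symbol sequence."""
        node = self.root
        for symbol in symbols:
            node = node.children.setdefault(symbol, TrieNode())
        node.entries.append(entity)

    def lookup(self, symbols):
        """Return every entity stored under exactly this symbol sequence."""
        node = self.root
        for symbol in symbols:
            node = node.children.get(symbol)
            if node is None:
                return []
        return list(node.entries)


def hypotheses_for_line(trie, subimage_symbol_sequences):
    """Collect candidate entities (hypotheses) for each subimage of a line."""
    return [trie.lookup(seq) for seq in subimage_symbol_sequences]


def reconstruct_line(hypotheses, unknown="?"):
    """Naively keep the first candidate per subimage (assumption: no ranking)."""
    return " ".join(h[0] if h else unknown for h in hypotheses)


# Hypothetical symbols and transliterated entries, not taken from the patent.
trie = FeatureSymbolTrie()
trie.insert(("loop", "tail"), "wa")
trie.insert(("loop", "tail"), "fa")   # two entities may share one encoding
trie.insert(("stroke",), "a")

line = [("stroke",), ("hook",), ("loop", "tail")]
print(reconstruct_line(hypotheses_for_line(trie, line)))
```

Because several entities can share one symbol sequence, a lookup returns a list of candidates rather than a single answer, which mirrors the claim's treatment of candidates as hypotheses. Taking the first candidate is only a placeholder for whatever selection step an actual system would apply.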
