
Methods and systems that use hierarchically organized data structure containing standard feature symbols in order to convert document images to electronic documents

  • US 9,911,034 B2
  • Filed: 06/18/2013
  • Issued: 03/06/2018
  • Est. Priority Date: 06/18/2013
  • Status: Active Grant
First Claim

1. A system that transforms a document image into an electronic document, the system comprising:

    one or more processors;

    one or more electronic memories; and

    a hierarchically organized data structure, stored in one or more of the one or more electronic memories, the hierarchically organized data structure comprising a plurality of entries corresponding to one or more natural-language entities selected from among one or more morphemes, words, or phrases encoded as sequences of standard feature symbols, wherein the plurality of entries are associated with a plurality of scores; and

    computer instructions, digitally encoded and stored in one or more of the one or more electronic memories and executed on the one or more processors, that:

    receive an image comprising text of a language;

    identify a subimage within the image, the subimage corresponding to one or more of words and morphemes;

    identify a set of character-sequences that represent candidate character-sequence representations of the subimage, wherein a character-sequence of the set is identified by traversing a path of the hierarchically organized data structure and accumulating a value for the character-sequence based on the scores on the path, wherein the value for the character-sequence in the set satisfies a predetermined threshold;

    use the candidate character-sequence representations of the subimage as hypotheses regarding lexical identities of the subimage;

    construct a portion of an electronic document corresponding to the received image of text using the hypotheses regarding the lexical identities of the subimage; and

    store the constructed portion of the electronic document in one or more of the one or more electronic memories.
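
The candidate-identification step recited in the claim (traversing a path of the hierarchically organized data structure, accumulating a value from the scores on that path, and keeping character-sequences whose value satisfies a threshold) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the structure is modeled as a trie, each node carries one score, the accumulated value is a plain sum, "satisfies" is read as "greater than or equal to", and the names `Node`, `insert`, and `candidates` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical trie keyed by feature symbols."""
    score: float = 0.0        # assumed per-node score (illustrative only)
    is_entry: bool = False    # True if the path to this node spells a stored entry
    children: dict = field(default_factory=dict)


def insert(root, symbols, scores):
    """Store an entry as a path of symbols, one score per symbol.

    Nodes shared with earlier entries keep the score they were created with.
    """
    node = root
    for sym, sc in zip(symbols, scores):
        node = node.children.setdefault(sym, Node(score=sc))
    node.is_entry = True


def candidates(root, observed, threshold):
    """Return (sequence, value) pairs whose accumulated value >= threshold.

    `observed` lists, for each position in the subimage, the symbols that
    position might be. Only paths that exist in the trie are followed.
    """
    results = []

    def walk(node, depth, prefix, total):
        if depth == len(observed):
            # Assumed reading of "satisfies a predetermined threshold".
            if node.is_entry and total >= threshold:
                results.append(("".join(prefix), total))
            return
        for sym in observed[depth]:
            child = node.children.get(sym)
            if child is not None:
                walk(child, depth + 1, prefix + [sym], total + child.score)

    walk(root, 0, [], 0.0)
    # Highest accumulated value first.
    return sorted(results, key=lambda r: -r[1])


# Usage with made-up entries and scores.
root = Node()
insert(root, "cat", [1.0, 1.0, 1.0])
insert(root, "cot", [1.0, 0.5, 1.0])
insert(root, "car", [1.0, 1.0, 0.2])
hypotheses = candidates(root, [["c"], ["a", "o"], ["t", "r"]], threshold=2.4)
print(hypotheses)
```

In this sketch the returned list plays the role of the hypotheses regarding the lexical identity of the subimage. "car" is dropped because its accumulated value falls below the threshold, and "cor" is never produced because no such path exists in the structure.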
