System and method for capturing and processing business data

  • US 7,639,875 B2
  • Filed: 10/03/2008
  • Issued: 12/29/2009
  • Est. Priority Date: 05/18/2005
  • Status: Expired due to Fees
First Claim

1. A server device for use in interpreting information in a document, comprising:

  • a storage component arranged to receive and store an image of a document received from a remote source; and

    a processor that includes data and instructions configured to perform actions, including:

    representing the image as text that includes a plurality of characters, some of the characters in the plurality having alternative versions with associated confidence probabilities;

    generating a set of tokenizations, each tokenization comprising a set of unique tokens that comprise collections of characters, wherein different tokens are defined for different versions of a character, and wherein for characters with different versions a single version is included in a tokenization;

    assigning one or more tags to the tokens, the tags indicating a possible meaning of a corresponding token, and at least some of the tags having a score value indicating a probability of accuracy;

    parsing each tokenization in the set of tokenizations based on a determined grammar to obtain multiple tokenizations with a single tag being assigned to each token;

    assigning each tokenization an aggregate score based at least on compliance with the determined grammar; and

    selecting as a final tokenization one tokenization with tags based on the aggregate score from the multiple tokenizations.
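
The processing steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the sample OCR data, the tag names (LABEL, NUMBER, WORD), the tag scores, the one-rule grammar, the non-compliance penalty, and the way the scores are multiplied together are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical OCR output: one list of (character, confidence) versions per position.
ocr_chars = [
    [("T", 1.0)], [("e", 1.0)], [("l", 0.6), ("1", 0.4)], [(":", 1.0)],
    [(" ", 1.0)],
    [("5", 0.7), ("S", 0.3)], [("5", 1.0)], [("5", 1.0)],
]

LABELS = {"Tel:", "Fax:"}            # assumed lexicon of field labels
GRAMMAR = [("LABEL", "NUMBER")]      # assumed grammar: allowed tag sequences
NONCOMPLIANT_PENALTY = 0.1           # assumed factor for sequences outside the grammar


def tokenizations(chars):
    """Yield (tokens, ocr_probability), using a single version of each character."""
    for choice in product(*chars):
        text = "".join(ch for ch, _ in choice)
        yield tuple(text.split()), prod(p for _, p in choice)


def candidate_tags(token):
    """Return possible (tag, score) meanings for a token; the scores are assumed."""
    tags = [("WORD", 0.3)]
    if token in LABELS:
        tags.append(("LABEL", 0.9))
    if token.isdigit():
        tags.append(("NUMBER", 0.9))
    return tags


def parse(tokens, ocr_prob):
    """Yield (tokens, tags, aggregate_score) with exactly one tag per token."""
    for assignment in product(*(candidate_tags(t) for t in tokens)):
        tags = tuple(tag for tag, _ in assignment)
        tag_score = prod(score for _, score in assignment)
        grammar_factor = 1.0 if tags in GRAMMAR else NONCOMPLIANT_PENALTY
        yield tokens, tags, ocr_prob * tag_score * grammar_factor


def select_final(chars):
    """Return the tagged tokenization with the highest aggregate score."""
    parsed = [p for toks, prob in tokenizations(chars) for p in parse(toks, prob)]
    return max(parsed, key=lambda p: p[2])


tokens, tags, score = select_final(ocr_chars)
print(tokens, tags)  # ('Tel:', '555') ('LABEL', 'NUMBER')
```

In this sketch the aggregate score is a plain product of the character confidences, the tag scores, and a grammar factor. The claim only requires that the score be based at least on grammar compliance, so other combinations would fit equally well.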
