
System and method for capturing and processing business data

  • US 7,450,760 B2
  • Filed: 07/06/2005
  • Issued: 11/11/2008
  • Est. Priority Date: 05/18/2005
  • Status: Expired due to Fees
First Claim

1. A method of interpreting information in a document comprising:

  • receiving an image of a document from a remote source;

  • representing the image as text comprising characters, wherein at least some of the characters have alternative versions with associated confidence probabilities;

  • representing the text as tokens, wherein the tokens comprise collections of characters and wherein different tokens are defined for different versions of a character;

  • combining tokens into tokenizations, wherein each tokenization is a set of tokens, wherein for characters with different versions only one version is included in a tokenization;

  • assigning one or more tags to those tokens, wherein the tags indicate a possible meaning of a corresponding token, and assigning a score value indicating a probability of accuracy of a corresponding tag;

  • parsing each of said tokenizations based on a predetermined grammar so as to obtain multiple tokenizations wherein only one tag with associated score is assigned to each token based on both dictionary and grammar matching;

  • assigning each tokenization an aggregate score based on compliance with the grammar and scores of all tokens; and

  • selecting one tokenization with tags using the aggregated score as a metric of success so as to obtain a final tokenization from the multiple tokenizations with tags.
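The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The OCR output format (`OcrText`), the regex-based tag dictionary (`TAG_RULES`), the grammar written as a regular expression over tag sequences (`GRAMMAR`), the penalty factor (`GRAMMAR_PENALTY`), and the multiplicative scoring are all hypothetical choices made for illustration. Receiving the image and running OCR are assumed to have already happened.

```python
import re
from itertools import product

# Hypothetical OCR output: one entry per character position, listing the
# alternative versions of that character with their confidence probabilities.
OcrText = list[list[tuple[str, float]]]

# Hypothetical tag dictionary of (tag, pattern, score) triples. A token that
# matches several patterns receives several candidate tags.
TAG_RULES = [
    ("DATE", re.compile(r"\d{2}/\d{2}/\d{4}"), 0.9),
    ("AMOUNT", re.compile(r"\$\d+\.\d{2}"), 0.9),
    ("KEYWORD", re.compile(r"total|date", re.IGNORECASE), 0.8),
    ("WORD", re.compile(r"[A-Za-z]+"), 0.5),
]
UNKNOWN = ("UNKNOWN", 0.1)  # fallback tag for tokens matching no pattern

# Hypothetical grammar over the space-joined tag sequence:
# one or more "KEYWORD followed by a DATE or AMOUNT" pairs.
GRAMMAR = re.compile(r"KEYWORD (DATE|AMOUNT)( KEYWORD (DATE|AMOUNT))*")
GRAMMAR_PENALTY = 0.1  # multiplier when a tag sequence violates the grammar


def certain(s: str) -> OcrText:
    """Characters the OCR engine read with no alternatives."""
    return [[(c, 1.0)] for c in s]


def tokenizations(ocr: OcrText):
    """Yield (tokens, ocr_prob), choosing exactly one version of each character."""
    for choice in product(*ocr):
        prob = 1.0
        for _, p in choice:
            prob *= p
        text = "".join(c for c, _ in choice)
        yield text.split(), prob


def candidate_tags(token: str) -> list[tuple[str, float]]:
    """All dictionary tags whose pattern matches the whole token."""
    tags = [(tag, score) for tag, pat, score in TAG_RULES if pat.fullmatch(token)]
    return tags or [UNKNOWN]


def best_parse(tokens: list[str]) -> tuple[list[str], float]:
    """Assign one tag per token, scoring tag confidences and grammar compliance."""
    best_tags, best_score = [], 0.0
    for assignment in product(*(candidate_tags(t) for t in tokens)):
        score = 1.0
        for _, s in assignment:
            score *= s
        tags = [tag for tag, _ in assignment]
        if not GRAMMAR.fullmatch(" ".join(tags)):
            score *= GRAMMAR_PENALTY
        if score > best_score:
            best_tags, best_score = tags, score
    return best_tags, best_score


def interpret(ocr: OcrText):
    """Return (tokens, tags, aggregate_score) of the best-scoring tokenization."""
    best = None
    for tokens, ocr_prob in tokenizations(ocr):
        tags, parse_score = best_parse(tokens)
        aggregate = ocr_prob * parse_score  # OCR confidence x tag/grammar score
        if best is None or aggregate > best[2]:
            best = (tokens, tags, aggregate)
    return best


# Usage: "Total $1?.00", where the OCR engine is unsure whether '?' is the
# letter O (0.6) or the digit 0 (0.4).
ocr = certain("Total $1") + [[("O", 0.6), ("0", 0.4)]] + certain(".00")
tokens, tags, score = interpret(ocr)
print(tokens, tags)  # ['Total', '$10.00'] ['KEYWORD', 'AMOUNT']
```

In this example the less likely OCR reading wins: "$10.00" matches the `AMOUNT` pattern and lets the tag sequence satisfy the grammar, so its aggregate score (about 0.288) beats the "$1O.00" reading (about 0.0048). Enumerating every combination grows exponentially with the number of ambiguous characters and candidate tags; a practical system would presumably prune the search, but the claim does not specify how.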

  • 6 Assignments