Systems, methods and computer program products for determining document validity

US 8,526,739 B2
Filed: 11/30/2012
Issued: 09/03/2013
Est. Priority Date: 02/10/2009
Status: Active Grant

Abstract

A method according to one embodiment includes performing optical character recognition (OCR) on an image of a first document; generating a list of hypotheses mapping the first document to a complementary document using: textual information from the first document, textual information from the complementary document, and predefined business rules; at least one of: correcting OCR errors in the first document, and normalizing data from the complementary document, using at least one of the textual information from the complementary document and the predefined business rules; determining a validity of the first document based on the hypotheses; and outputting an indication of the determined validity. Additional systems, methods and computer program products are also presented.

67 Claims

1. A method, comprising:
- performing optical character recognition (OCR) on an image of a first document;
  
  generating a list of hypotheses mapping the first document to a complementary document using:
  
  textual information from the first document, textual information from the complementary document, and predefined business rules;
  
  at least one of:
  
  correcting OCR errors in the first document, and normalizing data from the complementary document, using at least one of the textual information from the complementary document and the predefined business rules;
  
  determining a validity of the first document based on the hypotheses; and
  
  outputting an indication of the determined validity.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
  - 2. A method as recited in claim 1, wherein the first document is an invoice, wherein the complementary document is at least one of a purchase order, a memorandum, and a delivery note having a relationship with the invoice.
  - 3. A method as recited in claim 1, wherein the image is a scanned image generated using a scanner.
  - 4. A method as recited in claim 1, further comprising normalizing data from the first document using at least one of the textual information from the complementary document and the predefined business rules.
  - 5. A method as recited in claim 1, further comprising normalizing data from the complementary document using at least one of the textual information from the first document and the predefined business rules.
  - 6. A method as recited in claim 1, further comprising generating an alert upon encountering a potential problem when determining the validity of the first document, wherein the alert includes identification of a mismatch in expected similar or identical values in the first and complementary documents.
  - 7. A method as recited in claim 1, further comprising receiving user input indicating at least one of a correction and a validation of a line item or header field item of the first document.
  - 8. A method as recited in claim 1, wherein determining the validity of the first document includes automatically estimating values for expected or actual line items in the first document.
  - 9. A method as recited in claim 1, wherein determining the validity of the first document includes automatically correcting values for expected or actual line items or header field items in the first document based on at least one of the textual information from the complementary document and the business rules.
  - 10. A method as recited in claim 1, wherein the first document is at least one of an explanation of benefits document, a sales order document, and an insurance claim document.
  - 11. A method as recited in claim 1, further comprising correlating a term on the first document and a different term on the complementary document as referring to a same thing, and storing the correlation of the terms in a database.
  - 12. A method as recited in claim 1, further comprising reconstructing the first document using the hypotheses and business rules, wherein the determining the validity step analyzes the reconstructed first document.
  - 13. A method as recited in claim 1, further comprising, upon determining that the first document is valid, generating knowledge based on the hypotheses generated.
  - 14. A method as recited in claim 1, further comprising outputting a reconciliation screen to a user upon failing to determine the first document is valid or determining that the first document is invalid.
  - 15. A method as recited in claim 14, further comprising receiving a modification to the first document by a user viewing the reconciliation screen; and attempting to re-validate the modified first document.
  - 16. A method as recited in claim 1, wherein the determined validity is used to validate a business transaction.
  - 17. A method as recited in claim 1, further comprising:
    - identifying a second complementary document associated with a second document;
      
      generating a list of hypotheses mapping the second document to the second complementary document using:
      
      textual information from the second document, textual information from the second complementary document, and predefined business rules;
      
      determining a validity of the second document based on the hypotheses; and
      
      outputting an indication of the determined validity of the second document.
  - 18. A computer program product comprising computer code embodied on a non-transitory computer readable medium, the computer code comprising:
    - code for performing the method of claim 1.
  - 19. A system, comprising:
    - a device having a processor and logic configured for causing the processor to perform the method of claim 1.
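
The steps recited in claims 1 and 6 can be pictured with a short Python sketch. It is illustrative only and is not taken from the patent: the `LineItem` type, the use of `difflib` string similarity to score hypotheses, the greedy acceptance of the best hypothesis, and the amount-tolerance rule are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class LineItem:
    description: str  # text as read by OCR (invoice) or as stored (purchase order)
    amount: float


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two strings (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def generate_hypotheses(invoice, purchase_order, min_score=0.6):
    """List candidate (invoice_index, po_index, score) mappings, best first."""
    hypotheses = []
    for i, inv in enumerate(invoice):
        for j, po in enumerate(purchase_order):
            score = similarity(inv.description, po.description)
            if score >= min_score:
                hypotheses.append((i, j, score))
    return sorted(hypotheses, key=lambda h: h[2], reverse=True)


def determine_validity(invoice, purchase_order, tolerance=0.01):
    """Accept the best non-conflicting hypotheses, then apply the amount rule."""
    used_inv, used_po, mismatches = set(), set(), []
    for i, j, _ in generate_hypotheses(invoice, purchase_order):
        if i in used_inv or j in used_po:
            continue  # each line item takes part in at most one mapping
        used_inv.add(i)
        used_po.add(j)
        # Assumption: the purchase order text is the reference spelling,
        # so it overwrites the (possibly mis-read) OCR text.
        invoice[i].description = purchase_order[j].description
        # Hypothetical business rule: amounts must agree within `tolerance`.
        if abs(invoice[i].amount - purchase_order[j].amount) > tolerance:
            mismatches.append((i, j))
    unmatched = [i for i in range(len(invoice)) if i not in used_inv]
    is_valid = not mismatches and not unmatched
    return is_valid, mismatches, unmatched
```

Calling `determine_validity(invoice, purchase_order)` returns the validity indication together with the mismatched and unmatched line items, which is the kind of information an alert as in claim 6 or a reconciliation screen as in claim 14 could present. A production system would likely score hypotheses jointly rather than greedily.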

20. A method, comprising:
- determining a validity of a first document by simultaneously considering:
  
  textual information from the first document, textual information from a complementary document, and predefined business rules;
  
  at least one of:
  
  correcting OCR errors in the first document, and normalizing data from the first document prior to determining the validity, using at least one of the textual information from the complementary document and the predefined business rules; and
  
  outputting an indication of the determined validity.
- Dependent Claims (21, 22, 23, 24, 25, 26, 27, 28, 29)
  - 21. A method as recited in claim 20, further comprising normalizing data from the complementary document using at least one of the textual information from the first document and the predefined business rules.
  - 22. A method as recited in claim 20, further comprising generating an alert upon encountering a potential problem when determining the validity of the first document, wherein the alert includes identification of a mismatch in expected similar or identical values in the first and complementary documents.
  - 23. A method as recited in claim 20, further comprising receiving user input indicating at least one of a correction and a validation of a line item or header field item of the first document.
  - 24. A method as recited in claim 20, wherein determining the validity of the first document includes automatically estimating values for expected or actual line items in the first document.
  - 25. A method as recited in claim 20, wherein determining the validity of the first document includes automatically correcting values for expected or actual line items or header field items in the first document based on at least one of the textual information from the complementary document and the business rules.
  - 26. A method as recited in claim 20, wherein the determined validity is used to validate a business transaction.
  - 27. A method as recited in claim 20, further comprising:
    - acquiring an electronic second document;
      
      identifying a second complementary document associated with the second document;
      
      generating a list of hypotheses mapping the second document to the second complementary document using:
      
      textual information from the second document, textual information from the second complementary document, and predefined business rules;
      
      determining a validity of the second document based on the hypotheses; and
      
      outputting an indication of the determined validity of the second document.
  - 28. A computer program product comprising computer code embodied on a non-transitory computer readable medium, the computer code comprising:
    - code for performing the method of claim 20.
  - 29. A system, comprising:
    - a device having a processor and logic configured for causing the processor to perform the method of claim 20.
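
Claim 20 recites normalizing data from the first document before its validity is determined. The Python sketch below shows one way such normalization might look for amounts and dates. It is an assumption-laden illustration, not the patented method: the list of date formats stands in for a predefined business rule, and the field names `total` and `date` are hypothetical.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical business rule: date formats tried in order of preference.
# An ambiguous date such as 01/02/2012 is read with the first format that fits.
DATE_FORMATS = ("%m/%d/%Y", "%d.%m.%Y", "%Y-%m-%d")


def normalize_amount(text: str) -> Decimal:
    """Turn strings such as '$1,234.50' or '1.234,50 EUR' into a Decimal."""
    cleaned = re.sub(r"[^\d.,]", "", text).strip(".,")
    parts = re.split(r"[.,]", cleaned)
    # Assumption: a trailing group of one or two digits is the fraction;
    # every other separator is a thousands separator.
    if len(parts) > 1 and len(parts[-1]) in (1, 2):
        whole, fraction = "".join(parts[:-1]), parts[-1]
    else:
        whole, fraction = "".join(parts), "0"
    if not whole:
        raise ValueError(f"no amount found in {text!r}")
    return Decimal(f"{whole}.{fraction}")


def normalize_date(text: str) -> str:
    """Return an ISO 8601 date string, trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def fields_agree(first_doc: dict, complementary: dict) -> bool:
    """Compare the normalized total and date of two documents."""
    return (
        normalize_amount(first_doc["total"]) == normalize_amount(complementary["total"])
        and normalize_date(first_doc["date"]) == normalize_date(complementary["date"])
    )
```

Normalizing both sides to a common representation means that a total printed as `$1,234.50` on one document and `1.234,50 EUR` on the other compare as equal numbers. Currency conversion is deliberately out of scope here.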

30. A method, comprising:
- receiving an image of a document;
  
  performing optical character recognition (OCR) on the image of the document;
  
  extracting an address of a sender of the document from the image based on the OCR;
  
  comparing the extracted address with content in a first database;
  
  identifying complementary textual information in a second database based on the address; and
  
  at least one of:
  
  extracting additional content from the image of the document;
  
  correcting OCR errors in the document using the complementary textual information, and normalizing data from the document prior to determining a validity of the document using at least one of the complementary textual information and predefined business rules.
- Dependent Claims (31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48)
  - 31. A computer program product comprising computer code embodied on a non-transitory computer readable medium, the computer code comprising:
    - code for performing the method of claim 30.
  - 32. A system, comprising:
    - a device having a processor and logic configured for causing the processor to perform the method of claim 30.
  - 33. The method as recited in claim 30, further comprising validating textual information in the document.
  - 34. The method as recited in claim 30, further comprising correlating textual information from the document to textual information in a complementary document.
  - 35. The method as recited in claim 30, further comprising updating textual information of the document.
  - 36. The method as recited in claim 35, wherein the updating comprises correcting one or more OCR errors.
  - 37. The method as recited in claim 36, wherein the one or more OCR errors comprise one or more of incorrectly identified characters and unidentified characters.
  - 38. The method as recited in claim 30, wherein one or more of the comparing and the identifying comprises fuzzy matching.
  - 39. The method as recited in claim 30, wherein extracting the address of the sender comprises scanning a barcode.
  - 40. The method as recited in claim 30, wherein the complementary textual information corresponds to one or more fields of a complementary document.
  - 41. The method as recited in claim 40, wherein the complementary document is an electronic document in the second database.
  - 42. The method as recited in claim 40, wherein the complementary document is an extraction template.
  - 43. The method as recited in claim 30, further comprising determining whether the document is related to a complementary document.
  - 44. The method as recited in claim 43, further comprising determining a confidence level of the determination of whether the document is related.
  - 45. The method as recited in claim 30, wherein the additional content comprises textual information in a format specific to the sender.
  - 46. The method as recited in claim 30, wherein the additional content comprises non-textual information specific to the sender.
  - 47. The method as recited in claim 30, wherein extracting the additional content comprises template-based extraction.
  - 48. The method as recited in claim 30, wherein extracting the additional content utilizes location information regarding the additional content.
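
The address lookup of claim 30, combined with the fuzzy matching of claim 38 and the confidence level of claim 44, might be sketched in Python as follows. Everything concrete here is hypothetical: the two dictionaries stand in for the first and second databases, the sender records are invented, and `difflib` similarity with a 0.8 threshold is just one possible fuzzy-matching choice.

```python
from difflib import SequenceMatcher

# Hypothetical "first database": known sender addresses keyed by sender id.
SENDER_ADDRESSES = {
    "acme": "Acme Corp, 12 Main Street, Springfield",
    "globex": "Globex Inc, 400 Harbor Road, Shelbyville",
}

# Hypothetical "second database": complementary text stored per sender.
COMPLEMENTARY_INFO = {
    "acme": {"vendor_name": "Acme Corp", "currency": "USD"},
    "globex": {"vendor_name": "Globex Inc", "currency": "EUR"},
}


def _canonical(text: str) -> str:
    """Lower-case the text, drop commas, and collapse runs of whitespace."""
    return " ".join(text.lower().replace(",", " ").split())


def match_sender(extracted_address: str, threshold: float = 0.8):
    """Return (sender_id, score); sender_id is None when no address is close enough."""
    best_id, best_score = None, 0.0
    for sender_id, address in SENDER_ADDRESSES.items():
        score = SequenceMatcher(
            None, _canonical(extracted_address), _canonical(address)
        ).ratio()
        if score > best_score:
            best_id, best_score = sender_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


def lookup_complementary(extracted_address: str):
    """Look up complementary text for the sender behind an OCR-extracted address."""
    sender_id, _score = match_sender(extracted_address)
    if sender_id is None:
        return None
    return COMPLEMENTARY_INFO[sender_id]
```

Because the comparison is fuzzy, an address read as `Acrne Corp, 12 Main Str3et, Springfie1d` can still resolve to the stored Acme record, and the returned score can serve as a rough confidence value. The stored spelling could then be used to correct the OCR text.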

49. A method, comprising:
- receiving an image of a part or all of a document selected from a group consisting of:
  
  an invoice, a bill, a receipt, a sales order, an insurance claim, a medical insurance document, and a benefits document;
  
  performing optical character recognition (OCR) on the image;
  
  extracting at least a partial address of a sender of the document;
  
  comparing the at least partial address of the sender to a plurality of addresses in a first database; and
  
  identifying one or more of:
  
  textual information specific to the sender; and
  
  data formatting specific to the sender.
- Dependent Claims (50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67)
  - 50. The method as recited in claim 49, further comprising validating textual information in the document.
  - 51. The method as recited in claim 49, further comprising correlating textual information from the document to textual information in a complementary document.
  - 52. The method as recited in claim 49, further comprising updating textual information of the document.
  - 53. The method as recited in claim 52, wherein the updating comprises correcting one or more OCR errors.
  - 54. The method as recited in claim 53, wherein the one or more OCR errors comprise one or more of incorrectly identified characters and unidentified characters.
  - 55. The method as recited in claim 49, wherein one or more of the comparing and the identifying comprises fuzzy matching.
  - 56. The method as recited in claim 49, wherein extracting the at least partial address comprises scanning a barcode.
  - 57. The method as recited in claim 49, wherein the data formatting specific to the sender comprises one or more fields corresponding to fields of a complementary document.
  - 58. The method as recited in claim 49, further comprising comparing one or more portions of the document to one or more portions of a complementary document.
  - 59. The method as recited in claim 58, wherein the complementary document is an electronic document in a second database.
  - 60. The method as recited in claim 58, wherein the complementary document is an extraction template.
  - 61. The method as recited in claim 49, further comprising determining whether the document is related to a complementary document.
  - 62. The method as recited in claim 61, further comprising determining a confidence level of the determination of whether the document is related.
  - 63. The method as recited in claim 49, wherein the data formatting specific to the sender comprises textual information in a format specific to the sender.
  - 64. The method as recited in claim 49, wherein the data formatting specific to the sender comprises non-textual information.
  - 65. The method as recited in claim 49, further comprising extracting additional content from the image of the document.
  - 66. The method as recited in claim 65, wherein extracting the additional content comprises template-based extraction.
  - 67. The method as recited in claim 65, wherein extracting the additional content utilizes location information regarding the additional content.
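
Claims 66 and 67 mention template-based extraction that uses location information. A minimal Python sketch of that idea appears below. The `Word` type, the `TEMPLATES` table, the region coordinates, and the rule that a word belongs to a region when its top-left corner lies inside it are all assumptions made for illustration; the claims do not prescribe any of them.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the OCR word box, in page units
    y: float  # top edge of the OCR word box, in page units


# Hypothetical sender-specific templates: field name -> (x0, y0, x1, y1) region.
TEMPLATES = {
    "acme": {
        "invoice_number": (400, 40, 560, 60),
        "total": (400, 700, 560, 720),
    },
}


def extract_fields(words, sender_id):
    """Collect OCR words whose top-left corner falls inside each template region."""
    fields = {}
    for name, (x0, y0, x1, y1) in TEMPLATES[sender_id].items():
        inside = [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        inside.sort(key=lambda w: (w.y, w.x))  # top-to-bottom, then left-to-right
        fields[name] = " ".join(w.text for w in inside)
    return fields
```

Once the sender has been identified from its address, `extract_fields(words, sender_id)` pulls each field from the place where that sender is assumed to print it. A field whose region contains no words comes back as an empty string.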

Current Assignee: Kofax Incorporated
Original Assignee: Kofax Incorporated
Inventors: Schmidtler, Mauritius A. R.; Borrey, Roland G.; Amtrup, Jan W.; Thompson, Stephen Michael
Primary Examiner(s): MARIAM, DANIEL G
Application Number: US13/691,610
Publication Number: US 20130088757A1
Time in Patent Office: 277 Days
Field of Search: 382/181, 382/182, 382/209, 382/218, 382/305, 382/317, 382/321, 705/26, 705/35, 705/40, 705/44, 705/82
US Class Current: 382/182
CPC Class Codes:
G06V 30/416 Extracting the logical stru...
H04N 1/40 Picture signal circuits H04...
