Systems and methods for triage of passages of text output from an OCR system
Abstract
Systems and methods are provided for triage of passages of text output from an OCR system, using trainable models of the accuracy of the OCR system based on attributes of individual characters. The systems and methods according to this invention automatically triage an OCR-output text passage by determining at least one OCR-output character attribute for each OCR-output character, determining an error rate for the OCR-output text passage using a triage model and the determined at least one OCR-output character attribute, and comparing the determined error rate for the OCR-output text passage with an OCR-output text passage threshold error rate to perform an OCR-output text passage triage decision. Triage decisions include, for example, sending the OCR results directly to an end user without any post-OCR processing, sending the OCR results through a post-OCR inspection and processing stage, sending the original document image to be completely keyed in manually, or a combination thereof.
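The routing step described in the abstract can be illustrated with a minimal sketch. It assumes an estimated error rate for the passage is already available and maps it to one of the three example outcomes named above. The `Route` enum, the `triage` function, and both threshold values are hypothetical names and numbers chosen for illustration; the source specifies a threshold comparison but does not give threshold values or say how many thresholds are used.

```python
from enum import Enum


class Route(Enum):
    """Example triage outcomes named in the abstract."""
    DELIVER = "send OCR results directly to the end user"
    INSPECT = "send OCR results through post-OCR inspection and processing"
    REKEY = "send the original document image for manual keying"


def triage(error_rate: float,
           inspect_threshold: float = 0.01,
           rekey_threshold: float = 0.10) -> Route:
    """Map an estimated passage error rate to a triage route.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be a fraction between 0 and 1")
    if inspect_threshold > rekey_threshold:
        raise ValueError("inspect_threshold must not exceed rekey_threshold")
    if error_rate < inspect_threshold:
        return Route.DELIVER   # passage is estimated to be clean enough as is
    if error_rate < rekey_threshold:
        return Route.INSPECT   # worth correcting rather than retyping
    return Route.REKEY         # too many estimated errors to correct efficiently
```

Using two thresholds is one possible way to obtain three outcomes from a single error-rate estimate; a deployment with only two outcomes would need just one threshold.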
31 Claims
1. A method for automatic triage of a text passage outputted by an optical character recognition system, the OCR-output text passage having multiple text segments, individual ones of the text segments including at least one OCR-output character, the method comprising:
determining at least one OCR-output character attribute for each of the OCR-output characters in the OCR-output text passage;
determining an error rate for the OCR-output text passage as a whole using a triage model and the determined OCR-output character attributes; and
comparing the determined error rate for the OCR-output text passage with an OCR-output text passage threshold error rate to perform an OCR-output text passage triage decision.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer-implemented method for triage of a plurality of OCR-output text passages, each OCR-output text passage having multiple text segments, individual ones of the text segments including at least one OCR-output character, the method comprising:
selecting a set of OCR-output character attributes from a plurality of OCR-output character attributes for each OCR-output character;
determining an OCR-output character error value for each OCR-output character based on a probability of the set of OCR-output character attributes being erroneously interpreted by the OCR system;
determining a text passage error value for each OCR-output text passage as a whole based on a probability of the text passage being erroneously interpreted by the OCR system as determined using at least the OCR-output character error values; and
comparing the determined text passage error value with an OCR-output text passage threshold error value to perform an OCR-output text passage triage decision.
Dependent claims: 13, 14, 15, 16, 17, 18
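The steps of claim 12 can be sketched as follows, purely as an illustration. The sketch assumes two made-up character attributes (`confidence_bin` and `is_suspect`), a lookup table standing in for a trained triage model, and the mean of the per-character error probabilities as the passage error value. None of these choices comes from the claim text, which leaves the attributes, the model, and the aggregation open.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class OcrChar:
    char: str
    confidence_bin: str  # hypothetical attribute, e.g. "high" or "low"
    is_suspect: bool     # hypothetical attribute: flagged by the OCR engine


# Hypothetical stand-in for a trained triage model: maps a set of character
# attributes to the probability that the character was misrecognized.
MODEL: Dict[Tuple[str, bool], float] = {
    ("high", False): 0.002,
    ("high", True): 0.05,
    ("low", False): 0.08,
    ("low", True): 0.40,
}


def char_error_value(c: OcrChar,
                     model: Dict[Tuple[str, bool], float],
                     default: float = 0.5) -> float:
    """Per-character error value; unseen attribute sets get a default."""
    return model.get((c.confidence_bin, c.is_suspect), default)


def passage_error_value(chars: Sequence[OcrChar],
                        model: Dict[Tuple[str, bool], float]) -> float:
    """Passage error value as the mean per-character error probability
    (an assumed aggregation, equal to expected errors per character)."""
    if not chars:
        raise ValueError("passage contains no characters")
    return sum(char_error_value(c, model) for c in chars) / len(chars)


def exceeds_threshold(chars: Sequence[OcrChar],
                      model: Dict[Tuple[str, bool], float],
                      threshold: float) -> bool:
    """Compare the passage error value with a threshold error value."""
    return passage_error_value(chars, model) > threshold
```

For example, a two-character passage with one `("high", False)` character and one `("low", True)` character averages 0.002 and 0.40, which exceeds a threshold of 0.1 and would be routed away from direct delivery.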
19. An OCR-output text passage triage system that triages a text passage outputted by an optical character recognition system, the OCR-output text passage including multiple text segments, individual ones of the text segments including at least one OCR-output character, each having at least one OCR-output character attribute, the system comprising:
an OCR-output text passage character accuracy determination circuit or routine that determines a character interpretation error value for individual OCR-output characters within the OCR-output text passage using a triage model;
an OCR-output text passage accuracy determination circuit or routine that determines at least one OCR-output text passage quality metric for the text passage as a whole using the determined character interpretation error value and at least one statistical algorithm or model included in the triage model; and
an OCR-output text passage triage circuit or routine that performs one or more text passage triage decisions using the determined at least one OCR-output text passage quality metric and an OCR-output text passage threshold error rate value.
Dependent claims: 20, 21, 22, 23, 24
25. A computer-readable medium that provides instructions for triage of a text passage outputted by an optical character recognition system, the OCR-output text passage having multiple text segments, individual ones of the text segments including at least one OCR-output character, instructions, which when executed by a processor, cause the processor to perform operations comprising:
determining at least one OCR-output character attribute for each of the OCR-output characters in the OCR-output text passage;
determining an error rate for the OCR-output text passage as a whole using a triage model and the determined OCR-output character attributes; and
comparing the determined error rate for the OCR-output text passage with an OCR-output text passage threshold error rate to perform an OCR-output text passage triage decision.
Dependent claims: 26, 27, 28, 29, 30, 31
Specification