Method and apparatus for determination of verified data
Abstract
A method and apparatus for determining the veracity of data. The method includes at least one comparison of manually input data and data generated by an automated process. The data generated by the automated process is typically the result of optical character recognition (OCR) and includes both classifications of the optically read data and confidences for those classifications. For each piece of data processed, the OCR program generates a classification, which is its guess as to what that piece of data is, and a confidence level, which is its own evaluation of how good that guess was. How the data is compared depends on whether the accuracy of the manually input data or of the optical character recognition results is to be checked. The low confidence data is typically re-keyed into the system manually. To perform quality assurance on the manually input data, the results of the optical character recognition are compared with the manually input data. If they match, the manually input data is determined to be accurate and passes a quality assurance test. If they do not match, a second operator inputs additional data manually. This is compared to the first set of manually input data, and if these sets of data match, then the first set of manually input data is determined to be accurate, thus passing the quality assurance test. If these data sets do not match, then the results of the optical character recognition are compared to the second set of manually input data. If these two sets of data match, then the first set of manually input data is determined to be inaccurate and thus fails the quality assurance test. Similar comparisons are performed to test the veracity of the optical character recognition data.
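The routing step described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `OcrResult` record, the field names, the 0.0 to 1.0 confidence scale, and the 0.90 threshold are all assumptions chosen for illustration. It shows how low confidence items might be separated for manual re-keying and how a random subset might be drawn for a quality assurance check.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrResult:
    field_id: str        # hypothetical identifier for one piece of scanned data
    classification: str  # the OCR program's guess at what the data is
    confidence: float    # the OCR program's rating of its guess (assumed 0.0-1.0)


def split_by_confidence(results, threshold=0.90):
    """Return (low, high); the threshold value is an assumption, not from the patent."""
    low = [r for r in results if r.confidence < threshold]
    high = [r for r in results if r.confidence >= threshold]
    return low, high


def sample_for_qa(items, k, seed=None):
    """Randomly select up to k items for a quality assurance check."""
    rng = random.Random(seed)
    return rng.sample(items, min(k, len(items)))
```

In this sketch the `low` list would be sent to the first operator for re-keying, and `sample_for_qa` could be applied either to the re-keyed items or to the `high` list, depending on which data set is being checked.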
22 Claims
1. A method of determining the veracity of data, comprising:
scanning a document containing characters and images as data into a memory;
generating predetermined accuracy statistics for use in determining the accuracy of the data;
performing automated recognition of the data so as to generate classifications and confidences for the data;
manually inputting classification data by a first operator for that data having a low confidence;
randomly selecting data from the manually input classification data; and
if the classification data randomly selected is the same as the classification data generated by the performance of automated recognition, then the manually input data is found to be accurate and thus passes a quality assurance test.
2. The method according to claim 1, further comprising:
randomly selecting classification data from the classification data of high confidence generated by the automated recognition;
manually inputting classification data for that data of high confidence that was randomly selected; and
if the manually input classification data is the same as the classification data generated by the automated recognition, then the automated recognition data is found to be accurate and thus passes a quality assurance test.
3. The method according to claim 2, further comprising:
if the manually input classification data does not match the classification data generated by the automated recognition, manually inputting by a second operator classification data of the randomly selected high confidence classification data generated by the automated recognition; and
if the classification data input by the second operator matches the automated recognition classification data, then the automated recognition data is found to be accurate and passes the quality assurance test.
4. The method according to claim 3, further comprising:
if the classification data input by the second operator does not match the automated recognition classification data, then the manually input classification data is compared with that data input by the second operator and if these two sets of data match one another, then the automated recognition classification data fails the quality assurance test.
5. The method according to claim 4, further comprising outputting that the results are inconclusive if the manually input data does not match the data input by the second operator.
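The decision sequence recited for checking a randomly selected high confidence OCR item can be sketched as follows. This is an illustrative reading only, not the patented implementation: the `QaOutcome` enum, the function name, and the use of a callable to obtain the second operator's entry only when it is needed are assumptions.

```python
from enum import Enum
from typing import Callable


class QaOutcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    INCONCLUSIVE = "inconclusive"


def check_ocr_item(ocr_value: str,
                   first_entry: str,
                   get_second_entry: Callable[[], str]) -> QaOutcome:
    """Judge one high confidence OCR classification against manual entries."""
    # The manual entry agrees with the OCR classification: OCR data passes.
    if first_entry == ocr_value:
        return QaOutcome.PASS
    # Mismatch: a second operator keys the same item.
    second_entry = get_second_entry()
    # The second operator agrees with the OCR classification: OCR data passes.
    if second_entry == ocr_value:
        return QaOutcome.PASS
    # Both operators agree with each other against the OCR: OCR data fails.
    if second_entry == first_entry:
        return QaOutcome.FAIL
    # All three values differ: the result is inconclusive.
    return QaOutcome.INCONCLUSIVE
```

Passing the second entry as a callable reflects that the second operator is only consulted after a mismatch; a caller that already has both entries could pass `lambda: second_value`.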
6. The method according to claim 1, further comprising:
manually inputting by a second operator classification data corresponding to the low confidence data;
if the classification data input by the second operator matches the classification data input by the first operator, then it is determined that the data input by the first operator is accurate, thus passing the quality assurance test.
7. The method according to claim 6, further comprising:
if the classification data input by the second operator does not match the classification data input by the first operator, then comparing the classification data input by the second operator with the classification data generated by the automated recognition.
8. The method according to claim 7, wherein if the classification data input by the second operator matches the classification data generated by the automated recognition process, then it is determined that the data input by the first operator is not accurate and that data fails the quality assurance test.
9. The method according to claim 7, wherein if the classification data input by the second operator does not match the classification data generated by the automated recognition process, then an inconclusive results signal is output.
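The quality assurance check on the first operator's manually input data might be sketched as below. The ordering follows the abstract, in which the second operator is consulted only after the first operator's entry fails to match the OCR classification. The names `ManualQaOutcome` and `check_manual_item` and the callable parameter are hypothetical and do not come from the patent.

```python
from enum import Enum
from typing import Callable


class ManualQaOutcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    INCONCLUSIVE = "inconclusive"


def check_manual_item(first_entry: str,
                      ocr_value: str,
                      get_second_entry: Callable[[], str]) -> ManualQaOutcome:
    """Judge the first operator's entry for one randomly selected item."""
    # The first operator's entry matches the OCR classification: it passes.
    if first_entry == ocr_value:
        return ManualQaOutcome.PASS
    # Otherwise a second operator keys the same low confidence item.
    second_entry = get_second_entry()
    # The two operators agree: the first operator's entry passes.
    if second_entry == first_entry:
        return ManualQaOutcome.PASS
    # The second operator agrees with the OCR instead: the first entry fails.
    if second_entry == ocr_value:
        return ManualQaOutcome.FAIL
    # No two values agree: an inconclusive result is signalled.
    return ManualQaOutcome.INCONCLUSIVE
```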
10. An apparatus for determining the veracity of data, comprising:
a scanner unit to scan the data into the apparatus;
an automated recognition unit to generate automated recognition data from the data scanned into the apparatus;
a memory unit to store the scanned data and manipulations thereof;
a mathematical processor unit to generate accuracy statistics;
at least one manual input station to manually input data to the memory unit; and
at least one comparator to compare the automated recognition data with the manually input data.
14. An apparatus for testing the veracity of data, comprising:
means for inputting data;
means for generating optical character recognition data corresponding to the input data;
means for generating a statistical analysis to perform on the optical character recognition data;
means for storing data;
means for manually entering data to the storing means; and
means for comparing the manually entered data with the optical character recognition data.
22. A method of testing the veracity of OCR data, comprising:
scanning data into a system;
manually inputting data including a selected set of the scanned data as a first data set by a first operator;
comparing the manually input data with the scanned data;
if the data match one another, determining that the data is accurate;
if the data do not match, having a second operator manually input the selected set of scanned data as a second data set;
comparing the second data set with the first data set and the scanned data; and
providing that at least two of the three data sets match one another, determining the veracity of the scanned data and the manually input data sets.
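The two-of-three agreement in claim 22 can be illustrated with a short Python sketch. It is one possible reading, not the claimed implementation: the function name `judge_two_of_three`, the dictionary return shape, and the choice to raise an error when a second entry is required but missing are assumptions.

```python
from typing import Dict, Optional


def judge_two_of_three(scanned: str,
                       first: str,
                       second: Optional[str] = None) -> Optional[Dict[str, bool]]:
    """Report which data sets are verified, or None if no two of them match."""
    # The first operator's data matches the scanned data: both are accurate
    # and no second operator is needed.
    if first == scanned:
        return {"scanned": True, "first": True}
    if second is None:
        raise ValueError("mismatch: a second operator's data set is required")
    # Find a value that at least two of the three data sets share.
    if second == first:
        agreed = first
    elif second == scanned:
        agreed = scanned
    else:
        return None  # all three differ, so veracity cannot be determined
    return {
        "scanned": scanned == agreed,
        "first": first == agreed,
        "second": second == agreed,
    }
```

For example, `judge_two_of_three("A", "B", "B")` marks the scanned data as unverified and both manual data sets as verified, since the two operators agree with each other.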
Specification