Line alignment apparatus and process
Abstract
A computer-implemented process of matching two lines of text produced by optical character readers analyzes the words in the two lines to identify matching words in corresponding positions. The lines are then identified as matching based on the number of matching words in corresponding positions.
24 Claims
1. A computer-implemented process of comparing a line of text produced by reading a digital image of a document by a first reader to a line of text produced by reading a digital image of the same document by a second reader, each of the lines of text having a plurality of words, the process comprising:
- identifying the positions of words in the line of text produced by the first reader and in the line of text produced by the second reader;
- analyzing the words in the line of text produced by the first reader and in the line of text produced by the second reader to identify matching words in corresponding positions in the lines of text produced by the first and second readers; and
- outputting the line of text produced by the first reader to a first line-aligned file and outputting the line of text produced by the second reader to a second line-aligned file if the number of matching words in corresponding positions exceeds a predetermined threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
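The steps of claim 1 can be illustrated with a minimal sketch. It assumes that words are whitespace-separated, that "corresponding positions" means the same word index in both lines, and that a match means exact string equality; the claim itself does not fix any of these. The function names and the use of in-memory lists in place of the line-aligned files are hypothetical.

```python
from typing import List


def count_positional_matches(line_a: str, line_b: str) -> int:
    """Count words that are identical at the same word index in both lines.

    Assumption: words are split on whitespace and compared exactly.
    """
    words_a = line_a.split()
    words_b = line_b.split()
    return sum(1 for a, b in zip(words_a, words_b) if a == b)


def align_if_matching(
    line_a: str,
    line_b: str,
    aligned_a: List[str],
    aligned_b: List[str],
    threshold: int,
) -> bool:
    """Append both lines to their outputs when the match count exceeds the threshold.

    The two lists stand in for the first and second line-aligned files.
    """
    if count_positional_matches(line_a, line_b) > threshold:
        aligned_a.append(line_a)
        aligned_b.append(line_b)
        return True
    return False


first_reader = "the quick brown fox jumps"
second_reader = "the quick brovvn fox jumps"  # one word misread by the second reader

out_first: List[str] = []
out_second: List[str] = []
matched = align_if_matching(first_reader, second_reader, out_first, out_second, threshold=3)
```

Here four of the five words agree in position, so with a threshold of 3 both lines are written to their respective outputs.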
12. A computer-implemented process of comparing a line of text produced by reading a digital image of a document by a first reader to a line of text produced by reading a digital image of the same document by a second reader, each of the lines of text having a plurality of words, the process comprising:
- forming a truth table having a first axis representing words of the line of text produced by the first reader, a second axis representing words of the line of text produced by the second reader, and entries indicating that individual words along the first axis match individual words along the second axis;
- establishing a threshold;
- identifying a diagonal of the truth table;
- calculating a matching proportion for the diagonal by dividing the number of entries along the diagonal by the number of words along an axis;
- comparing the matching proportion to the threshold; and
- outputting the lines of text to separate line-aligned files if the matching proportion exceeds the threshold.
Dependent claims: 13, 14
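The truth-table computation of claim 12 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: words are compared by exact equality, a diagonal is identified by a hypothetical integer `offset` (0 for the main diagonal), and the divisor is the number of words on the first axis. The claim says only "an axis", so that choice is an assumption.

```python
from typing import List


def build_truth_table(words_a: List[str], words_b: List[str]) -> List[List[bool]]:
    """Entry [i][j] is True when word i of the first line equals word j of the second."""
    return [[a == b for b in words_b] for a in words_a]


def diagonal_proportion(table: List[List[bool]], offset: int = 0) -> float:
    """Matching entries on the diagonal (i, i + offset), divided by the first-axis length."""
    n_rows = len(table)
    if n_rows == 0:
        return 0.0
    n_cols = len(table[0])
    hits = sum(
        1
        for i in range(n_rows)
        if 0 <= i + offset < n_cols and table[i][i + offset]
    )
    return hits / n_rows


# The second reader has produced a spurious leading token, shifting every word by one.
words_first = "the quick brown fox".split()
words_second = "~ the quick brown fox".split()

table = build_truth_table(words_first, words_second)
main_diagonal = diagonal_proportion(table, offset=0)
shifted_diagonal = diagonal_proportion(table, offset=1)

threshold = 0.5  # assumed value for illustration
lines_match = shifted_diagonal > threshold
```

In this example the main diagonal contains no matches, while the diagonal shifted by one position matches every word of the first line, which suggests why working with diagonals rather than fixed word indices tolerates an inserted or dropped word.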
15. A computer-implemented process of forming a computer-readable file of text comprising:
- reading a digital image of a document by each of a plurality of readers to produce a corresponding plurality of raw text files, each raw text file containing lines of text;
- line-aligning at least one raw text file to at least one other raw text file to produce a corresponding plurality of line-aligned files, wherein line-aligning comprises:
  - identifying the positions of words in the lines of text contained in the raw text files being line-aligned;
  - analyzing the words in the lines of text contained in the raw text files being line-aligned to identify matching words in corresponding positions;
  - identifying matching lines of text based on the number of matching words in corresponding positions; and
  - producing a plurality of line-aligned files containing the matching lines of text produced by the plurality of readers;
- character-aligning the line-aligned files to produce a corresponding plurality of line-aligned, character-aligned files;
- comparing the characters of the plurality of line-aligned, character-aligned files; and
- selecting a probable character from the plurality of line-aligned, character-aligned files for each character position in the computer-readable file of text.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24
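The final two steps of claim 15, comparing characters and selecting a probable character for each position, might look like the sketch below. It assumes the inputs are already line-aligned and character-aligned (one string per reader, all of equal length) and uses a simple majority vote as the selection rule. The claim does not specify how the probable character is chosen, so the vote, the tie-breaking behavior, and the function name are all assumptions.

```python
from collections import Counter
from typing import List


def select_probable_characters(aligned_lines: List[str]) -> str:
    """Pick the most common character at each position across the readers' outputs.

    Assumption: all strings have equal length after character alignment.
    Ties go to the character seen first, i.e. the earliest reader in the list.
    """
    if not aligned_lines:
        return ""
    length = len(aligned_lines[0])
    if any(len(line) != length for line in aligned_lines):
        raise ValueError("inputs must be character-aligned to equal length")

    chosen = []
    for column in zip(*aligned_lines):
        most_common_char, _count = Counter(column).most_common(1)[0]
        chosen.append(most_common_char)
    return "".join(chosen)


# Three readers, each with a different single-character misread or none at all.
reader_outputs = [
    "line alignment",
    "1ine alignment",
    "line a1ignment",
]
merged = select_probable_characters(reader_outputs)
```

Because each misread occurs in only one of the three outputs, the vote recovers the intended text at every position.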
Specification