Post-processing system and method for correcting machine recognized text
Abstract
A method of post-processing character data from an optical character recognition (OCR) engine and apparatus to perform the method. This exemplary method includes segmenting the character data into a set of initial words. The set of initial words is word level processed to determine at least one candidate word corresponding to each initial word. The set of initial words is segmented into a set of sentences. Each sentence in the set of sentences includes a plurality of initial words and candidate words corresponding to the initial words. A sentence is selected from the set of sentences. The selected sentence is word disambiguity processed to determine a plurality of final words. A final word is selected from the at least one candidate word corresponding to a matching initial word. The plurality of final words is then assembled as post-processed OCR data.
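The stages named in the abstract can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the patented implementation: the toy `LEXICON`, the use of `difflib.get_close_matches` to propose candidate words, the punctuation-based sentence splitter, and the placeholder rule that picks the top-ranked candidate are all hypothetical choices made for the example.

```python
import re
from difflib import get_close_matches

# Hypothetical toy lexicon; a real system would use a full dictionary.
LEXICON = ["the", "cat", "sat", "on", "mat", "it", "was", "warm"]
SENTENCE_END = {".", "!", "?"}


def segment_words(char_data):
    """Segment raw OCR character data into initial words.

    Sentence terminators are kept as separate tokens so that the
    sentence segmentation stage can find them later.
    """
    return re.findall(r"[^\s.!?]+|[.!?]", char_data)


def word_level_process(initial_words):
    """Attach a list of candidate words to each initial word."""
    tagged = []
    for word in initial_words:
        if word in SENTENCE_END:
            candidates = [word]
        else:
            # Assumption: candidates are the closest lexicon entries by
            # string similarity; fall back to the word itself if none match.
            candidates = get_close_matches(
                word.lower(), LEXICON, n=3, cutoff=0.5
            ) or [word]
        tagged.append((word, candidates))
    return tagged


def segment_sentences(tagged_words):
    """Group (initial word, candidates) pairs into sentences."""
    sentences, current = [], []
    for item in tagged_words:
        current.append(item)
        if item[0] in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing words with no terminator
        sentences.append(current)
    return sentences


def disambiguate(sentence):
    """Placeholder disambiguation: keep the top-ranked candidate."""
    return [candidates[0] for _, candidates in sentence]


def post_process(char_data):
    """Run all stages; each sentence is disambiguated separately."""
    tagged = word_level_process(segment_words(char_data))
    final_words = []
    for sentence in segment_sentences(tagged):
        final_words.extend(disambiguate(sentence))
    return final_words


print(post_process("the cot sat. it was worm."))
```

The point of the structure is that `disambiguate` only ever sees one sentence at a time, so any context-sensitive scoring placed there is bounded by sentence length rather than document length.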
21 Claims
1. A post-processor for character data of an optical character recognition (OCR) engine comprising:
a word segmentation engine coupled to the OCR engine to segment the character data into a plurality of initial words;
a word level processor coupled to the word segmentation engine to process the plurality of initial words and determine a set of candidate words corresponding to each initial word;
a sentence segmentation engine coupled to the word level processor to segment the plurality of initial words into at least one sentence; and
a word disambiguity processor coupled to the sentence segmentation engine to determine a final word from each set of candidate words;
wherein the word disambiguity processor processes each sentence of the at least one sentence separately.
Dependent claims: 2, 3, 4, 5
6. A method of post-processing character data from an optical character recognition (OCR) engine, comprising the steps of:
a) segmenting the character data into a set of initial words;
b) word level processing the set of initial words and determining at least one candidate word corresponding to each initial word;
c) segmenting the set of initial words into a set of sentences, each sentence in the set of sentences including a plurality of initial words and candidate words corresponding to the initial words;
d) selecting, from the set of sentences, a sentence;
e) word disambiguity processing the sentence selected in step (d) to determine a plurality of final words, wherein a final word is selected from the at least one candidate word corresponding to a matching initial word; and
f) assembling the plurality of final words as post-processed OCR data.
Dependent claims: 7, 8, 9
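Step (e) does not state, in the claim text above, how a final word is chosen among the candidates. One plausible reading is a search over the candidate sets of a single sentence that favors word sequences that commonly occur together. The sketch below uses a bigram score with a Viterbi-style search; this technique is swapped in for illustration, and the `BIGRAM_LOGP` table, the `UNSEEN_LOGP` penalty, and the function names are hypothetical.

```python
# Hypothetical bigram log-probabilities; "<s>" marks the sentence start.
BIGRAM_LOGP = {
    ("<s>", "the"): -0.5,
    ("the", "cat"): -1.0,
    ("the", "cot"): -4.0,
    ("cat", "sat"): -1.0,
    ("cot", "sat"): -3.0,
    ("cat", "set"): -3.5,
    ("cot", "set"): -3.5,
}
UNSEEN_LOGP = -8.0  # assumed penalty for word pairs missing from the table


def bigram_logp(prev_word, word):
    """Score a word given its predecessor."""
    return BIGRAM_LOGP.get((prev_word, word), UNSEEN_LOGP)


def disambiguate_sentence(candidate_sets):
    """Pick one final word per position for a single sentence.

    candidate_sets holds one list of candidate words per initial word.
    Returns the sequence with the highest total bigram score.
    """
    if not candidate_sets:
        return []
    # best[w] = (score of the best path ending in w, that path)
    best = {w: (bigram_logp("<s>", w), [w]) for w in candidate_sets[0]}
    for candidates in candidate_sets[1:]:
        new_best = {}
        for word in candidates:
            score, path = max(
                (
                    (prev_score + bigram_logp(prev_word, word), prev_path)
                    for prev_word, (prev_score, prev_path) in best.items()
                ),
                key=lambda pair: pair[0],
            )
            new_best[word] = (score, path + [word])
        best = new_best
    return max(best.values(), key=lambda pair: pair[0])[1]


print(disambiguate_sentence([["the"], ["cot", "cat"], ["sat", "set"]]))
```

Because the search restarts at `<s>` for every sentence, the work per sentence grows with the sentence length times the square of the candidate set size, which is one practical reason to process sentences separately.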
10. A computer readable medium adapted to instruct a general purpose computer to post-process character data from an optical character recognition (OCR) engine, the method comprising the steps of:
a) segmenting the character data into a set of initial words;
b) word level processing the set of initial words and determining at least one candidate word corresponding to each initial word;
c) segmenting the set of initial words into a set of sentences, each sentence including a plurality of initial words and candidate words corresponding to the initial words;
d) selecting, from the set of sentences, a sentence;
e) word disambiguity processing the sentence selected in step (d) to determine a plurality of final words, wherein a final word is selected from the at least one candidate word corresponding to a matching initial word; and
f) assembling the plurality of final words as post-processed OCR data.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
Specification