Method for processing optical character recognizer output
Abstract
A method, a system, and a computer program product for processing the output of an optical character recognizer (OCR) are disclosed. The system receives a first character sequence from the OCR. A first set of characters from the first character sequence is converted to a corresponding second set of characters, based on a look-up table and language scores, to generate a second character sequence.
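The conversion summarized in the abstract can be illustrated with a minimal sketch. It assumes the look-up table maps each OCR character to candidate target strings with replacement probabilities, and that the language score is a log-probability over whole words. The names `CONFUSION`, `WORD_LOGPROB`, `language_score`, `candidates`, and `correct` are hypothetical, and every number is illustrative rather than taken from the patent.

```python
import math
from itertools import product

# Hypothetical look-up table: OCR character -> {target string: replacement probability}.
# The values are illustrative only.
CONFUSION = {
    "0": {"0": 0.6, "o": 0.4},
    "1": {"1": 0.5, "l": 0.5},
    "r": {"r": 0.9, "n": 0.1},
}

# Toy stand-in for a language model: log-probabilities of known words.
WORD_LOGPROB = {"hello": math.log(0.01), "world": math.log(0.01)}
UNKNOWN_LOGPROB = math.log(1e-8)


def language_score(text):
    """Sum of per-word log-probabilities; unknown words get a low score."""
    return sum(WORD_LOGPROB.get(w, UNKNOWN_LOGPROB) for w in text.split())


def candidates(ocr_text):
    """Yield (corrected_text, replacement_logprob) for every combination of
    per-character replacements. Exhaustive, so only suitable for short inputs."""
    options = [list(CONFUSION.get(ch, {ch: 1.0}).items()) for ch in ocr_text]
    for combo in product(*options):
        text = "".join(target for target, _ in combo)
        logp = sum(math.log(p) for _, p in combo)
        yield text, logp


def correct(ocr_text):
    """Pick the candidate with the best combined replacement and language score."""
    best_text, _ = max(
        candidates(ocr_text),
        key=lambda c: c[1] + language_score(c[0]),
    )
    return best_text


print(correct("he1lo w0rld"))
```

Working in log space turns the product of replacement probabilities and the language model probability into a simple sum, which is why the two scores are added in `correct`.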
20 Claims
1. A computer implemented method for processing an output of an optical character recognizer (OCR), the computer implemented method comprising:
- receiving an OCR-converted character sequence from the OCR; and
- converting a set of OCR-converted characters of the OCR-converted character sequence to a corresponding corrected set of characters to generate a corrected character sequence based on:
  - a probability of replacing an OCR-converted character of the set of OCR-converted characters with one or more target characters to form one or more characters of the corrected set of characters, and
  - language scores generated by a language model.

Dependent claims: 2, 3, 4, 5, 6, 18, 19.

7. A computer implemented method for processing an output of an optical character recognizer (OCR), the computer implemented method comprising:
- receiving a first character sequence from the OCR,
- converting a first set of characters from the first character sequence to a corresponding second set of characters to generate a second character sequence based on one or more finite state transducers (FSTs) corresponding to each character of the first character sequence and language scores generated by a language model,
  - wherein weights associated with the one or more FSTs and the language model are determined using a Minimum Error Rate Training (MERT) technique,
  - wherein a number of edits for the conversion of the first set of characters to the corresponding second set of characters is less than or equal to a predetermined number of edits.

8. A computer implemented method for language translation comprising:
- receiving an Optical Character Recognizer (OCR)-converted character sequence from an OCR, the OCR-converted character sequence being in a first language;
- converting a set of OCR-converted characters of the OCR-converted character sequence to a corresponding corrected set of characters to generate a corrected character sequence based on a look-up table and language scores generated by a language model,
  - wherein the corrected character sequence is in the first language,
  - wherein the look-up table comprises a probability score for each character of the set of OCR-converted characters,
  - wherein the probability score represents a probability of replacing an OCR-converted character of the set of OCR-converted characters with one or more target characters to form one or more characters of the corrected set of characters; and
- translating an OCR-converted word sequence corresponding to the corrected character sequence to a corrected word sequence in a second language.

Dependent claims: 15, 16, 17.

9. A system for processing an output of an optical character recognizer (OCR), the system comprising a processor coupled to a memory, the memory having stored therein one or more program modules comprising:
- a conversion module configured to convert an OCR-converted set of characters in an OCR-converted character sequence to a corresponding corrected set of characters to generate a corrected character sequence based on a look-up table and language scores,
  - wherein the OCR-converted character sequence is received from the OCR,
  - wherein the look-up table comprises a probability score for each character of the set of OCR-converted characters,
  - wherein the probability score represents a probability of replacing an OCR-converted character of the set of OCR-converted characters with one or more target characters to form one or more characters of the corrected set of characters.

Dependent claims: 10, 11, 12, 13, 20.

14. A system for language translation, the system comprising a processor coupled to a memory, the memory having stored therein one or more program modules comprising:
- a conversion module configured to convert an Optical Character Recognizer (OCR)-converted set of characters in an OCR-converted character sequence to a corresponding corrected set of characters to generate a corrected character sequence based on a look-up table and language scores generated by a language model, the OCR-converted character sequence being received from an OCR,
  - wherein the OCR-converted character sequence and the character sequence are in a first language,
  - wherein the look-up table comprises a probability score that represents a probability of replacing an OCR-converted character of the set of OCR-converted characters with one or more target characters to form one or more characters of the corrected set of characters; and
- a translation module configured to translate an OCR-converted word sequence corresponding to the corrected character sequence to a corrected word sequence in a second language.

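The edit limit and weighted scoring recited in claim 7 can be sketched as a bounded search. This is only an illustration under stated assumptions: a plain dictionary (`REPLACEMENTS`) stands in for the per-character finite state transducers, the weights `w_channel` and `w_lm` are fixed parameters assumed to have been tuned beforehand (no MERT is performed here), and all names and numbers are hypothetical.

```python
import math

# Hypothetical per-character replacement options, standing in for per-character FSTs.
# Probabilities are illustrative only.
REPLACEMENTS = {
    "1": {"1": 0.6, "l": 0.4},
    "0": {"0": 0.6, "o": 0.4},
    "5": {"5": 0.7, "s": 0.3},
}

# Toy stand-in for a language model.
VOCAB_LOGPROB = {"lesson": math.log(0.01)}
OOV_LOGPROB = math.log(1e-8)


def lm_score(text):
    """Sum of per-word log-probabilities; out-of-vocabulary words score low."""
    return sum(VOCAB_LOGPROB.get(w, OOV_LOGPROB) for w in text.split())


def correct_with_edit_limit(ocr_text, max_edits, w_channel=1.0, w_lm=1.0):
    """Depth-first search over replacements, never exceeding max_edits changes.

    w_channel and w_lm are assumed to be pre-tuned weights; this sketch does
    not determine them.
    """
    best = (float("-inf"), ocr_text)

    def search(i, prefix, channel_logp, edits):
        nonlocal best
        if i == len(ocr_text):
            text = "".join(prefix)
            total = w_channel * channel_logp + w_lm * lm_score(text)
            if total > best[0]:
                best = (total, text)
            return
        ch = ocr_text[i]
        for target, p in REPLACEMENTS.get(ch, {ch: 1.0}).items():
            cost = 0 if target == ch else 1  # keeping the character is not an edit
            if edits + cost <= max_edits:
                search(i + 1, prefix + [target], channel_logp + math.log(p), edits + cost)

    search(0, [], 0.0, 0)
    return best[1]


print(correct_with_edit_limit("1e55on", max_edits=3))
print(correct_with_edit_limit("1e55on", max_edits=2))
```

With a budget of three edits the search can reach "lesson"; with a budget of two it cannot, so the unchanged input scores best. Capping the number of edits also prunes the search space, since branches that exceed the budget are never expanded.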
Specification