Apparatus and method for OCR character and confidence determination using multiple OCR devices
Abstract
In an optical character recognition (OCR) system, an improved method and apparatus for recognizing a character and producing an indication of the confidence with which the character has been recognized. The system employs a plurality of different OCR devices, each of which outputs an indicated (or recognized) character along with the individual device's own determination of how confident it is in the indication. The OCR system uses the data output from each of the different OCR devices, along with other attributes of the indicated character such as the relative accuracy of the particular OCR device indicating the character, to select the character recognized by the system and to produce a combined confidence indication of how confident the system is in its recognition.
18 Claims
1. A character recognition (CR) system for recognizing characters within a digital page image, comprising:
an input interface connected to receive said digital page image;
a plurality of CR devices, each connected to receive said digital page image from said input interface, each of said CR devices converting a portion of said digital page image into output data signals indicating a character at said portion wherein the characters indicated in said output data signals constitute one or more candidate characters for said portion; and
a voting unit coupled to receive said output data signals from each of said plurality of CR devices, said voting unit assigning an attribute for each indicated character as a function of the indicated character and the CR device indicating said indicated character,
wherein, for each candidate character, said voting unit keeps a running tally of attributes for those of said plurality of CR devices which indicate said candidate character to arrive at a final tally, representing a composite attribute for said candidate character, when attributes for all of said plurality of CR devices which indicate said candidate character are accounted for in said running tally, and
wherein said voting unit selects a recognized character from one of the candidate characters based on the composite attributes of the candidate characters and produces a combined data signal representing said recognized character.
Dependent claims: 2, 3, 4, 5

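The voting unit of claim 1 can be pictured with a short sketch. This is only an illustration under stated assumptions, not the patented implementation: the device names, the weight table DEVICE_CHAR_WEIGHT, the fallback DEFAULT_WEIGHT, and the choice of a simple sum as the running tally are all hypothetical.

```python
from collections import defaultdict

# Hypothetical attribute table: one weight per (device, character) pair.
# The values are invented for illustration and do not come from the patent.
DEVICE_CHAR_WEIGHT = {
    ("ocr_a", "O"): 0.9,
    ("ocr_b", "0"): 0.7,
    ("ocr_c", "0"): 0.5,
}
DEFAULT_WEIGHT = 0.5  # assumed fallback when a pair is not in the table


def vote(indications):
    """Select one character from (device, character) pairs for one portion."""
    tally = defaultdict(float)
    for device, char in indications:
        # The attribute depends on both the device and the character it indicated.
        tally[char] += DEVICE_CHAR_WEIGHT.get((device, char), DEFAULT_WEIGHT)
    # Once every device is accounted for, each tally is a composite attribute.
    recognized = max(tally, key=tally.get)
    return recognized, dict(tally)


recognized, composite = vote([("ocr_a", "O"), ("ocr_b", "0"), ("ocr_c", "0")])
print(recognized, composite)
```

In this sketch two moderately weighted devices that agree outvote a single strongly weighted device, because their attributes accumulate in the same tally.
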
6. A method for recognizing characters contained in a digital page image, said method comprising the steps of:
providing said digital page image to a plurality of character recognition (CR) devices;
indicating in each of said CR devices an indicated character and generating an attribute associated with said indicated character for a particular character location in said digital page image;
for each distinct indicated character, generating a running tally of attributes for those of said plurality of CR devices which indicate said distinct indicated character to arrive at a final tally, representing a composite attribute for said distinct indicated character, when attributes for all of said plurality of CR devices which indicate said distinct indicated character are accounted for in said running tally;
selecting a selected character on a basis of said composite character attributes of said distinct indicated characters; and
outputting said selected character as a recognized character.
Dependent claims: 7, 8, 9, 10

11. A character recognition (CR) system for recognizing characters within a digital page image and for providing a confidence indication indicative of a degree of confidence in text recognized by said system, said system comprising:
an input means for inputting said digital page image into said CR system;
recognition means for receiving said digital page image and for performing, for each defined location on said digital page image, a plurality of different recognition techniques, each of said different recognition techniques indicating a recognized text and assigning an associated confidence indication to said recognized text, said recognition means outputting said recognized text and said associated confidence indication produced by each of said different recognition techniques; and
selecting means, coupled to receive said recognized text and said associated confidence indication produced by each of said different recognition techniques from said recognition means, for selecting a combined recognized text from a plurality of distinct recognized text on a basis of a combined confidence indication of each of said distinct recognized text, wherein a combined confidence indication of a distinct recognized text represents a final tally of a running tally of confidence indications associated with said distinct recognized text; and
output means for outputting said combined recognized text as a text recognized by said system.

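As a rough illustration of the selecting means in claim 11, the following sketch accumulates each technique's own confidence indication per distinct recognized text and reports the winning text together with its combined confidence. The function name select_text, the use of plain addition, and the sample confidence values are assumptions made for the example; the claim itself does not fix how the tally is computed.

```python
def select_text(results):
    """results: list of (recognized_text, confidence) pairs, one per technique.

    Returns the text with the highest combined confidence and that value.
    """
    combined = {}
    for text, confidence in results:
        # Running tally of confidence indications for each distinct text.
        combined[text] = combined.get(text, 0.0) + confidence
    best = max(combined, key=combined.get)
    return best, combined[best]


# Invented sample: two techniques read "cat", one reads "cot".
best, combined_confidence = select_text([("cat", 0.80), ("cot", 0.95), ("cat", 0.40)])
print(best, combined_confidence)
```

A summed tally is not bounded by 1, so a real system would probably rescale it before presenting it as a confidence; that step is left out here.
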
12. A character recognition (CR) system for recognizing characters within a digital page image, comprising:
a CR unit, including a plurality of CR devices receiving said digital page image and converting said digital page image into corresponding streams of character data, at least one of said streams of character data corresponding to one of said CR devices including positional information;
a synchronization unit coupled to receive said streams of characters from said CR unit, said synchronization unit aligning character positions within said streams of data using at least said positional information and outputting synchronized streams of character data; and
a voting unit coupled to receive said synchronized streams of character data for selecting an output indicated character at regions where, at a given frame of synchronization, at least two of said synchronized streams of character data indicate different recognized characters, wherein a running tally of how many CR units is associated with each different recognized character is kept and said output indicated character is selected based on a final tally of the running tally.
Dependent claims: 13

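The synchronization and count-based voting recited in claim 12 might be sketched as follows. The sketch makes a simplifying assumption that every stream carries an x position for each character (the claim only requires positional information in at least one stream), and the names synchronize, vote_by_count, and the tolerance parameter are hypothetical.

```python
from collections import Counter


def synchronize(streams, tolerance=3):
    """Group (x_position, char) items from several streams into frames.

    An item joins the current frame when its x position lies within
    `tolerance` pixels of the first item in that frame.
    """
    items = sorted(
        (x, device, char)
        for device, stream in enumerate(streams)
        for x, char in stream
    )
    frames = []
    for x, _device, char in items:
        if frames and x - frames[-1]["x"] <= tolerance:
            frames[-1]["chars"].append(char)
        else:
            frames.append({"x": x, "chars": [char]})
    return [frame["chars"] for frame in frames]


def vote_by_count(frame):
    """Return the character indicated by the most devices in one frame."""
    return Counter(frame).most_common(1)[0][0]


# Invented sample: three devices, the second misreads "a" as "o".
streams = [
    [(10, "c"), (20, "a"), (30, "t")],
    [(11, "c"), (21, "o"), (31, "t")],
    [(9, "c"), (19, "a"), (30, "t")],
]
frames = synchronize(streams)
text = "".join(vote_by_count(frame) for frame in frames)
print(text)
```

Frames where all streams agree pass through the vote unchanged, so the tally only matters where at least two streams disagree, as in the claim.
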
14. A character recognition (CR) system for recognizing characters within a digital page image, comprising:
an input interface connected to receive said digital page image;
a plurality of CR devices, each connected to receive said digital page image from said input interface, each of said CR devices converting said digital page image into a stream of characters and CR attributes corresponding to each of said characters, each of said characters corresponding to a character position in said digital page image;
a synchronization unit coupled to receive said stream of characters from said plurality CR devices, said synchronization unit combining an output of characters of said character streams so that characters which correspond to a same character position in said digital page image are output in synchronization; and
a voting unit coupled to receive said synchronized output of characters of said character streams from said synchronization unit, wherein, for each frame of synchronization, said voting unit keeps a different running tally of CR attributes for each different character in said frame and outputs a character corresponding to a final tally of a highest value.
Dependent claims: 15

16. A character recognition (CR) system for recognizing characters within a digital page image, comprising:
(a) an input interface connected to receive said digital page image;
(b) N CR devices, each CR device n (where n=1, 2, . . . , N) connected to receive said digital page image from said input interface, each of said N CR devices indicating a character at a portion of said digital page image; and
(c) a voting unit coupled with each of said N CR devices, said voting unit assigning an attribute F(n,t) for each indicated character t as a function of the CR device n and the indicated character t, wherein, for each portion of said digital image, said voting unit
(1) selects a character s from an alphanumeric table,
(2) initializes a running tally variable T(s) associated with the selected character s,
(3) selects a CR device m from among said N CR devices,
(4) compares the character indicated by the selected CR device m against the selected character s and, if there is a match, increases the running tally variable T(s) by an amount proportional to the attribute F(m,s) associated with the character s indicated by the selected CR device m,
(5) repeats steps (3) and (4) for a different CR device until all N CR devices are accounted for,
(6) repeats steps (1) through (5) for a different character in the alphanumeric table until all characters in the alphanumeric table are accounted for, and
(7) selects the character s having a highest running tally T(s) associated therewith as a recognized character.
Dependent claims: 17, 18

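Steps (1) through (7) of claim 16 translate almost directly into a nested loop. The sketch below is one possible reading, with several assumptions: the alphanumeric table is taken to be ASCII letters and digits, devices are indexed from 0 rather than 1, the proportionality constant in step (4) is 1, and the attribute function F is supplied by the caller (here a hypothetical per-device accuracy that ignores the character).

```python
import string

# Assumed alphanumeric table; the claim does not specify its contents.
ALPHANUMERIC_TABLE = string.ascii_letters + string.digits


def recognize(indicated, F):
    """indicated[m] is the character reported by CR device m for one portion.

    F(m, s) is a caller-supplied attribute function. If no device indicates a
    character from the table, every tally is zero and the first table entry
    is returned; a real system would need to handle that case explicitly.
    """
    T = {}
    for s in ALPHANUMERIC_TABLE:            # steps (1) and (6)
        T[s] = 0.0                          # step (2)
        for m, t in enumerate(indicated):   # steps (3) and (5)
            if t == s:                      # step (4): match, so add F(m, s)
                T[s] += F(m, s)
    return max(T, key=T.get)                # step (7)


# Hypothetical attribute: a fixed accuracy figure per device.
accuracy = [0.9, 0.6, 0.5]
result = recognize(["O", "0", "0"], lambda m, s: accuracy[m])
print(result)
```

Looping over the whole table, as the claim does, costs more than tallying only the characters actually indicated, but it follows the claimed order of steps literally.
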
Specification