Character recognition with document orientation determination
Abstract
The correct orientation for a document scanned by an OCR system is determined from the confidence factors associated with multiple character images identified in the document. One disclosed orientation determination module (258) for determining the correct page orientation includes a confidence factor values buffer (402), a comparison module (404), a reference value buffer (406), a sort module (408), and a decision module (410). The orientation determination module (258) obtains arrays of confidence factor values that respectively correspond to first and second page orientations. The confidence factor values buffer (402) stores and indexes the confidence factor values, the sort module (408) sorts the values, and the reference value buffer (406) stores comparison information such as threshold levels. The comparison module (404) compares the first and second arrays of confidence factor values, such as by accessing selected values in the arrays, averaging such selected values, and determining whether and how many values exceed threshold levels. The decision module (410) determines the correct, or proper, document orientation.
14 Claims
1. A computer implemented method for determining the correct orientation of a scanned page of alphanumeric characters having a plurality of alphanumeric characters, the method comprising:
receiving captured image data corresponding to a first orientation for a page, the first orientation corresponding to the orientation in which the page is provided to a scanner;
identifying a first set of candidate character codes that correspond to characters from the page according to the first orientation;
associating a confidence factor with each candidate character code from the first set of candidate character codes to produce a first set of confidence factors;
producing a second set of candidate character codes that correspond to characters from the page according to a second orientation;
associating a confidence factor with each candidate character code from the second set of candidate character codes to produce a second set of confidence factors;
determining the number of confidence factor values in the first set of confidence factors that exceed a predetermined value;
determining the number of confidence factor values in the second set of confidence factors that exceed the predetermined value; and
determining that the correct page orientation is the first orientation when the number of confidence factors in the first set of confidence factors that exceeds the predetermined value is higher than the number of confidence factors in the second set of confidence factors that exceeds the predetermined value.
Dependent claims: 4, 5

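The counting comparison recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the 0-to-1 confidence scale, and the threshold of 0.8 are assumptions chosen for illustration, and "exceed" is read as strictly greater than.

```python
from typing import Sequence


def count_above(confidences: Sequence[float], threshold: float) -> int:
    """Count the confidence factors that strictly exceed the threshold."""
    return sum(1 for c in confidences if c > threshold)


def first_orientation_is_correct(
    first: Sequence[float], second: Sequence[float], threshold: float
) -> bool:
    """True when more first-orientation values exceed the threshold
    than second-orientation values (the condition stated in claim 1)."""
    return count_above(first, threshold) > count_above(second, threshold)


# Hypothetical confidence factors, one per candidate character code.
first = [0.95, 0.91, 0.40, 0.88]   # page read as scanned
second = [0.35, 0.92, 0.20, 0.15]  # page read in the second orientation

print(first_orientation_is_correct(first, second, threshold=0.8))  # True
```

The claim only states the outcome when the first count is higher; how a tie is resolved is not specified there, so the sketch simply returns `False` in that case.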
2. A computer implemented method for determining the correct orientation of a scanned page of alphanumeric characters having a plurality of alphanumeric characters, the method comprising:
receiving captured image data corresponding to a first orientation for a page, the first orientation corresponding to the orientation in which the page is provided to a scanner;
identifying a first set of candidate character codes that correspond to characters from the page according to the first orientation;
associating a confidence factor with each candidate character code from the first set of candidate character codes to produce a first set of confidence factors;
producing a second set of candidate character codes that correspond to characters from the page according to a second orientation;
associating a confidence factor with each candidate character code from the second set of candidate character codes to produce a second set of confidence factors;
determining the average of the confidence factor values in the first set of confidence factors;
determining the average of the confidence factor values in the second set of confidence factors; and
determining that the correct page orientation is the second orientation when the average of the confidence factor values in the second set of confidence factors exceeds the average of the confidence factor values in the first set of confidence factor values.

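The averaging comparison recited in claim 2 can be sketched in a few lines of Python. The function name and the sample values are assumptions made for illustration; the claim does not prescribe a particular kind of average, and the arithmetic mean is used here.

```python
from statistics import fmean
from typing import Sequence


def second_orientation_is_correct(
    first: Sequence[float], second: Sequence[float]
) -> bool:
    """True when the mean confidence of the second orientation exceeds
    the mean confidence of the first (the condition stated in claim 2)."""
    # fmean raises StatisticsError on an empty sequence, so both sets
    # are assumed to contain at least one confidence factor.
    return fmean(second) > fmean(first)


# Hypothetical confidence factors for a page fed to the scanner upside down.
first = [0.30, 0.45, 0.25]
second = [0.90, 0.85, 0.95]

print(second_orientation_is_correct(first, second))  # True
```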
3. A computer implemented method for determining the correct orientation of a scanned page of alphanumeric characters having a plurality of alphanumeric characters, the method comprising:
receiving captured image data corresponding to a first orientation for a page, the first orientation corresponding to the orientation in which the page is provided to a scanner;
identifying a first set of candidate character codes that correspond to characters from the page according to the first orientation;
associating a confidence factor with each candidate character code from the first set of candidate character codes to produce a first set of confidence factors;
producing a second set of candidate character codes that correspond to characters from the page according to a second orientation;
associating a confidence factor with each candidate character code from the second set of candidate character codes to produce a second set of confidence factors;
obtaining a first subset of values, selected from the first set of confidence factor values, and determining the average value in the first subset;
obtaining a second subset of values, selected from the second set of confidence factor values, and determining the average value in the second subset; and
determining that the correct page orientation is the second orientation when the average value in the second subset exceeds the average value in the first subset.

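Claim 3 averages a selected subset of each set rather than the whole set, but it does not say how the subset is chosen. The sketch below assumes one plausible selection, the `k` highest values after sorting, which is loosely suggested by the sort module mentioned in the abstract; the function names and the default `k=3` are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def top_k_mean(confidences: Sequence[float], k: int) -> float:
    """Average of the k largest confidence factors (assumed subset rule)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Sort in descending order and keep at most k values.
    selected = sorted(confidences, reverse=True)[:k]
    return fmean(selected)


def second_orientation_is_correct_by_subset(
    first: Sequence[float], second: Sequence[float], k: int = 3
) -> bool:
    """True when the subset average of the second orientation exceeds
    that of the first (the condition stated in claim 3)."""
    return top_k_mean(second, k) > top_k_mean(first, k)


# Hypothetical confidence factors.
first = [0.2, 0.9, 0.3, 0.1, 0.4]
second = [0.8, 0.7, 0.1, 0.9, 0.2]

print(second_orientation_is_correct_by_subset(first, second))  # True
```

Averaging only the strongest matches is one way to keep noise, graphics, or stray marks from pulling both averages toward the same low value; other subset rules would fit the claim language equally well.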
6. A computer readable medium containing a computer program that determines the correct orientation of a scanned page of alphanumeric characters having a plurality of alphanumeric characters and includes routines for:
receiving captured image data corresponding to a first orientation for a page, the first orientation corresponding to the orientation in which the page is provided to a scanner;
identifying a first set of candidate character codes that correspond to characters from the page according to the first orientation;
associating a confidence factor with each candidate character code from the first set of candidate character codes to produce a first set of confidence factors;
producing a second set of candidate character codes that correspond to characters from the page according to a second orientation;
associating a confidence factor with each candidate character code from the second set of candidate character codes to produce a second set of confidence factors;
determining the number of confidence factor values in the first set of confidence factors that exceed a predetermined value;
determining the number of confidence factor values in the second set of confidence factors that exceed the predetermined value; and
determining that the correct page orientation is the first orientation when the number of confidence factors in the first set of confidence factors that exceeds the predetermined value is higher than the number of confidence factors in the second set of confidence factors that exceeds the predetermined value.
Dependent claims: 9

7. A computer readable medium containing a computer program that determines the correct orientation of a scanned page of alphanumeric characters having a plurality of alphanumeric characters and includes routines for:
receiving captured image data corresponding to a first orientation for a page, the first orientation corresponding to the orientation in which the page is provided to a scanner;
identifying a first set of candidate character codes that correspond to symbols from the page according to the first orientation;
associating a confidence factor with each candidate character code from the first set of candidate character codes to produce a first set of confidence factors;
producing a second set of candidate character codes that correspond to characters from the page according to a second orientation;
associating a confidence factor with each candidate symbol code from the second set of candidate symbol codes to produce a second set of confidence factors;
determining the average of the confidence factor values in the first set of confidence factors;
determining the average of the confidence factor values in the second set of confidence factors; and
determining that the correct page orientation is the second orientation when the average of the confidence factor values in the second set of confidence factors exceeds the average of the confidence factor values in the first set of confidence factor values.

8. A computer readable medium containing a computer program that determines the correct orientation of a scanned page of alphanumeric characters having a plurality of alphanumeric characters and includes routines for:
receiving captured image data corresponding to a first orientation for a page, the first orientation corresponding to the orientation in which the page is provided to a scanner;
identifying a first set of candidate character codes that correspond to characters from the page according to the first orientation;
associating a confidence factor with each candidate character code from the first set of candidate character codes to produce a first set of confidence factors;
producing a second set of candidate character codes that correspond to characters from the page according to a second orientation;
associating a confidence factor with each candidate character code from the second set of candidate character codes to produce a second set of confidence factors;
obtaining a first subset of values, selected from the first set of confidence factor values, and determining the average value in the first subset;
obtaining a second subset of values, selected from the second set of confidence factor values, and determining the average value in the second subset; and
determining that the correct page orientation is the second orientation when the average value in the second subset exceeds the average value in the first subset.

10. A character recognition apparatus for determining the correct orientation of a scanned page of alphanumeric characters having a plurality of alphanumeric characters, the apparatus comprising:
a pre-processing module, for receiving captured image data corresponding to a first orientation for a page, the first orientation corresponding to the orientation in which the page is provided to a scanner;
a character classifying module, coupled to the pre-processing module, for identifying a first set of candidate character codes that correspond to characters from the page according to the first orientation, associating a confidence factor with each candidate character code from the first set of candidate character codes to produce a first set of confidence factors, producing a second set of candidate character codes that correspond to characters from the page according to a second orientation, and associating a confidence factor with each candidate character code from the second set of candidate character codes to produce a second set of confidence factors; and
an orientation determination module coupled to the character classifying module, the orientation determination module, determining the number of confidence factor values in the first set of confidence factors that exceed a predetermined value;
determining the number of confidence factor values in the second set of confidence factors that exceed the predetermined value; and
determining that the correct page orientation is the first orientation when the number of confidence factors in the first set of confidence factors that exceeds the predetermined value is higher than the number of confidence factors in the second set of confidence factors that exceeds the predetermined value.
Dependent claims: 13, 14

11. A character recognition apparatus for determining the correct orientation of a scanned page of alphanumeric characters having a plurality of alphanumeric characters, the apparatus comprising:
a pre-processing module, for receiving captured image data corresponding to a first orientation for a page, the first orientation corresponding to the orientation in which the page is provided to a scanner;
a character classifying module, coupled to the pre-processing module, for identifying a first set of candidate character codes that correspond to characters from the page according to the first orientation, associating a confidence factor with each candidate character code from the first set of candidate character codes to produce a first set of confidence factors, producing a second set of candidate character codes that correspond to characters from the page according to a second orientation, and associating a confidence factor with each candidate character code from the second set of candidate character codes to produce a second set of confidence factors; and
an orientation determination module coupled to the character classifying module, the orientation determination module, determining the average of the confidence factor values in the first set of confidence factors;
determining the average of the confidence factor values in the second set of confidence factors; and
determining that the correct page orientation is the second orientation when the average of the confidence factor values in the second set of confidence factors exceeds the average of the confidence factor values in the first set of confidence factor values.

12. A character recognition apparatus for determining the correct orientation of a scanned page of alphanumeric characters having a plurality of alphanumeric characters, the apparatus comprising:
a pre-processing module, for receiving captured image data corresponding to a first orientation for a page, the first orientation corresponding to the orientation in which the page is provided to a scanner;
a character classifying module, coupled to the pre-processing module, for identifying a first set of candidate character codes that correspond to characters from the page according to the first orientation, associating a confidence factor with each candidate character code from the first set of candidate character codes to produce a first set of confidence factors, producing a second set of candidate character codes that correspond to characters from the page according to a second orientation, and associating a confidence factor with each candidate character code from the second set of candidate character codes to produce a second set of confidence factors; and
an orientation determination module coupled to the character classifying module, the orientation determination module, obtaining a first subset of values, selected from the first set of confidence factor values, and determining the average value in the first subset;
obtaining a second subset of values, selected from the second set of confidence factor values, and determining the average value in the second subset; and
determining that the correct page orientation is the second orientation when the average value in the second subset exceeds the average value in the first subset.

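The apparatus claims chain three modules: pre-processing, character classification, and orientation determination. The Python sketch below shows one way such a chain might be wired together, under several assumptions that are not taken from the claims: the second orientation is taken to be a 180-degree rotation of the first, the classifier is a caller-supplied function returning `(character code, confidence)` pairs, pre-processing is a pass-through placeholder, and the symmetric "second" and "undetermined" outcomes are added for completeness. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]            # hypothetical raster: rows of pixel values
Candidate = Tuple[str, float]      # (character code, confidence factor)
Classifier = Callable[[Image], List[Candidate]]


def rotate_180(image: Image) -> Image:
    """Reverse the row order and each row to rotate the raster 180 degrees."""
    return [row[::-1] for row in image[::-1]]


@dataclass
class CharacterRecognizer:
    classify: Classifier   # stands in for the character classifying module
    threshold: float       # the "predetermined value"

    def preprocess(self, image: Image) -> Image:
        """Placeholder for the pre-processing module (no-op here)."""
        return image

    def determine_orientation(self, image: Image) -> str:
        """Orientation determination using the counting rule of claim 10."""
        first_image = self.preprocess(image)
        second_image = rotate_180(first_image)  # assumed second orientation
        first = [conf for _, conf in self.classify(first_image)]
        second = [conf for _, conf in self.classify(second_image)]
        n_first = sum(c > self.threshold for c in first)
        n_second = sum(c > self.threshold for c in second)
        if n_first > n_second:
            return "first"
        if n_second > n_first:
            return "second"        # assumed symmetric outcome
        return "undetermined"      # tie handling is not specified in the claim


def stub_classifier(image: Image) -> List[Candidate]:
    """Toy classifier: confident only when the top-left pixel is set."""
    if image[0][0] == 1:
        return [("A", 0.95), ("B", 0.90)]
    return [("V", 0.30), ("8", 0.85)]


recognizer = CharacterRecognizer(classify=stub_classifier, threshold=0.8)
print(recognizer.determine_orientation([[1, 0], [0, 0]]))  # first
print(recognizer.determine_orientation([[0, 0], [0, 1]]))  # second
```

Passing the classifier in as a function keeps the orientation logic independent of any particular recognition engine, and the averaging rules of claims 11 and 12 could be substituted for the counting rule without changing the surrounding structure.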
Specification