Character recognition system and method
First Claim
1. A method for translating a written document into a computer-readable document, said method comprising:
providing a pixel representation of the written document;
providing a vector base comprising a plurality of classes, each said class corresponding to a computer-readable code and comprising a predetermined vector quantization of said corresponding computer-readable code;
identifying a field into said pixel representation of the written document;
segmenting said field, thereby yielding a segmented symbol;
performing a vector quantization on said segmented symbol;
comparing said vector quantization of said segmented symbol with said predetermined vector quantization of each said class, thereby assigning a similarity score between said segmented symbol and each said class; and
determining if only one of said similarity scores exceeds a predetermined threshold;
wherein when only one said similarity score exceeds said predetermined threshold, assigning to said segmented symbol a said computer-readable code corresponding to a said class for which said only one similarity score exceeds said predetermined threshold.
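The decision rule in the last two steps of the claim can be sketched as follows. This is a minimal illustration, not the patented implementation: the claim does not name a similarity measure, so cosine similarity is assumed here, and the function names, the toy vector base, and the threshold value are all hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Assumed similarity measure; the claim does not specify one."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_symbol(symbol_vector, vector_base, threshold):
    """Return the code of the single class whose score exceeds the
    threshold, or None when zero or several classes exceed it."""
    scores = {
        code: cosine_similarity(symbol_vector, reference)
        for code, reference in vector_base.items()
    }
    winners = [code for code, score in scores.items() if score > threshold]
    return winners[0] if len(winners) == 1 else None


# Hypothetical vector base: one reference vector per computer-readable code.
vector_base = {"0": [1.0, 0.0, 0.0], "1": [0.0, 1.0, 0.0]}

unambiguous = classify_symbol([0.9, 0.1, 0.0], vector_base, threshold=0.8)
ambiguous = classify_symbol([1.0, 1.0, 0.0], vector_base, threshold=0.7)
```

Returning `None` when several classes pass the threshold, rather than picking the best score, is what keeps a confusable symbol from being assigned a possibly wrong code.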
Abstract
A system and method for translating a written document into a computer-readable document by recognizing the characters written on the document. The system and method aim at recognizing typed or printed characters, especially hand-printed or handwritten characters, in the various fields of a form. Given a pixel representation of the written document, the method translates the written document into a computer-readable document by i) identifying at least one field in the pixel representation of the document; ii) segmenting each field so as to yield at least one segmented symbol; iii) applying a character recognition method to each segmented symbol; and iv) assigning a computer-readable code to each recognized character resulting from the character recognition method. The character recognition method includes performing a vector quantization on each segmented symbol and performing a vector classification using a vector base. A learning base is also created based on the optimal elliptic separation method. The system and method according to the present invention make it possible to achieve a substitution rate near zero.
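The four steps of the abstract can be sketched as a small pipeline. This is only an illustration under simplifying assumptions: the page is a 2D list of 0/1 pixels, fields come from a hypothetical form template of bounding boxes, and symbols are split at ink-free columns, which is far cruder than what handwriting generally requires. The recognizer is passed in as a callable, so the recognition method itself is left open.

```python
def identify_fields(pixels, template):
    """Crop each field listed in the (hypothetical) form template.
    Each box is (top, left, bottom, right) in pixel coordinates."""
    return {
        name: [row[left:right] for row in pixels[top:bottom]]
        for name, (top, left, bottom, right) in template.items()
    }


def segment_field(field):
    """Split a field into symbols wherever a column contains no ink.
    Each symbol is returned as a list of pixel columns."""
    symbols, current = [], []
    for column in zip(*field):
        if any(column):
            current.append(column)
        elif current:
            symbols.append(current)
            current = []
    if current:
        symbols.append(current)
    return symbols


def translate(pixels, template, recognize):
    """Steps i) to iv): fields, symbols, recognition, code assignment."""
    return {
        name: [recognize(symbol) for symbol in segment_field(field)]
        for name, field in identify_fields(pixels, template).items()
    }


# Toy page with one field holding two symbols separated by a blank column.
page = [[1, 0, 1, 1],
        [1, 0, 0, 1]]
template = {"amount": (0, 0, 2, 4)}

# Stand-in recognizer that reports each symbol's width in columns.
widths = translate(page, template, recognize=len)
```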
29 Claims
1. A method for translating a written document into a computer-readable document, said method comprising:
providing a pixel representation of the written document;
providing a vector base comprising a plurality of classes, each said class corresponding to a computer-readable code and comprising a predetermined vector quantization of said corresponding computer-readable code;
identifying a field into said pixel representation of the written document;
segmenting said field, thereby yielding a segmented symbol;
performing a vector quantization on said segmented symbol;
comparing said vector quantization of said segmented symbol with said predetermined vector quantization of each said class, thereby assigning a similarity score between said segmented symbol and each said class; and
determining if only one of said similarity scores exceeds a predetermined threshold;
wherein when only one said similarity score exceeds said predetermined threshold, assigning to said segmented symbol a said computer-readable code corresponding to a said class for which said only one similarity score exceeds said predetermined threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 26, 27, 28
14. A method for recognizing a character corresponding to a written symbol, said method comprising:
providing a pixel representation of the written symbol;
segmenting said pixel representation, yielding a segmented symbol;
doing a vector quantization on said segmented symbol, yielding a vector representation of said written symbol;
for possible class(i), i ranging from 1 to N, N being the number of different possible classes;
providing a vector representation(i) for each class(i);
computing a similarity score(i) using said vector representation (i) of said symbol and said vector representation for class(i); and
comparing said similarity score(i) to a threshold(i); and
if only one of said similarity score(x) is superior than the corresponding threshold(x), x ranging from 1 to N;
assigning to said written symbol a computer-readable code corresponding to said class(x).
Dependent claims: 15, 16, 17, 18
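Claim 14 differs from claim 1 in that each class(i) has its own threshold(i). A short sketch of that comparison, assuming the similarity scores have already been computed; the class labels, scores, and thresholds below are invented for illustration.

```python
def decide(scores, thresholds):
    """Return class(x) when exactly one score(x) exceeds its own
    threshold(x); return None otherwise."""
    passing = [cls for cls, score in scores.items() if score > thresholds[cls]]
    return passing[0] if len(passing) == 1 else None


# Hypothetical scores for two easily confused classes.
scores = {"O": 0.85, "0": 0.80}

# A stricter threshold for "O" leaves only "0" above its threshold.
per_class = decide(scores, {"O": 0.90, "0": 0.75})

# With one shared threshold both classes pass, so nothing is assigned.
shared = decide(scores, {"O": 0.75, "0": 0.75})
```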
19. A system for translating a written document into a computer-readable document, said system comprising:
a document digitizer for creating a pixel representation of the document;
a controller coupled to said digitizer for:
receiving said pixel representation of the written document;
identifying a field in said pixel representation of the written document;
segmenting said field, thereby yielding a segmented symbol;
performing a vector quantization on said segmented symbol;
comparing said vector quantization of said segmented symbol with said predetermined vector quantization of each said class, thereby assigning a similarity score between said segmented symbol and each said class; and
determining if only one of said similarity scores exceeds a predetermined threshold;
wherein, when only one said similarity score exceeds said predetermined threshold, assigning to said segmented symbol a said computer-readable code corresponding to a said class for which said only one similarity score exceeds said predetermined threshold;
an output device coupled to said controller for displaying said segmented symbol; and
at least one input device coupled to said controller for entering a computer-readable code of a humanly recognized character among displayed segmented symbols.
Dependent claims: 20, 21, 22, 23, 24
25. A system for translating a written document into a computer readable document, said system comprising:
means for providing a pixel representation of the written document;
means for providing a vector base comprising a plurality of classes, each said class corresponding to a computer-readable code and comprising a predetermined vector quantization of said corresponding computer-readable code;
means for identifying a field into said pixel representation of the written document;
means for segmenting said field, thereby yielding a segmented symbol;
means for performing a vector quantization on said segmented symbol;
means for comparing said vector quantization of said segmented symbol with said predetermined vector quantization of each said class, thereby assigning a similarity score between said segmented symbol and each said class; and
means for determining if only one of said similarity scores exceeds a predetermined threshold;
wherein when only one said similarity score exceeds said predetermined threshold, assigning to said segmented symbol a said computer-readable code corresponding to a said class for which said only one similarity score exceeds said predetermined threshold.
29. A method as recited in claim 28, wherein said further inspection step includes at least one of (a) performing a human inspection step, (b) applying a predetermined field validity rule, (c) verifying in a stored database, (d) verifying in a thesaurus, and (e) applying an Intelligence Character Recognition method on said segmented symbol and a neighbour character segmented symbol.
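Options (a) to (c) of claim 29 can be sketched as a fallback chain. Claim 28 is not reproduced in this listing, so the sketch assumes that further inspection applies to a field containing symbols left with several candidate codes. Enumerating candidate readings, the regular-expression validity rule, and the set standing in for a stored database are all illustrative choices, not details taken from the patent.

```python
import re
from itertools import product


def resolve_field(candidates, rule, known_values):
    """candidates holds one list of candidate codes per symbol.
    Returns the resolved field text, or None to request human inspection."""
    readings = ["".join(codes) for codes in product(*candidates)]

    # (b) keep only the readings that satisfy the field validity rule.
    valid = [text for text in readings if re.fullmatch(rule, text)]
    if len(valid) == 1:
        return valid[0]

    # (c) keep only the valid readings found in the stored database.
    stored = [text for text in valid if text in known_values]
    if len(stored) == 1:
        return stored[0]

    # (a) still ambiguous: defer to a human operator.
    return None


# A three-digit field where two symbols are ambiguous (1/I and 0/O).
by_rule = resolve_field([["1", "I"], ["0", "O"], ["5"]], r"\d{3}", set())

# Two valid readings ("10" and "70"); only "70" is in the database.
by_database = resolve_field([["1", "7"], ["0"]], r"\d{2}", {"70"})
```

Each stage returns a value only when it narrows the readings to exactly one, which mirrors the exclusivity condition of the independent claims.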
Specification