Character recognition system and method
Abstract
A system and method for translating a written document into a computer-readable document by recognizing the characters written on the document. The system and method aim at recognizing typed or printed characters, especially hand-printed or handwritten characters, in the various fields of a form. Given a pixel representation of the written document, the method translates the written document into a computer-readable document by i) identifying at least one field in the pixel representation of the document; ii) segmenting each field so as to yield at least one segmented symbol; iii) applying a character recognition method on each segmented symbol; and iv) assigning a computer-readable code to each recognized character resulting from the character recognition method. The character recognition method includes performing a vector quantization on each segmented symbol and performing a vector classification using a vector base. A learning base is also created based on the optimal elliptic separation method. The system and method according to the present invention make it possible to achieve a substitution rate near zero.
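The four steps of the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the image is a list of rows of 0/1 pixels, fields are given as hypothetical bounding boxes, symbols are split at blank pixel columns (a simple stand-in heuristic), the recognizer is any caller-supplied function, and the computer-readable code is taken to be the Unicode code point.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]          # rows of 0 (background) / 1 (ink) pixels
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def crop(image: Image, box: Box) -> Image:
    """Step i: cut one field out of the pixel representation."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def segment_field(field: Image) -> List[Image]:
    """Step ii: split a field into symbols at fully blank pixel columns.

    Assumed heuristic for illustration only; the source does not say how
    segmentation is performed.
    """
    if not field:
        return []
    width = len(field[0])
    inked = [any(row[x] for row in field) for x in range(width)]
    symbols: List[Image] = []
    start = None
    for x, has_ink in enumerate(inked + [False]):  # sentinel closes the last run
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            symbols.append([row[start:x] for row in field])
            start = None
    return symbols


def translate(
    image: Image,
    field_boxes: List[Box],
    recognize: Callable[[Image], Optional[str]],
) -> List[List[Optional[int]]]:
    """Steps iii and iv: recognize each symbol and assign it a code."""
    result = []
    for box in field_boxes:
        codes = []
        for symbol in segment_field(crop(image, box)):
            char = recognize(symbol)  # None means "not recognized"
            codes.append(ord(char) if char is not None else None)
        result.append(codes)
    return result
```

A recognizer that returns `None` for a symbol leaves a `None` entry in the output, which keeps unrecognized symbols visible to later processing instead of forcing a guess.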
35 Claims
1. A method for translating a written document into a computer readable document comprising:
providing a pixel representation of the written document;
identifying at least one field into said pixel representation of the written document;
segmenting each said at least one field, yielding at least one segmented symbol;
applying a character recognition method on each segmented symbol; and
assigning a computer-readable code to each recognized character resulting from said character recognition method.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
22. A method for recognizing a character corresponding to a written symbol, comprising:
providing a pixel representation of the written symbol;
segmenting said pixel representation, yielding a segmented symbol;
doing a vector quantization on said segmented symbol, yielding a vector representation of said symbol;
for possible class(i), i ranging from 1 to N, N being the number of different possible classes;
providing a vector representation(i) for each class(i);
computing a similarity score(i) using said vector representation (i) of said symbol and said vector representation for class(i); and
comparing said similarity score(i) to a threshold(i); and
if only one of said similarity score(x) is superior than the corresponding threshold(x), x ranging from 1 to N; and
assigning to said written symbol a computer-readable code corresponding to said class(x).
Dependent claims: 23, 24, 25, 26
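The decision rule of claim 22 (accept a class only when exactly one similarity score exceeds its own threshold) can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: cosine similarity is an assumed stand-in, since the claim does not name a similarity measure, and the names `classify`, `class_vecs` and `thresholds` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Assumed similarity measure; the claim leaves the measure open."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(
    symbol_vec: Vector,
    class_vecs: Dict[str, Vector],
    thresholds: Dict[str, float],
) -> Optional[str]:
    """Return the single class whose score exceeds its threshold, else None."""
    winners = [
        label
        for label, vec in class_vecs.items()
        if cosine_similarity(symbol_vec, vec) > thresholds[label]
    ]
    # Zero winners or several winners are both treated as "not recognized".
    return winners[0] if len(winners) == 1 else None
```

Returning `None` when several classes pass their thresholds trades recognition rate for fewer wrong assignments, which appears consistent with the low substitution rate the abstract aims for.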
27. A method for creating a vector base for a character recognition method comprising:
for each of a plurality of characters(i), i ranging from 1 to N, N being the number of characters;
providing a pixel representation (i);
doing a vector quantization on each pixel representation(i), yielding a vector representation (i) for each pixel representation (i);
computing a similarity score (x) for each of a plurality of predetermined classes, x ranging from 1 to M, M being the number of predetermined classes, by comparing said vector representation (i) of each pixel representation to a provided vector quantization (x) corresponding to said each of a plurality of predetermined class;
a) if, for one of said plurality of predetermined classes (x), said similarity score(x) is superior to a predetermined threshold(x), said character (i) being considered already known;
b) if not, verifying if said character (i) belongs to one of said classes (x);
i. if no, said character (i) is rejected;
ii. if yes, said character (i) is associated to the corresponding class (x).
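The branching in claim 27 can be sketched as below. Several points are assumptions made for illustration only: the similarity is `1 / (1 + Euclidean distance)`, a class score is the best similarity over the vectors already stored for that class, and the verification step is a caller-supplied function `verify_class` (for example a human operator) that returns a class label or `None`.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def similarity(a: Vector, b: Vector) -> float:
    """Assumed similarity: 1 for identical vectors, falling toward 0."""
    return 1.0 / (1.0 + math.dist(a, b))


def build_vector_base(
    samples: List[Vector],
    base: Dict[str, List[Vector]],
    thresholds: Dict[str, float],
    verify_class: Callable[[Vector], Optional[str]],
) -> List[str]:
    """Update `base` in place and report what happened to each sample."""
    outcomes = []
    for vec in samples:
        scores = {
            label: max((similarity(vec, ref) for ref in refs), default=0.0)
            for label, refs in base.items()
        }
        # a) some class already scores above its threshold: already known
        if any(scores[label] > thresholds[label] for label in base):
            outcomes.append("known")
            continue
        # b) otherwise check whether the character belongs to a class
        label = verify_class(vec)
        if label is None or label not in base:
            outcomes.append("rejected")  # b.i
        else:
            base[label].append(vec)      # b.ii
            outcomes.append("added")
    return outcomes
```

Because the base is updated as samples are processed, a sample added early can make a later, similar sample count as already known.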
28. A character recognition learning method comprising:
providing a database of recognized characters;
each recognized character belonging to a class and being represented by a quantization vector;
the number of different classes being C;
for each recognized character (i) in said database, measuring a distance(i) between a first quantization vector representing said each recognized character(i) and a second quantization vector representing a character from another class;
said second quantization vector having the shortest distance(i) with said first quantization vector among all quantization vectors representing characters from a class different than the class to which said each character (i) belongs; and
for each class(j), j ranging from 1 to C;
for a predetermined number of recognized character(k) member of class(j);
defining a same class sphere(k) comprising only quantization vectors which are members of class(j) and having a distance with said quantization vectors(k) less than distance(k); and
determining a number(k) of quantization vectors representing a character from class(j) and being part of same class sphere(k);
for each same class sphere(k), from said same class sphere having the largest number(k) to said same class sphere having the smallest number(k), applying an elliptic deformation until members of other classes are reached, yielding an optimized quantization vector for class(k).
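The sphere-building part of claim 28 can be sketched in Python as follows. This is a partial illustration under stated assumptions: distances are Euclidean, every character of a class is used as a sphere center (the claim only requires a predetermined number of them), and the final elliptic deformation step is left out because the claim text does not give enough detail to reproduce it. The function name `same_class_spheres` is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def same_class_spheres(
    vectors: List[Vector], labels: List[str]
) -> Dict[str, List[dict]]:
    """Build one same-class sphere per vector, grouped and ranked per class.

    Requires at least two distinct classes; otherwise no vector has a
    nearest neighbour from another class and min() raises ValueError.
    """
    by_class: Dict[str, List[dict]] = defaultdict(list)
    for k, (center, label) in enumerate(zip(vectors, labels)):
        # distance(k): distance to the closest vector of any other class
        radius = min(
            math.dist(center, v)
            for v, other in zip(vectors, labels)
            if other != label
        )
        # members: same-class vectors strictly closer than distance(k)
        members = [
            j
            for j, (v, other) in enumerate(zip(vectors, labels))
            if other == label and math.dist(center, v) < radius
        ]
        by_class[label].append({"center": k, "radius": radius, "members": members})
    # Largest sphere first, the order in which the claim processes them.
    for spheres in by_class.values():
        spheres.sort(key=lambda s: len(s["members"]), reverse=True)
    return dict(by_class)
```

Each sphere contains its own center, since a vector is at distance zero from itself, so the member count is always at least one.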
29. A system for translating a written document into a computer-readable document:
a document digitizer for creating a pixel representation of the document;
a controller coupled to said digitizer for;
receiving said pixel representation of the document;
identifying at least one field in said pixel representation of the document;
segmenting each said at least one field, yielding at least one segmented symbol for each said at least one field;
applying a character recognition method on each segmented symbol; and
assigning a computer-readable code to each recognized character resulting from said character recognition method;
an output device coupled to said controller for displaying segmented symbols, from said at least one segmented symbol, unrecognized by said character recognition method; and
at least one input device coupled to said controller for entering a computer-readable code of humanly recognized character among displayed segmented symbols.
Dependent claims: 30, 31, 32, 33, 34
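The operator fallback described in claim 29 (display the symbols the recognizer could not identify, then accept a code entered by a person) can be sketched as below. The output and input devices are modelled as two caller-supplied functions, `display` and `ask_operator`; both names, and the use of `None` to mark an unrecognized symbol, are assumptions made for illustration.

```python
from typing import Any, Callable, List, Optional


def resolve_unrecognized(
    symbols: List[Any],
    codes: List[Optional[int]],
    display: Callable[[Any], None],
    ask_operator: Callable[[], str],
) -> List[int]:
    """Fill in codes for symbols the recognizer left as None."""
    resolved = list(codes)
    for idx, (symbol, code) in enumerate(zip(symbols, codes)):
        if code is None:
            display(symbol)                    # output device shows the symbol
            resolved[idx] = ord(ask_operator())  # input device supplies the character
    return resolved
```

Only unrecognized symbols are shown to the operator, so manual work grows with the rejection rate and not with the size of the document.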
35. A system for translating a written document into a computer readable document comprising:
means for providing a pixel representation of the written document;
means for identifying at least one field into said pixel representation of the document;
means for segmenting each said at least one field, yielding at least one segmented symbol;
means for applying a character recognition method on each segmented symbol; and
means for assigning a computer-readable code to each recognized character resulting from said character recognition method.
Specification