Method of recognizing and indexing documents
First Claim
1. A method of recognizing and indexing documents in a system having a scanner connected to a computer, the method comprising:
scanning a document;
designating, by a user, an arbitrary point P in at least one box of the scanned document using a pointing device or member of the computer, wherein no extracting or analyzing process takes place previous to the user's designation;
searching for and identifying said box by applying a shape search algorithm over a determined search zone surrounding said point P previously designated by the user, wherein if the determined search zone does not include the entire box, using at least one additional search zone of increasing size until the entire box is within the search zone;
recognizing by OCR the characters in said identified box of the scanned document;
storing the recognized characters in a first database connected to the computer to enable documents scanned in this way to be indexed; and
storing, in a second database connected to the computer, characterization data of said box of the scanned document, such that another box subsequently can be identified automatically without any point P within said another box being designated, for next documents of a same type.
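The search step of the claim (a search zone around point P that is enlarged until the entire box falls inside it) can be illustrated with a minimal sketch. The claim does not specify the shape search algorithm, so the ray-casting test used here is a stand-in chosen for brevity, not the patented method. The names `find_box`, `_first_dark`, `initial_half` and `growth` are hypothetical, the page is assumed to be a binary image given as a list of rows (1 = dark pixel, 0 = background), and point P is assumed to lie on background with no text between it and the box border.

```python
def _first_dark(image, x, y, dx, dy, zone):
    """Walk from (x, y) in direction (dx, dy); return the coordinate of the
    first dark pixel met inside the zone, or None if the zone edge is reached."""
    x0, y0, x1, y1 = zone
    while x0 <= x <= x1 and y0 <= y <= y1:
        if image[y][x]:
            return x if dx else y
        x, y = x + dx, y + dy
    return None


def find_box(image, p, initial_half=4, growth=2):
    """Return (left, top, right, bottom) of the box around point p, or None.

    Assumption: the "shape search" is reduced to finding one border pixel in
    each of the four directions from p. The zone grows until all four sides
    are inside it or it covers the whole page.
    """
    height, width = len(image), len(image[0])
    px, py = p
    half = initial_half
    while True:
        # Search zone centred on P, clipped to the page.
        zone = (max(0, px - half), max(0, py - half),
                min(width - 1, px + half), min(height - 1, py + half))
        left = _first_dark(image, px, py, -1, 0, zone)
        right = _first_dark(image, px, py, 1, 0, zone)
        top = _first_dark(image, px, py, 0, -1, zone)
        bottom = _first_dark(image, px, py, 0, 1, zone)
        if all(side is not None for side in (left, top, right, bottom)):
            return (left, top, right, bottom)
        if zone == (0, 0, width - 1, height - 1):
            return None  # whole page searched, no enclosing box
        half *= growth  # additional search zone of increasing size


def make_page(width, height, box):
    """Build a blank page with one rectangular border drawn on it (demo helper)."""
    left, top, right, bottom = box
    page = [[0] * width for _ in range(height)]
    for x in range(left, right + 1):
        page[top][x] = page[bottom][x] = 1
    for y in range(top, bottom + 1):
        page[y][left] = page[y][right] = 1
    return page


page = make_page(30, 20, (3, 2, 25, 15))
print(find_box(page, (10, 8)))
```

With these demo values the first two zones miss at least one side of the box, and the third zone contains all four sides. A real implementation would also need to tolerate text inside the box, broken lines and skew, none of which this sketch handles.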
Abstract
A method of recognizing and indexing documents, using a scanner connected to a computer, the method including scanning the documents, then using a pointing device or member of the computer to designate an arbitrary point P in at least one box of the documents, and finally recognizing by OCR the characters in the box so as to store them in a first database connected to the computer to enable documents scanned in this way to be indexed. The designation step comprises a step of searching for and identifying the box of the document containing the point P designated by the user.
34 Citations
17 Claims
1. A method of recognizing and indexing documents in a system having a scanner connected to a computer, the method comprising:
scanning a document;
designating, by a user, an arbitrary point P in at least one box of the scanned document using a pointing device or member of the computer, wherein no extracting or analyzing process takes place previous to the user's designation;
searching for and identifying said box by applying a shape search algorithm over a determined search zone surrounding said point P previously designated by the user, wherein if the determined search zone does not include the entire box, using at least one additional search zone of increasing size until the entire box is within the search zone;
recognizing by OCR the characters in said identified box of the scanned document;
storing the recognized characters in a first database connected to the computer to enable documents scanned in this way to be indexed; and
storing, in a second database connected to the computer, characterization data of said box of the scanned document, such that another box subsequently can be identified automatically without any point P within said another box being designated, for next documents of a same type.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An apparatus for recognizing and indexing documents, the apparatus comprising:
a scanner for scanning a document and delivering an image of the scanned document;
a computer connected to the scanner to receive said scanned image;
a first database connected to said computer for storing said scanned image;
first software for using a pointing member to designate, by a user, an arbitrary point P in at least one box of the scanned image, wherein no extracting or analyzing process takes place previous to the user's designation, for searching for and identifying the box containing said point P designated by the user, for recognizing by OCR the characters in said box of the scanned image, and for storing the recognized characters so as to enable images scanned in this way to be indexed, wherein the searching for and identifying said box is performed by applying a shape search algorithm over a determined search zone surrounding said point P previously designated by the user, wherein if the determined search zone does not include the entire box, using at least one additional search zone of increasing size until the entire box is within the search zone;
a second database connected to the computer to store characterization data of said box of the scanned image, such that another box subsequently can be identified automatically by said first software without any point P within said another box being designated, for next documents of a same type.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A method of recognizing and indexing documents in a system having a scanner connected to a computer, the method comprising:
scanning a document;
manually designating, by a user, an arbitrary point P in a predetermined area of the scanned document, if a type of the scanned document is not known, wherein no extracting or analyzing process takes place previous to the user's designation;
searching for and identifying a box around the arbitrary point P of the scanned document by applying a shape search algorithm over a determined search zone surrounding the arbitrary point P designated by the user, wherein if the determined search zone does not include the entire box, using at least one additional search zone of increasing size until the entire box is within the search zone;
storing, in a database connected to the computer, characterization data of the identified box of the scanned document, such that boxes in next documents of a same type can be identified automatically without designation of an arbitrary point P on the next documents;
recognizing characters in the identified box of the scanned document; and
storing the recognized characters to index the scanned document.
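The reuse of characterization data described in claim 17 (manual designation only when the document type is not yet known, automatic identification for later documents of the same type) might be sketched as follows. The claims do not say what the characterization data contains or how a document type is determined, so this sketch assumes the data is simply the box coordinates plus a field label and that the type arrives as a string. `BoxTemplate`, `TemplateStore`, `boxes_to_read` and `designate_box` are hypothetical names, and an in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxTemplate:
    """Assumed characterization data: a label and the box coordinates."""
    field_name: str
    left: int
    top: int
    right: int
    bottom: int


class TemplateStore:
    """In-memory stand-in for the database of characterization data."""

    def __init__(self):
        self._by_type = {}

    def remember(self, doc_type, template):
        self._by_type.setdefault(doc_type, []).append(template)

    def boxes_for(self, doc_type):
        return list(self._by_type.get(doc_type, []))


def boxes_to_read(store, doc_type, designate_box):
    """Return the boxes to OCR for a document of the given type.

    If the type already has stored boxes, they are reused without asking the
    user. Otherwise designate_box() is called once (standing in for the user
    pointing at P and the box search) and its result is stored for next time.
    """
    known = store.boxes_for(doc_type)
    if known:
        return known
    template = designate_box()
    store.remember(doc_type, template)
    return [template]


store = TemplateStore()
ask_user = lambda: BoxTemplate("invoice_number", 3, 2, 25, 15)
first = boxes_to_read(store, "invoice", ask_user)   # user designates the box
second = boxes_to_read(store, "invoice", ask_user)  # reused from the store
print(first == second)
```

Keeping the store keyed by document type means the user is asked at most once per type in this sketch. Handling several boxes per document, or layouts that shift between scans, is left out.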
Specification