AUTOMATIC DOCUMENT CLASSIFICATION USING TEXT AND IMAGES
Abstract
A method and apparatus for automatic document classification using text and images. The present invention provides a method and apparatus for automatic document classification based on text and image. A new document is analyzed based on textual content as well as visual appearance. The new document is automatically stored in one or more mirror directories in which the new document would most likely be stored by the user of the device if the new document were placed manually. Determination of the most likely directories is based on an analysis of multiple documents stored by the user in various directories. The mirror directories are components of a mirror directory structure, which is a copy of a pre-existing directory structure, such as the user's hard drive. By storing the new document automatically, the user is relieved of the duty of manually selecting a directory for the new document.
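As an illustration of the idea in the abstract, the following is a minimal sketch of ranking candidate directories for a new document by combining textual and visual features. It is not the patented method: the bag-of-words features, the `table_area` visual measurement, the centroid comparison, and all function names are assumptions chosen for the example.

```python
import math
from collections import Counter

def text_features(text):
    """Bag-of-words counts; keys are prefixed so they cannot collide with visual features."""
    return Counter("w:" + token for token in text.lower().split())

def combine(text, visual):
    """Merge word counts and named visual measurements into one sparse vector."""
    vector = dict(text_features(text))
    for name, value in visual.items():
        vector["v:" + name] = value
    return vector

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Average of the vectors already stored in one directory."""
    if not vectors:
        return {}
    total = Counter()
    for vector in vectors:
        for key, value in vector.items():
            total[key] += value
    return {key: value / len(vectors) for key, value in total.items()}

def rank_directories(new_doc, docs_by_directory):
    """Return directory names sorted from most to least similar to new_doc."""
    scores = {name: cosine(new_doc, centroid(vectors))
              for name, vectors in docs_by_directory.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical documents the user has already filed by hand.
existing = {
    "invoices": [combine("invoice total amount due", {"table_area": 0.6}),
                 combine("invoice payment due date", {"table_area": 0.5})],
    "letters": [combine("dear sir regards sincerely", {"table_area": 0.0})],
}
new_doc = combine("amount due on invoice", {"table_area": 0.55})
ranking = rank_directories(new_doc, existing)
```

Here `ranking` lists the directories in order of likelihood, so the first entry is the directory in which the new document would be stored under this simplified model.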
132 Citations
32 Claims
1. A method for document classification comprising:
analyzing textual and graphical properties of an electronic document;
generating a classification of the document based on the textual and graphical properties; and
storing the electronic document in a pre-existing document hierarchy based on the classification.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 11, 12
9. A software product including a machine-readable medium having stored thereon sequences of instructions, which, when executed by a processor, cause the processor to:
analyze textual and graphical properties of an electronic document;
generate a classification of the document based on the textual and graphical properties; and
store the electronic document in a pre-existing directory structure based on the classification.
Dependent claims: 10
13. A method for document classification comprising:
analyzing documents in a pre-existing document directory structure to determine an organization of the pre-existing document directory structure;
generating a mirror directory structure based on the pre-existing document directory structure; and
placing a document in the mirror directory structure based on the organization of the pre-existing document directory structure, results of textual analysis of the document, and results of graphical analysis of the document.
Dependent claims: 14, 15, 16, 17, 18
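The mirror directory steps recited in claim 13 can be pictured with a short sketch. This is only an illustration under stated assumptions, not the claimed implementation: the function names `build_mirror` and `place_in_mirror` are hypothetical, the mirror root is assumed to lie outside the source tree, and the target directory is assumed to have been chosen already by a separate textual and graphical analysis.

```python
import os
import shutil

def build_mirror(source_root, mirror_root):
    """Recreate the directory tree of source_root under mirror_root, without files.

    Assumes mirror_root is not located inside source_root.
    Returns the sorted relative paths of the directories that were mirrored.
    """
    mirrored = []
    for dirpath, _dirnames, _filenames in os.walk(source_root):
        relative = os.path.relpath(dirpath, source_root)
        target = os.path.normpath(os.path.join(mirror_root, relative))
        os.makedirs(target, exist_ok=True)
        mirrored.append(relative)
    return sorted(mirrored)

def place_in_mirror(document_path, mirror_root, relative_dir):
    """Copy a document into the mirror directory that corresponds to relative_dir.

    relative_dir is the directory selected by the classifier (hypothetical input).
    Returns the path of the stored copy.
    """
    target_dir = os.path.join(mirror_root, relative_dir)
    if not os.path.isdir(target_dir):
        raise ValueError(f"no mirror directory for {relative_dir!r}")
    return shutil.copy2(document_path, target_dir)
```

Keeping the mirror separate from the user's own tree means an automatic placement never alters the directories the user organized by hand; the user's tree serves only as the model for the mirror's layout.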
19. A computer-readable medium having stored thereon sequences of instructions which, when executed by a processor, cause the processor to:
analyze a pre-existing document directory structure to determine an organization of the pre-existing document directory structure;
generate a mirror directory structure based on the pre-existing document directory structure; and
place a document in the mirror directory structure based on the organization of the pre-existing document directory structure.
Dependent claims: 20, 21, 22, 23
24. An apparatus comprising:
means for analyzing a pre-existing document directory structure to determine an organization of the pre-existing document directory structure;
means for generating a mirror directory structure based on the pre-existing document directory structure; and
means for placing a document in the mirror directory structure based on the organization of the pre-existing document directory structure.
Dependent claims: 25, 26, 27, 28
29. A document processing system comprising:
a document scanning device;
a document storage device coupled to the document scanning device, wherein the document storage device is organized as a document directory structure having multiple directories, and further wherein the document storage device has a mirror directory structure having multiple directories organized based on the document directory structure; and
a processor coupled to the document scanning device and to the document storage device, wherein the processor analyzes a document scanned by the document scanning device to determine a directory in the document directory structure in which the document should be placed and stores the document in a corresponding directory in the mirror directory structure.
Dependent claims: 30, 31, 32
Specification