Automatic document classification using text and images
Abstract
A method and apparatus for automatic document classification using text and images. The present invention provides a method and apparatus for automatic document classification based on text and image. A new document is analyzed based on textual content as well as visual appearance. The new document is automatically stored in one or more mirror directories in which the new document would most likely be stored by the user of the device if the new document were placed manually. Determination of the most likely directories is based on an analysis of multiple documents stored by the user in various directories. The mirror directories are components of a mirror directory structure, which is a copy of a pre-existing directory structure, such as the user's hard drive. By storing the new document automatically, the user is relieved of the duty of manually selecting a directory for the new document.
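To illustrate the mirror directory structure mentioned in the abstract, the following Python sketch recreates a directory tree without copying any of its files. The function name `build_mirror` and its parameters are hypothetical and do not come from the patent. This is one possible reading of "a copy of a pre-existing directory structure", not the claimed implementation.

```python
import os
from pathlib import Path


def build_mirror(source_root: Path, mirror_root: Path) -> list[Path]:
    """Recreate every directory found under source_root beneath mirror_root.

    Only the directory layout is copied; the user's files stay where they are.
    (Assumption: the mirror starts out empty and is filled later.)
    """
    created = []
    for dirpath, _dirnames, _filenames in os.walk(source_root):
        # Path of the current directory relative to the user's root.
        relative = Path(dirpath).relative_to(source_root)
        target = mirror_root / relative
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

Keeping the mirror separate from the user's own tree means automatically filed documents never mix with manually filed ones, which appears to be the purpose of the mirror in the abstract.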
31 Claims
1. A method for document classification comprising:
using a first directory structure mirroring a second directory structure used by a user for storing documents;
analyzing content of the documents within the second directory structure to determine a plurality of document classes within the second directory structure, the plurality of document classes indicating a user approach to placing documents in the second directory structure;
determining a document classification profile associated with the first directory structure based on the plurality of document classes;
analyzing content of a previously unclassified electronic document to determine a textual profile and a graphical profile of the electronic document;
generating a classification of the document based on the textual profile and the graphical profile; and
storing the electronic document in one or more directories within the first directory structure based on the classification of the document and the document classification profile associated with the first directory structure, to resemble the user approach to placing the documents in the second directory structure.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
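The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: the claim does not say how the textual and graphical profiles are computed or compared, so the word-count profile, the numeric layout features, the cosine and distance measures, and the names `Profile`, `directory_profile`, and `classify` are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Profile:
    text: Counter               # word counts (assumed textual profile)
    visual: tuple[float, ...]   # layout features (assumed graphical profile)


def text_profile(text: str) -> Counter:
    """Count lowercase alphabetic words in a document's text."""
    return Counter(w for w in text.lower().split() if w.isalpha())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def visual_similarity(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Map Euclidean distance between feature vectors into (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def directory_profile(docs: list[Profile]) -> Profile:
    """Summarize one directory (non-empty): summed word counts, mean visual features."""
    text = Counter()
    for doc in docs:
        text.update(doc.text)
    visual = tuple(sum(col) / len(docs) for col in zip(*(d.visual for d in docs)))
    return Profile(text, visual)


def classify(doc: Profile, profiles: dict[str, Profile],
             text_weight: float = 0.5, top_k: int = 1) -> list[str]:
    """Return the top_k directory names whose profiles best match the document."""
    scores = {
        name: text_weight * cosine(doc.text, p.text)
        + (1.0 - text_weight) * visual_similarity(doc.visual, p.visual)
        for name, p in profiles.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Setting `top_k` above 1 corresponds to the claim's "one or more directories". The equal weighting of text and appearance is an arbitrary default chosen for the sketch.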
9. A software product including a machine-readable medium having stored thereon sequences of instructions, which, when executed by a processor, cause the processor to:
use a first directory structure mirroring a second directory structure used by a user for storing documents;
analyze content of the documents within the second directory structure to determine a plurality of document classes within the second directory structure, the plurality of document classes indicating a user approach to placing documents in the second directory structure;
determine a document classification profile associated with the first directory structure based on the plurality of document classes;
analyze content of a previously unclassified electronic document to determine a textual profile and a graphical profile of the electronic document;
generate a classification of the document based on the textual profile and the graphical profile; and
store the electronic document in one or more directories within the first directory structure based on the classification of the document and the document classification profile associated with the first directory structure, to resemble the user approach to placing the documents in the second directory structure.
Dependent claims: 10, 11, 12
13. A method for document classification comprising:
analyzing content of documents within a pre-existing directory structure to determine a plurality of document classes within the pre-existing directory structure, the plurality of document classes indicating a user approach to placing documents in the pre-existing directory structure;
determining a document classification profile of the pre-existing directory structure based on the plurality of document classes;
generating a mirror directory structure based on the pre-existing document directory structure;
receiving a previously unclassified electronic document;
analyzing content of the electronic document to determine a textual profile and a graphical profile of the electronic document; and
placing the electronic document at a certain storage location in the mirror directory structure based on the document classification profile of the pre-existing document directory structure, the textual profile of the document, and the graphical profile of the document, to resemble the user approach to placing the documents in the pre-existing directory structure.
Dependent claims: 14, 15, 16, 17, 18
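The final placing step of claim 13 might look like the Python sketch below, which copies a newly received file into the chosen locations of the mirror directory structure. The function `place_document` and the idea of passing the chosen directories as relative path strings are assumptions made for illustration; the claim only requires that the document end up "at a certain storage location".

```python
import shutil
from pathlib import Path


def place_document(document: Path, mirror_root: Path,
                   chosen_dirs: list[str]) -> list[Path]:
    """Copy the document into each chosen directory under the mirror root.

    chosen_dirs holds paths relative to the mirror root, for example the
    output of a classifier (hypothetical; not specified by the claim).
    """
    stored = []
    for relative in chosen_dirs:
        target_dir = mirror_root / relative
        # Create the directory if the mirror does not contain it yet.
        target_dir.mkdir(parents=True, exist_ok=True)
        copied = shutil.copy2(document, target_dir / document.name)
        stored.append(Path(copied))
    return stored
```

Copying rather than moving leaves the received file in place, so a misclassification can be corrected without data loss. A real system could equally move the file or store a link.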
19. A computer-readable medium having stored thereon sequences of instructions which, when executed by a processor, cause the processor to:
analyze content of documents within a pre-existing directory structure to determine a plurality of document classes within the pre-existing directory structure, the plurality of document classes indicating a user approach to placing documents in the pre-existing directory structure;
determine a document classification profile of the pre-existing directory structure based on the plurality of document classes;
generate a mirror directory structure based on the pre-existing document directory structure;
receive a previously unclassified electronic document;
analyze content of the electronic document to determine a textual profile and a graphical profile of the electronic document; and
place the electronic document at a certain storage location in the mirror directory structure based on the document classification profile of the pre-existing document directory structure, the textual profile of the document, and the graphical profile of the document, to resemble the user approach to placing the documents in the pre-existing directory structure.
Dependent claims: 20, 21, 22, 23
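The claims do not define what a graphical profile contains. As one hedged possibility, the Python sketch below derives two simple layout features from a grayscale page image: the overall ink density and the share of ink in the top half of the page. The function `graphical_profile`, the choice of features, and the threshold value are all assumptions for illustration only.

```python
def graphical_profile(pixels: list[list[int]],
                      dark_threshold: int = 128) -> tuple[float, float]:
    """Return (ink density, share of ink in the top half) for a page image.

    pixels is a row-major grid of grayscale values, 0 = black, 255 = white.
    Both features are hypothetical stand-ins for "visual appearance".
    """
    # Number of dark pixels in each row of the page.
    dark_per_row = [sum(1 for p in row if p < dark_threshold) for row in pixels]
    total_pixels = sum(len(row) for row in pixels)
    total_dark = sum(dark_per_row)

    density = total_dark / total_pixels if total_pixels else 0.0
    top_dark = sum(dark_per_row[: len(pixels) // 2])
    top_share = top_dark / total_dark if total_dark else 0.0
    return density, top_share
```

Features like these could distinguish, say, a letterhead page with ink concentrated at the top from a dense table, but any production system would likely use richer image analysis.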
24. An apparatus comprising:
means for analyzing content of documents within a pre-existing directory structure to determine a plurality of document classes within the pre-existing directory structure, the plurality of document classes indicating a user approach to placing documents in the pre-existing directory structure;
means for determining a document classification profile of the pre-existing directory structure based on the plurality of document classes;
means for generating a mirror directory structure based on the pre-existing document directory structure;
means for receiving a previously unclassified electronic document;
means for analyzing content of the electronic document to determine a textual profile and a graphical profile of the electronic document; and
means for placing the electronic document at a certain storage location in the mirror directory structure based on the document classification profile of the pre-existing document directory structure, the textual profile of the document, and the graphical profile of the document, to resemble the user approach to placing the documents in the pre-existing directory structure.
Dependent claims: 25, 26, 27, 28
29. A document processing system comprising:
a document scanning device;
a document storage device coupled to the document scanning device, wherein the document storage device has a pre-existing document directory structure and a mirror document directory structure organized based on the pre-existing document directory structure; and
a processor coupled to the document scanning device and to the document storage device, wherein the processor is to analyze content of documents within the pre-existing document directory structure to determine a plurality of document classes in the pre-existing document directory structure, the plurality of document classes indicating a user approach to placing documents in the pre-existing directory structure, to determine a document classification profile of the pre-existing document directory structure based on the plurality of document classes, to analyze content of a document scanned by the document scanning device, to determine which directory in the mirror document directory structure the scanned document is to be placed based on the analysis of the content of the scanned document and the document classification profile of the pre-existing document directory structure, and to store the scanned document in the determined directory in the mirror document directory structure to resemble the user approach to placing the documents in the pre-existing directory structure.
Dependent claims: 30, 31
Specification