DOCUMENT PROCESSOR, DOCUMENT CLASSIFICATION DEVICE, DOCUMENT PROCESSING METHOD, DOCUMENT CLASSIFICATION METHOD, AND COMPUTER-READABLE RECORDING MEDIUM FOR RECORDING PROGRAMS FOR EXECUTING THE METHODS ON A COMPUTER
Abstract
A document processor is provided which includes: a document memory which stores input document data; a selector which selects all or part of the document data stored in the document memory; a characteristics extractor which extracts data relating to characteristics of letter rows from all or part of the document data selected by the selector; a work processor which processes all or part of the document data based on the data relating to characteristics of letter rows extracted by the characteristics extractor; and an output section which outputs all or part of the document data processed by the work processor.
11 Claims
1. A document classification device which classifies documents based on contents thereof comprising:
an input unit which inputs document data;
a language analyzer unit which analyzes document data input by said input unit and obtains language analysis information;
a vector creation unit which creates document characteristic vectors for the document data based on the language analysis information obtained by said language analyzer unit;
a classification unit which classifies documents based on the degree of similarity between document characteristic vectors created by said vector creation unit, and creates clusters of documents;
a cluster characteristics calculation unit which calculates cluster characteristics, which are characteristics of clusters of documents created by said classification unit;
a display unit which displays the cluster characteristics calculated by said cluster characteristics calculation unit;
a cluster selection specification unit which selects predetermined clusters from the clusters of documents created by said classification unit;
a classification category memory which stores cluster characteristics, calculated by said cluster characteristics calculation unit, as constituent elements of classification categories;
a document characteristic vector memory which stores document characteristic vectors created by said vector creation unit; and
a vector correction unit which corrects document characteristic vectors stored in said document characteristic vector memory so that document characteristic vectors of documents belonging to clusters selected by said cluster selection unit are deleted.
2. The document classification device according to claim 1,
wherein said classification unit classifies documents based on the document characteristic vectors corrected by said vector correction unit.
3. The document classification device according to claim 1, further comprising
a document expression space correction unit which corrects a document expression space used when determining the degree of similarity between document characteristic vectors stored in said document characteristic vector memory, based on a characteristics amount calculated from clusters selected by said cluster selection unit,
wherein said classification unit classifies the documents based on the degree of similarity between document characteristic vectors created by said vector creation unit, using the document expression space corrected by said document expression space correction unit.
4. The document classification device according to claim 1, further comprising a selection information appending unit which appends selection information showing the fact of selection when all or part of the documents belonging to a cluster of documents created by said classification unit have been selected,
wherein said display unit displays the cluster characteristics and the selection information appended by said selection information appending unit.
5. The document classification device according to claim 1, wherein said classification category memory stores cluster characteristics and/or information created by an operator, in addition to all or part of the documents belonging to a cluster of documents selected by said selection specification unit, as constituent elements of classification categories.
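The units recited in claims 1 and 2 can be read as a small pipeline: analyze, vectorize, cluster by similarity, summarize each cluster, then delete the vectors of a selected cluster and classify again. The following is a minimal sketch under stated assumptions, not the patented implementation: whitespace tokenization stands in for the language analyzer, term-frequency counts stand in for the document characteristic vectors, and a greedy threshold clustering with cosine similarity stands in for the classification unit. All function names, the threshold value, and the sample documents are hypothetical.

```python
import math
from collections import Counter

def analyze(text):
    """Stand-in for the language analyzer unit: lowercase whitespace tokens (assumption)."""
    return text.lower().split()

def make_vector(tokens):
    """Document characteristic vector as sparse term frequencies (assumption)."""
    return Counter(tokens)

def cosine(a, b):
    """Degree of similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(vectors, threshold=0.3):
    """Greedy clustering: a document joins the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of document ids; the first id is the seed
    for doc_id, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters

def cluster_characteristics(cluster, vectors, top_n=3):
    """Cluster characteristics, taken here to be the most frequent terms (assumption)."""
    total = Counter()
    for doc_id in cluster:
        total.update(vectors[doc_id])
    return [term for term, _ in total.most_common(top_n)]

def correct_vectors(vector_memory, selected_cluster):
    """Vector correction: delete the vectors of documents in the selected cluster."""
    for doc_id in selected_cluster:
        vector_memory.pop(doc_id, None)

# Hypothetical sample documents.
docs = {
    "d1": "printer jams paper tray",
    "d2": "paper tray printer error",
    "d3": "network cable unplugged",
    "d4": "network router cable fault",
}
vector_memory = {doc_id: make_vector(analyze(text)) for doc_id, text in docs.items()}
clusters = classify(vector_memory)

# The operator selects the first cluster; its characteristics become a category.
category_memory = {"printing": cluster_characteristics(clusters[0], vector_memory)}
correct_vectors(vector_memory, clusters[0])

# Claim 2: classify again using only the corrected (remaining) vectors.
reclassified = classify(vector_memory)
```

Deleting the selected cluster's vectors before the second pass means later rounds only see documents that have not yet been assigned to a category, which is one plausible reading of why claim 2 ties reclassification to the corrected vectors.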
6. A document classification method of classifying documents based on contents thereof, comprising the steps of:
inputting document data;
language-analyzing document data input in the step of inputting and obtaining language analysis information;
creating document characteristic vectors for the document data based on the language analysis information obtained in the step of language-analyzing;
classifying documents based on the degree of similarity between document characteristic vectors created in the step of creating vectors, and creating clusters of documents;
calculating cluster characteristics, which are characteristics of clusters of documents created in the step of classifying;
displaying the cluster characteristics calculated in the step of calculating cluster characteristics;
selecting predetermined clusters from the clusters of documents created in the step of classifying;
storing cluster characteristics, calculated in the step of calculating cluster characteristics, as constituent elements of classification categories;
storing the document characteristic vectors created in the step of creating document characteristic vectors; and
correcting document characteristic vectors stored in the step of storing document characteristic vectors, so that document characteristic vectors of documents belonging to clusters selected by the step of selecting clusters are deleted.
7. The document classification method according to claim 6,
wherein the step of classifying comprises classifying documents based on the document characteristic vectors corrected by the step of correcting vectors.
8. The document classification method according to claim 6, further comprising a step of correcting document expression space when determining the degree of similarity between document characteristic vectors stored in the step of storing document characteristic vectors, based on a characteristics amount calculated from clusters selected in the step of selecting clusters,
wherein the step of classifying comprises classifying documents based on the degree of similarity between document characteristic vectors created in the step of creating vectors, using the document expression space corrected in the step of correcting the document expression space.
9. The document classification method according to claim 6, further comprising a step of appending selection information showing the fact of selection when all or part of the documents belonging to a cluster of documents created in the step of classifying have been selected,
wherein the step of displaying comprises displaying the cluster characteristics and the selection information appended in the step of appending selection information.
10. The document classification method according to claim 6, wherein the step of creating classification categories comprises creating cluster characteristics and/or information created by an operator, in addition to all or part of the documents belonging to a cluster of documents selected in the step of specifying selection, as constituent elements of classification categories.
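Claims 3 and 8 recite correcting the document expression space using a characteristics amount calculated from the selected clusters, but the claim text does not say how the space is corrected. One possible reading, sketched below purely as an assumption, is to down-weight the axes (terms) that dominate the already selected clusters so that the remaining documents are compared mainly on other terms. The functions `characteristic_amount`, `axis_weights`, and `corrected_cosine`, the `strength` parameter, and the sample vectors are all hypothetical.

```python
import math
from collections import Counter

def characteristic_amount(selected_vectors):
    """Assumed characteristics amount: each term's share of all term
    occurrences in the selected clusters."""
    total = Counter()
    for vec in selected_vectors:
        total.update(vec)
    count = sum(total.values())
    return {t: c / count for t, c in total.items()} if count else {}

def axis_weights(amount, strength=10.0):
    """Assumed correction: shrink axes in proportion to their characteristics amount."""
    return {t: 1.0 / (1.0 + strength * a) for t, a in amount.items()}

def corrected_cosine(a, b, weights):
    """Cosine similarity in the re-weighted space; unlisted axes keep weight 1."""
    def w(term):
        return weights.get(term, 1.0)
    dot = sum(w(t) * a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(w(t) * v * v for t, v in a.items()))
    nb = math.sqrt(sum(w(t) * v * v for t, v in b.items()))
    return dot / (na * nb) if na and nb else 0.0

# Vectors of documents in the clusters the operator already selected (hypothetical).
selected = [{"printer": 2, "paper": 1}, {"printer": 1, "tray": 1}]
weights = axis_weights(characteristic_amount(selected))

# Two remaining documents that share only the term "printer".
doc_a = {"printer": 1, "scanner": 1}
doc_b = {"printer": 1, "toner": 1}

plain = corrected_cosine(doc_a, doc_b, {})          # uncorrected space
corrected = corrected_cosine(doc_a, doc_b, weights)  # corrected space
```

In this sketch the two remaining documents look half-similar in the uncorrected space because both mention "printer", and much less similar once that axis is shrunk. Other corrections, such as projecting out the selected clusters' centroid, would fit the claim language equally well.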
11. A computer-readable recording medium in which are stored programs for executing a document classification method, the document classification method comprising the steps of:
inputting document data;
language-analyzing document data input in the step of inputting and obtaining language analysis information;
creating document characteristic vectors for the document data based on the language analysis information obtained in the step of language-analyzing;
classifying documents based on the degree of similarity between document characteristic vectors created in the step of creating vectors, and creating clusters of documents;
calculating cluster characteristics, which are characteristics of clusters of documents created in the step of classifying;
displaying the cluster characteristics calculated in the step of calculating cluster characteristics;
selecting predetermined clusters from the clusters of documents created in the step of classifying;
storing cluster characteristics, calculated in the step of calculating cluster characteristics, as constituent elements of classification categories;
storing the document characteristic vectors created in the step of creating document characteristic vectors; and
correcting document characteristic vectors stored in the step of storing document characteristic vectors, so that document characteristic vectors of documents belonging to clusters selected by the step of selecting clusters are deleted.
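Beyond the steps repeated in claim 11, the dependent claims (4, 5, 9, and 10) describe appending selection information to a cluster, displaying it next to the cluster characteristics, and storing operator-created information with the category. A program stored on such a medium might keep that bookkeeping in a small record type. The sketch below is illustrative only: `ClusterRecord`, `store_category`, the display markers, and the sample values are assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class ClusterRecord:
    doc_ids: list                 # documents in the cluster (assumed unique)
    characteristics: list         # cluster characteristics, e.g. top terms
    selected_ids: set = field(default_factory=set)  # appended selection information

    def append_selection(self, doc_ids):
        """Record which documents of this cluster were selected; ignore outsiders."""
        self.selected_ids.update(d for d in doc_ids if d in self.doc_ids)

    def display_line(self):
        """Show the characteristics together with the selection information."""
        if not self.selected_ids:
            mark = " "            # nothing selected
        elif len(self.selected_ids) == len(self.doc_ids):
            mark = "*"            # whole cluster selected
        else:
            mark = "+"            # part of the cluster selected
        terms = ", ".join(self.characteristics)
        return f"[{mark}] {terms} ({len(self.selected_ids)}/{len(self.doc_ids)} selected)"

def store_category(category_memory, name, record, operator_note=""):
    """Store characteristics, selected documents, and operator-created information."""
    category_memory[name] = {
        "characteristics": list(record.characteristics),
        "documents": sorted(record.selected_ids),
        "operator_note": operator_note,
    }

# Hypothetical usage: the operator selects two of three documents.
record = ClusterRecord(["d1", "d2", "d3"], ["printer", "paper"])
record.append_selection(["d1", "d3", "d9"])   # "d9" is not in this cluster
line = record.display_line()

category_memory = {}
store_category(category_memory, "printing", record, operator_note="hardware faults")
```

Keeping the selection information on the cluster record lets the display distinguish fully selected, partly selected, and untouched clusters without consulting the category memory.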
Specification