Sentence classification device and method
First Claim
1. A sentence classification device comprising:
a processor;
a memory for storing a plurality of terms;
a term list having the plurality of terms each term comprising not less than one word;
a Document Term (DT) matrix generation module for generating a DT matrix two-dimensionally expressing a relationship between each document contained in a document set and said each term;
a DT matrix transformation module for generating a transformed DT matrix having respective clusters, each cluster having one or more blocks of associated documents, by transforming the DT matrix obtained by said DT matrix generation module on a basis of a DM decomposition method used in a graph theory to enable document classification without having to preselect cluster categories;
a classification generation module for generating classifications associated with the document set on a basis of a relationship between each cluster on the transformed DT matrix obtained by said DT matrix transformation module and said each document classified according to the clusters, wherein the classification generation module comprises a virtual representative document generation module for generating a virtual representative document, for each cluster on the transformed DT matrix, from a term of each document belonging to the cluster;
a large classification generation module for generating a large classification of documents from each document in a bottom-up manner by repeatedly performing, at each DT matrix transformation, said DM decomposition method used to hierarchically cluster documents by setting said DT matrix generated by said DT matrix generation module in an initial state, causing said virtual representative document generation module to generate a virtual representative document for each cluster on the transformed DT matrix generated from the DT matrix by said DT matrix transformation module, generating a new DT matrix used for next hierarchical clustering processing by adding a virtual representative document to the transformed DT matrix and deleting documents belonging to the cluster of the virtual representative document from the transformed DT matrix, and outputting, for said each cluster, information associated with the documents constituting the respective cluster as large classification data of one or more cluster categories;
a term list edition module for adding or deleting an arbitrary term with respect to the term list;
an index generation module for making said DT matrix generation module generate DT matrices by using term lists before and after edition by said term list edition module, and generating and outputting an index indicating validity of the edition from the DT matrices,
a large classification label generation module for, if a virtual representative document is contained in a given cluster of the respective clusters obtained by the clustering processing, generating a label of the given cluster on which the virtual representative document is based from a term strongly connected to the virtual representative document subsequent to classification of the documents into the respective clusters,
wherein said large classification generation module terminates repetition of the clustering processing when no cluster is obtained from the transformed DT matrix in the clustering processing.
Abstract
A DT matrix generation means (11) generates a DT matrix (11A) from each document (D) in a document set (21) and each term (T) in a term list (22). A DT matrix transformation means (12) generates a transformed DT matrix (11B) by performing DM decomposition of the DT matrix (11A). A document classification means (13) extracts and outputs, for each cluster appearing on the transformed DT matrix (11B), each document (D) belonging to the cluster as one classification (subset).
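As a rough illustration of the DT matrix generation described in the abstract, the following minimal sketch builds a binary matrix from a document set and a term list. The function name `build_dt_matrix`, the rows-as-documents orientation, the 0/1 entries, and the case-insensitive word-boundary matching are all assumptions made for this example; the claims only require that the matrix express the relationship between each document and each term.

```python
import re
from typing import List


def build_dt_matrix(documents: List[str], terms: List[str]) -> List[List[int]]:
    """Return a binary matrix: one row per document, one column per term.

    Assumption for this sketch: an entry is 1 when the term occurs in the
    document (case-insensitive, on word boundaries) and 0 otherwise.
    """
    matrix = []
    for text in documents:
        lowered = text.lower()
        row = []
        for term in terms:
            # A term may consist of several words, so match the whole phrase.
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            row.append(1 if re.search(pattern, lowered) else 0)
        matrix.append(row)
    return matrix


# Hypothetical sample data, not taken from the patent.
documents = [
    "The printer driver crashed",
    "Driver update fixed the crash",
    "Invoice payment overdue",
]
terms = ["printer driver", "driver", "payment"]
dt_matrix = build_dt_matrix(documents, terms)
```

The DM decomposition step that follows in the abstract would then permute the rows and columns of such a matrix; that step is not shown here.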
8 Claims
1. A sentence classification device comprising:
(The elements of claim 1 are recited in full under First Claim above.)
Dependent claims: 2, 3, 4
5. A sentence classification method comprising:
generating, by using a computer, a Document Term (DT) matrix two-dimensionally expressing a relationship between each document contained in a document set and each term of a term list having a plurality of terms each comprising not less than one word;
generating a transformed DT matrix having respective clusters, each cluster having one or more blocks of associated documents, by transforming the DT matrix on a basis of a DM decomposition method used in a graph theory to enable document classification without having to preselect cluster categories;
generating classifications associated with the document set on a basis of a relationship between each respective cluster on the transformed DT matrix and said each document classified according to the respective clusters, wherein the generating comprises a virtual representative document generation step of generating a virtual representative document, for each respective cluster on the transformed DT matrix, from a term of each document belonging to the respective cluster;
generating a large classification of documents from each document in a bottom-up manner by repeatedly performing, at each DT matrix transformation, said DM decomposition method used to hierarchically cluster documents by setting said DT matrix generated in said DT matrix generation step in an initial state, generating a virtual representative document in a virtual representative document generation step for each respective cluster on the transformed DT matrix generated from the DT matrix in the DT matrix transformation step, the step of generating a new DT matrix used for next hierarchical clustering processing by adding the virtual representative document to the transformed DT matrix and deleting documents belonging to the cluster of the virtual representative document from the transformed DT matrix, and the step of outputting, for said each respective cluster, information associated with the documents constituting the respective cluster as large classification data of one or more cluster categories; and
adding or deleting an arbitrary term with respect to the term list; and
the step of generating DT matrices by using term lists before and after edition, and generating and outputting an index indicating validity of the edition from the DT matrices, and
in large classification label generation, if a virtual representative document is contained in a given cluster of the respective clusters obtained by the clustering processing, generating a label of the given cluster on which the virtual representative document is based from a term strongly connected to the virtual representative document subsequent to classification of the documents into the respective clusters,
wherein in the large classification generation step, repetition of the clustering processing is terminated when no cluster is obtained from the transformed DT matrix in the clustering processing.
Dependent claims: 6, 7, 8
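The bottom-up repetition recited in claims 1 and 5 can be sketched roughly as a loop: cluster, build one virtual representative document per cluster, swap it in for the cluster's members, and stop when no cluster is obtained. The sketch below is an illustration under stated assumptions, not the patented implementation. The clustering step is passed in as a callback, and the toy callback `first_sharing_pair` is a simple stand-in that is NOT DM decomposition. Representing each document as a set of terms, forming the virtual representative document as the union of its members' terms, and the `V1`, `V2` naming are likewise assumptions for this example.

```python
from typing import Callable, Dict, List, Set

Docs = Dict[str, Set[str]]                # document id -> terms it contains
ClusterFn = Callable[[Docs], List[List[str]]]


def first_sharing_pair(docs: Docs) -> List[List[str]]:
    """Toy stand-in for the clustering step (NOT DM decomposition):
    return the first pair of documents that share a term, if any."""
    ids = sorted(docs)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if docs[a] & docs[b]:
                return [[a, b]]
    return []


def large_classification(docs: Docs, cluster_fn: ClusterFn) -> Dict[str, List[str]]:
    """Repeat clustering bottom-up until no cluster is obtained.

    Returns a mapping from each remaining virtual representative document
    to the original documents it stands for.
    """
    current = dict(docs)                  # initial state: the original documents
    members: Dict[str, List[str]] = {}    # virtual id -> original document ids
    counter = 0
    while True:
        clusters = cluster_fn(current)
        if not clusters:                  # termination condition from the claims
            break
        for cluster in clusters:          # clusters are assumed to be disjoint
            counter += 1
            virtual_id = f"V{counter}"
            # Assumed rule: the virtual document carries every member's terms.
            virtual_terms = set().union(*(current[d] for d in cluster))
            # Flatten earlier virtual documents into their original members.
            members[virtual_id] = [m for d in cluster for m in members.pop(d, [d])]
            for d in cluster:             # delete the members from the working set
                del current[d]
            current[virtual_id] = virtual_terms   # add the virtual document
    return members


# Hypothetical sample data, not taken from the patent.
sample = {
    "d1": {"driver", "printer"},
    "d2": {"driver"},
    "d3": {"printer", "toner"},
    "d4": {"invoice"},
}
result = large_classification(sample, first_sharing_pair)
```

With this sample, `d1` and `d2` merge first, the resulting virtual document then merges with `d3`, and `d4` stays unclustered because it shares no term with anything else. A real implementation would replace the callback with a clustering step based on DM decomposition of the DT matrix.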
Specification