
Sentence classification device and method

  • US 7,567,954 B2
  • Filed: 07/01/2004
  • Issued: 07/28/2009
  • Est. Priority Date: 07/01/2003
  • Status: Active Grant
First Claim

1. A sentence classification device comprising:

    a processor;

    a memory for storing a plurality of terms;

    a term list having the plurality of terms each term comprising not less than one word;

    a Document Term (DT) matrix generation module for generating a DT matrix two-dimensionally expressing a relationship between each document contained in a document set and said each term;

    a DT matrix transformation module for generating a transformed DT matrix having respective clusters, each cluster having one or more blocks of associated documents, by transforming the DT matrix obtained by said DT matrix generation module on a basis of a DM decomposition method used in a graph theory to enable document classification without having to preselect cluster categories;

    a classification generation module for generating classifications associated with the document set on a basis of a relationship between each cluster on the transformed DT matrix obtained by said DT matrix transformation module and said each document classified according to the clusters, wherein the classification generation module comprises a virtual representative document generation module for generating a virtual representative document, for each cluster on the transformed DT matrix, from a term of each document belonging to the cluster;

    a large classification generation module for generating a large classification of documents from each document in a bottom-up manner by repeatedly performing, at each DT matrix transformation, said DM decomposition method used to hierarchically cluster documents by setting said DT matrix generated by said DT matrix generation module in an initial state, causing said virtual representative document generation module to generate a virtual representative document for each cluster on the transformed DT matrix generated from the DT matrix by said DT matrix transformation module, generating a new DT matrix used for next hierarchical clustering processing by adding a virtual representative document to the transformed DT matrix and deleting documents belonging to the cluster of the virtual representative document from the transformed DT matrix, and outputting, for said each cluster, information associated with the documents constituting the respective cluster as large classification data of one or more cluster categories;

    a term list edition module for adding or deleting an arbitrary term with respect to the term list;

    an index generation module for making said DT matrix generation module generate DT matrices by using term lists before and after edition by said term list edition module, and generating and outputting an index indicating validity of the edition from the DT matrices;

    a large classification label generation module for, if a virtual representative document is contained in a given cluster of the respective clusters obtained by the clustering processing, generating a label of the given cluster on which the virtual representative document is based from a term strongly connected to the virtual representative document subsequent to classification of the documents into the respective clusters,

    wherein said large classification generation module terminates repetition of the clustering processing when no cluster is obtained from the transformed DT matrix in the clustering processing.
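
The DT matrix generation and DM decomposition steps of the claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It makes four assumptions:
- "DM decomposition" refers to the Dulmage-Mendelsohn decomposition of the bipartite document-term graph.
- The DT matrix is binary, with documents as rows and terms as columns.
- A term occurs in a document when it appears as a substring of the document text.
- Each fine block of the decomposition plays the role of one cluster.

All function names (build_dt_matrix, max_matching, strongly_connected, dm_decompose) are hypothetical.

```python
from collections import defaultdict


def build_dt_matrix(documents, term_list):
    """Binary DT matrix: one row per document, one column per listed term.
    Assumption: a term occurs in a document if it appears as a substring."""
    return [[1 if term in doc else 0 for term in term_list] for doc in documents]


def max_matching(adj):
    """Kuhn's augmenting-path maximum matching; returns {term: document}."""
    match_term = {}
    def augment(d, seen):
        for t in adj[d]:
            if t not in seen:
                seen.add(t)
                # Take t if it is free, or if its current document can move.
                if t not in match_term or augment(match_term[t], seen):
                    match_term[t] = d
                    return True
        return False

    for d in range(len(adj)):
        augment(d, set())
    return match_term


def strongly_connected(nodes, succ):
    """Kosaraju's algorithm on the subgraph induced by `nodes`."""
    allowed, seen, order = set(nodes), set(), []
    def visit(u):
        seen.add(u)
        for v in succ[u]:
            if v in allowed and v not in seen:
                visit(v)
        order.append(u)  # post-order finishing time

    for u in nodes:
        if u not in seen:
            visit(u)
    rev = defaultdict(list)
    for u in nodes:
        for v in succ[u]:
            if v in allowed:
                rev[v].append(u)
    comps, assigned = [], set()
    for u in reversed(order):
        if u in assigned:
            continue
        comp, stack = [], [u]
        assigned.add(u)
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in rev[x]:
                if y not in assigned:
                    assigned.add(y)
                    stack.append(y)
        comps.append(sorted(comp))
    return comps


def dm_decompose(dt):
    """Dulmage-Mendelsohn decomposition of the bipartite document-term graph.
    Vertices are ("d", i) for documents and ("t", j) for terms."""
    n_docs = len(dt)
    n_terms = len(dt[0]) if dt else 0
    adj = [[t for t in range(n_terms) if dt[d][t]] for d in range(n_docs)]
    match_term = max_matching(adj)
    matched_docs = set(match_term.values())

    # Arc d -> t for every nonzero entry, plus t -> d for every matched pair,
    # so directed paths are alternating paths with respect to the matching.
    succ, pred = defaultdict(list), defaultdict(list)
    for d in range(n_docs):
        for t in adj[d]:
            succ[("d", d)].append(("t", t))
            pred[("t", t)].append(("d", d))
    for t, d in match_term.items():
        succ[("t", t)].append(("d", d))
        pred[("d", d)].append(("t", t))

    def reach(starts, nbrs):
        seen, stack = set(starts), list(starts)
        while stack:
            for v in nbrs[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen
    # over: reached by alternating paths from unmatched documents;
    # under: can reach an unmatched term; blocks: SCCs of the square rest.
    over = reach([("d", d) for d in range(n_docs) if d not in matched_docs], succ)
    under = reach([("t", t) for t in range(n_terms) if t not in match_term], pred)
    nodes = [("d", d) for d in range(n_docs)] + [("t", t) for t in range(n_terms)]
    rest = [v for v in nodes if v not in over and v not in under]
    return over, under, strongly_connected(rest, succ)
```

In this sketch, the document vertices of a fine block are the grouped documents, and its term vertices are the terms that tie them together. Documents that fall into `over` or `under` belong to no square block. The claim does not say how such documents are handled. The recursive helpers are fine for small examples, but they would need an iterative rewrite for large matrices.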

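The bottom-up loop of the large classification generation module can be sketched as below. To keep the example short, it swaps in a simple stand-in clusterer for the DM decomposition step: mutual-nearest-neighbour pairing by Jaccard overlap of term sets. This is a different technique, chosen only to make the loop runnable. Any function returning disjoint clusters of two or more document ids could be passed as `cluster_fn`.

The sketch also makes two further assumptions:
- A virtual representative document is the union of its members' terms.
- Virtual documents are named V1, V2, and so on.

All names (jaccard, mutual_nn_clusters, build_hierarchy) are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two term sets; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mutual_nn_clusters(docs):
    """Stand-in clusterer (NOT a DM decomposition): pair each document with
    its mutual nearest neighbour by Jaccard overlap, skipping zero overlap."""
    ids = sorted(docs)
    best = {}
    for i in ids:
        score, j = max(((jaccard(docs[i], docs[j]), j) for j in ids if j != i),
                       default=(0.0, None))
        if score > 0:
            best[i] = j
    return [[i, j] for i, j in best.items() if best.get(j) == i and i < j]


def build_hierarchy(docs, cluster_fn=mutual_nn_clusters):
    """Repeat: cluster, replace each cluster by a virtual representative
    document, until the clusterer returns no cluster.

    `docs` maps a document id to its set of terms (one DT-matrix row).
    Returns one dict per level mapping each virtual id to its member ids.
    """
    docs = dict(docs)  # start from a copy of the initial DT matrix
    levels, counter = [], 0
    while True:
        clusters = cluster_fn(docs)
        if not clusters:  # no cluster obtained: stop repeating
            break
        level = {}
        for members in clusters:
            counter += 1
            vid = f"V{counter}"  # hypothetical naming scheme
            # Virtual representative document: union of the members' terms.
            rep = set().union(*(docs[m] for m in members))
            for m in members:  # delete member rows from the matrix
                del docs[m]
            docs[vid] = rep  # add the virtual row for the next pass
            level[vid] = sorted(members)
        levels.append(level)
    return levels
```

For example, with documents A and B holding {a, b} and C and D holding {b, c}, the first pass forms V1 from A and B and V2 from C and D. The second pass merges V1 and V2 into V3, and the third pass finds no cluster. Because every cluster has at least two members, each pass removes at least one row, so the loop always reaches the claimed stopping condition.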
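
The claim does not say how the index indicating the validity of a term list edit is computed. As a purely hypothetical stand-in, the sketch below does three things:
- edits the term list by adding or deleting terms;
- builds binary DT matrices before and after the edit, assuming substring occurrence;
- reports the change in document coverage, meaning the fraction of documents that contain at least one listed term.

The function names and the coverage metric are assumptions, not taken from the patent.

```python
def edit_term_list(terms, add=(), delete=()):
    """Term list edition: a new list with `delete` removed and `add` appended."""
    removed = set(delete)
    result = [t for t in terms if t not in removed]
    for t in add:
        if t not in result:
            result.append(t)
    return result


def build_dt_matrix(documents, term_list):
    """Binary DT matrix (assumption: substring occurrence counts as a hit)."""
    return [[1 if term in doc else 0 for term in term_list] for doc in documents]


def coverage(dt):
    """Fraction of documents (rows) containing at least one listed term."""
    return sum(1 for row in dt if any(row)) / len(dt) if dt else 0.0


def edit_validity_index(documents, terms_before, terms_after):
    """Hypothetical index: coverage after the edit minus coverage before it."""
    before = coverage(build_dt_matrix(documents, terms_before))
    after = coverage(build_dt_matrix(documents, terms_after))
    return after - before
```

A positive value would suggest that the edited term list reaches more documents. A real index could just as well compare the cluster structure of the two DT matrices; the claim leaves that choice open.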
  • 1 Assignment