Document classification system, document classification method, and document classification program
Abstract
A document classification system is provided. The system analyzes and classifies digital document information collected for submission as evidence in a lawsuit. It includes: an extraction unit that extracts documents from the collected document information; a document display unit that displays the extracted document group; a classification code receiving unit that receives a classification code assigned to the displayed document group; a selection unit that sorts the extracted document group by classification code, analyzes keywords that commonly appear in each resulting group, and selects those keywords; a database that records the selected keywords; a search unit that searches the document information for the keywords; a score calculation unit that calculates a score indicating the connection between a classification code and a document; and an automatic classification unit that automatically assigns the classification code.
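The scoring and automatic-classification steps in the abstract can be illustrated with a short Python sketch. It assumes the keyword database is a plain mapping from keyword to learned weight, that a document's score is simply the sum of the weights of the recorded keywords it contains, and that the code is assigned once the score reaches a threshold. The tokenizer, the additive scoring rule, the `threshold` parameter, and the example weights are all hypothetical and are not taken from the patent text.

```python
import re
from typing import Dict, Optional, Set


def tokenize(text: str) -> Set[str]:
    """Hypothetical tokenizer: lowercase runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(text: str, weights: Dict[str, float]) -> float:
    """Sum the weights of the recorded keywords that occur in the document (assumed additive rule)."""
    tokens = tokenize(text)
    return sum(w for kw, w in weights.items() if kw in tokens)


def auto_classify(text: str, weights: Dict[str, float],
                  threshold: float = 1.0, code: str = "HOT") -> Optional[str]:
    """Assign `code` when the score reaches the hypothetical threshold, else leave unassigned."""
    return code if score(text, weights) >= threshold else None


# Hypothetical keyword-weight database, standing in for the output of the learning step.
weights = {"merger": 0.7, "confidential": 0.7, "price": 0.2}

print(auto_classify("Confidential: merger terms attached", weights))  # HOT
print(auto_classify("Lunch price list", weights))  # None
```

The first document contains two heavily weighted keywords and crosses the threshold; the second matches only "price" and stays unclassified.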
10 Claims
1. A document classification system comprising one or more processors configured to cause the document classification system to function as:
an extraction circuitry that extracts a plurality of documents by sampling the plurality of documents from document information as target of classification;
a classification code receiving circuitry that receives one or more classification codes for each of the plurality of documents for classifying each of the plurality of documents, wherein a classification code “HOT” is assigned to a document having a high relevancy among the plurality of documents;
a selection circuitry that selects one or more keywords which are plotted above a straight line R_hot = R_all, wherein R_hot indicates a percentage of documents which include a keyword selected as the keyword related to the classification code “HOT” and to which the classification code “HOT” is assigned among all documents to which the classification code “HOT” is assigned, and wherein R_all indicates a percentage of documents which include the one or more keywords selected by the selection circuitry among the plurality of documents;
a learning circuitry that learns a weight of each keyword selected by the selection circuitry;
a database that records the one or more keywords which are selected in each of the documents to which the one or more classification codes are assigned, wherein the one or more keywords are correlated with the weight of the keyword learned by the learning circuitry, wherein the learning circuitry increases or decreases a number of keywords recorded in the database on the basis of the learning; and
a score calculation circuitry that calculates a score indicating the strength of a connection between an unclassified document to which the one or more classification codes are not assigned and the one or more classification codes, on the basis of the one or more keywords which are included in the unclassified document and the weight correlated with the one or more keywords in the database.
9. A document classification method that is performed in a document classification system that includes one or more processors, the document classification method comprising:
extracting a plurality of documents by sampling the plurality of documents from document information as target of classification;
receiving one or more classification codes for each of the plurality of documents for classifying each of the plurality of documents, wherein a classification code “HOT” is assigned to a document having a high relevancy among the plurality of documents;
selecting one or more keywords which are plotted above a straight line R_hot = R_all, wherein R_hot indicates a percentage of documents which include a keyword selected as the keyword related to the classification code “HOT” and to which the classification code “HOT” is assigned among all documents to which the classification code “HOT” is assigned, and wherein R_all indicates a percentage of documents which include the one or more keywords selected by the selection circuitry among the plurality of documents;
learning a weight of each keyword selected by the selection circuitry;
recording, in a database, the one or more keywords which are selected in each of the documents to which the one or more classification codes are assigned, wherein the one or more keywords are correlated with the weight of the keyword learned, increasing or decreasing a number of keywords recorded in the database on the basis of the learning; and
calculating a score indicating the strength of a connection between an unclassified document to which the one or more classification codes are not assigned and the one or more classification codes, on the basis of the one or more keywords which are included in the unclassified document and the weight correlated with the one or more keywords in the database.
10. A document classification program stored in a non-transitory computer-readable medium, which when executed by one or more processors included in a document classification system, causes the document classification system to perform a method comprising:
extracting a plurality of documents by sampling the plurality of documents from document information as target of classification;
receiving one or more classification codes for each of the plurality of documents for classifying each of the plurality of documents, wherein a classification code “HOT” is assigned to a document having a high relevancy among the plurality of documents;
selecting one or more keywords which are plotted above a straight line R_hot = R_all, wherein R_hot indicates a percentage of documents which include a keyword selected as the keyword related to the classification code “HOT” and to which the classification code “HOT” is assigned among all documents to which the classification code “HOT” is assigned, and wherein R_all indicates a percentage of documents which include the one or more keywords selected by the selection circuitry among the plurality of documents;
learning a weight of each keyword selected by the selection circuitry;
recording, in a database, the one or more keywords which are selected in each of the documents to which the one or more classification codes are assigned, wherein the one or more keywords are correlated with the weight of the keyword learned, increasing or decreasing a number of keywords recorded in the database on the basis of the learning; and
calculating a score indicating the strength of a connection between an unclassified document to which the one or more classification codes are not assigned and the one or more classification codes, on the basis of the one or more keywords which are included in the unclassified document and the weight correlated with the one or more keywords in the database.
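The keyword-selection condition recited in claims 1, 9, and 10 can be read as a per-keyword comparison of two rates. Assuming R_all is plotted on the horizontal axis and R_hot on the vertical axis, a keyword lies above the line R_hot = R_all exactly when R_hot > R_all. The Python sketch below reads R_hot as the share of "HOT" documents containing the keyword and R_all as the share of all sampled documents containing it, then keeps the keywords that satisfy the inequality. The claims do not say how weights are learned, so the log-ratio weight and the `min_weight` pruning cutoff (one possible way of decreasing the number of recorded keywords) are hypothetical stand-ins, as is the toy sample.

```python
import math
from typing import Dict, List, Set, Tuple


def rates(keyword: str, docs: List[Set[str]], is_hot: List[bool]) -> Tuple[float, float]:
    """Return (R_hot, R_all) for one keyword over the reviewer-coded sample."""
    hot_docs = [d for d, hot in zip(docs, is_hot) if hot]
    if not hot_docs:
        raise ValueError("the sample contains no HOT documents")
    r_hot = sum(keyword in d for d in hot_docs) / len(hot_docs)
    r_all = sum(keyword in d for d in docs) / len(docs)
    return r_hot, r_all


def select_keywords(docs: List[Set[str]], is_hot: List[bool]) -> Dict[str, Tuple[float, float]]:
    """Keep keywords whose point (R_all, R_hot) lies strictly above the line R_hot = R_all."""
    selected = {}
    for kw in sorted(set().union(*docs)):
        r_hot, r_all = rates(kw, docs, is_hot)
        if r_hot > r_all:
            selected[kw] = (r_hot, r_all)
    return selected


def learn_weights(selected: Dict[str, Tuple[float, float]],
                  min_weight: float = 0.1) -> Dict[str, float]:
    """Hypothetical weighting: log(R_hot / R_all), dropping keywords below min_weight."""
    weights = {}
    for kw, (r_hot, r_all) in selected.items():
        w = math.log(r_hot / r_all)  # r_all > 0 for any keyword seen in the sample
        if w >= min_weight:
            weights[kw] = w
    return weights


# Toy reviewer-coded sample: each document is reduced to its set of words.
docs = [
    {"merger", "price", "meeting"},  # coded HOT
    {"merger", "confidential"},      # coded HOT
    {"lunch", "meeting"},
    {"lunch", "price"},
]
is_hot = [True, True, False, False]

selected = select_keywords(docs, is_hot)
print(selected)  # {'confidential': (0.5, 0.25), 'merger': (1.0, 0.5)}
print(learn_weights(selected))
```

In the toy sample, "meeting" and "price" each appear in half of the HOT documents and half of all documents, so they sit on the line rather than above it and are not selected; "lunch" never appears in a HOT document and falls below it.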
Specification