CLASSIFICATION RULE GENERATION DEVICE, CLASSIFICATION RULE GENERATION METHOD, CLASSIFICATION RULE GENERATION PROGRAM, AND RECORDING MEDIUM
Abstract
In a document classification device 100, a sample document extraction condition storage unit 160 stores sample document extraction conditions 160-1. These conditions are set for each classification category and are used to extract partial text, according to the classification categories, from an input document 301 input by a document input unit 110. A document matching unit 120 matches the input document 301 against the sample document extraction conditions 160-1. Based on the result of matching by the document matching unit 120, a document extraction unit 130 extracts the partial text from the input document 301 according to the classification categories. A learning unit 140 performs predetermined machine learning using the partial text extracted by the document extraction unit 130 as a sample document, and thereby generates classification rules 150-1.
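The storage, matching, and extraction stages described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the extraction conditions are modeled as regular expressions whose first capture group is the partial text, and the names `ExtractionCondition`, `match_document`, and `extract_partial_text` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionCondition:
    """One stored condition (hypothetical form): a category plus a regex.

    Capture group 1 of the regex is treated as the partial text.
    """
    category: str
    pattern: str


# Stand-in for the extraction conditions held by the storage unit,
# one condition per classification category (illustrative values).
CONDITIONS = [
    ExtractionCondition("complaint", r"Complaint:\s*(.+)"),
    ExtractionCondition("inquiry", r"Question:\s*(.+)"),
]


def match_document(document, conditions):
    """Matching stage: return every (condition, match) pair found."""
    hits = []
    for cond in conditions:
        for m in re.finditer(cond.pattern, document):
            hits.append((cond, m))
    return hits


def extract_partial_text(hits):
    """Extraction stage: group the matched partial text by category."""
    samples = {}
    for cond, m in hits:
        samples.setdefault(cond.category, []).append(m.group(1).strip())
    return samples


doc = "Complaint: the device overheats\nQuestion: how do I reset it?\nThanks."
samples = extract_partial_text(match_document(doc, CONDITIONS))
print(samples)
```

A document that matches no condition yields an empty result, which corresponds to an extraction attempt that produces no partial text.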
23 Claims
1. A classification rule generation device comprising:
an input unit that inputs a document as a sample target document;
a storage unit that stores extraction conditions for extracting partial text which is a portion of the sample target document and which is used for generating classification rules for classifying a classification target document to be classified into one of classification categories, the partial text being extracted from the sample target document according to the classification categories, the extraction conditions being set for each of the classification categories;
a matching unit that matches the sample target document input by the input unit against the extraction conditions stored in the storage unit;
an extraction unit that attempts to perform partial text extraction to extract the partial text from the sample target document according to the classification categories, based on a result of matching by the matching unit; and
a learning unit that, when the partial text corresponding to one of the classification categories is extracted by the partial text extraction by the extraction unit, performs predetermined machine learning using the partial text extracted, and generates the classification rules.
Dependent claims: 2-20
21. A classification rule generation method that is executed by a classification rule generation device including an input unit, a storage unit, a matching unit, an extraction unit, and a learning unit, the classification rule generation method comprising:
using the input unit, inputting a sample target document;
using the storage unit, storing extraction conditions for extracting partial text which is a portion of the sample target document and which is used for generating classification rules for classifying a classification target document to be classified into one of classification categories, the partial text being extracted from the sample target document according to the classification categories, the extraction conditions being set for each of the classification categories;
by the matching unit, matching the sample target document that is input by the input unit against the extraction conditions stored in the storage unit;
by the extraction unit, attempting to perform partial text extraction to extract the partial text from the sample target document according to the classification categories;
by the learning unit, when the partial text corresponding to one of the classification categories is extracted by the partial text extraction by the extraction unit, performing predetermined machine learning using the partial text extracted, and generating the classification rules.
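The final step of the method, learning only when partial text was actually extracted for a category, might look like the sketch below. The claim says only "predetermined machine learning", so the word-count scoring used here is a stand-in chosen for brevity, not the patent's algorithm. The function names `learn_rules` and `classify` and the sample data are hypothetical.

```python
from collections import Counter


def learn_rules(samples_by_category):
    """Build per-category word weights from extracted partial text.

    Categories for which the extraction attempt produced no partial text
    are skipped, so no rule is generated for them.
    """
    rules = {}
    for category, texts in samples_by_category.items():
        if not texts:  # nothing extracted: no learning for this category
            continue
        counts = Counter()
        for text in texts:
            counts.update(text.lower().split())
        rules[category] = counts
    return rules


def classify(document, rules):
    """Apply the rules: pick the category whose words score highest."""
    words = document.lower().split()
    scores = {cat: sum(weights[w] for w in words) for cat, weights in rules.items()}
    return max(scores, key=scores.get) if scores else None


rules = learn_rules({
    "complaint": ["the device overheats", "screen broken"],
    "inquiry": ["how do I reset it"],
    "praise": [],  # extraction was attempted but found nothing
})
label = classify("my device overheats badly", rules)
print(label)
```

Because the rules are learned from the extracted portions rather than from whole documents, text unrelated to a category never contributes to that category's weights in this sketch.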
22. A classification rule generation program that causes a computer to function as:
an input unit that inputs a document as a sample target document;
a storage unit that stores extraction conditions for extracting partial text which is a portion of the sample target document and which is used for generating classification rules for classifying a classification target document to be classified into one of classification categories, the partial text being extracted from the sample target document according to the classification categories, the extraction conditions being set for each of the classification categories;
a matching unit that matches the sample target document input by the input unit against the extraction conditions stored in the storage unit;
an extraction unit that attempts to perform partial text extraction to extract the partial text from the sample target document according to the classification categories, based on a result of matching by the matching unit; and
a learning unit that, when the partial text corresponding to one of the classification categories is extracted by the partial text extraction by the extraction unit, performs predetermined machine learning using the partial text extracted, and generates the classification rules.
Dependent claim: 23
Specification