Classification rule generation device, classification rule generation method, classification rule generation program, and recording medium
First Claim
1. A classification rule generation device comprising:
an input circuit that inputs a document as a sample target document;
a storage circuit that stores extraction conditions for extracting partial text which is a portion of the sample target document and which is used for generating classification rules for classifying a classification target document to be classified into one of classification categories, the partial text being extracted from the sample target document according to the classification categories, the extraction conditions being set for each of the classification categories;
a matching circuit that matches the sample target document input by the input circuit against the extraction conditions stored in the storage circuit;
an extraction circuit that performs partial text extraction to extract the partial text from the sample target document according to the classification categories, based on a result of matching by the matching circuit; and
a learning circuit that, when the partial text corresponding to one of the classification categories is extracted by the partial text extraction by the extraction circuit, performs predetermined machine learning using the partial text extracted, and generates the classification rules,
wherein the extraction conditions set for each of the classification categories include a keyword that corresponds to each of the classification categories,
the matching circuit includes a position identification circuit that identifies an existing position of the keyword for each of the classification categories in the sample target document,
the extraction circuit extracts a portion around and including the keyword as the partial text from the sample target document, based on the existing position of the keyword identified by the position identification circuit,
the extraction conditions set for each of the classification categories are set such that type information indicating a type of the keyword is set for at least one of the keywords, and
the extraction circuit, when extracting the partial text corresponding to each of the classification categories from the sample target document, performs the partial text extraction based on the type information indicated by the keyword identified by the position identification circuit.
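The keyword position identification and type-dependent extraction recited in the claim can be illustrated with a minimal Python sketch. The claim does not say what keyword types exist or how a type changes the extraction, so the `kind` values ("before", "after", "around"), the `width` parameter, and all names below are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyword:
    text: str
    category: str
    # Hypothetical type information; the claim does not enumerate actual types.
    kind: str = "around"


def find_positions(document: str, keyword: str) -> list[int]:
    """Return every start offset at which the keyword occurs in the document."""
    positions = []
    start = document.find(keyword)
    while start != -1:
        positions.append(start)
        start = document.find(keyword, start + 1)
    return positions


def extract_partial_text(document: str, kw: Keyword, width: int = 20) -> list[str]:
    """Extract a span that includes the keyword; the span's shape depends on kw.kind."""
    parts = []
    for pos in find_positions(document, kw.text):
        end = pos + len(kw.text)
        if kw.kind == "after":      # keyword plus the text that follows it
            lo, hi = pos, end + width
        elif kw.kind == "before":   # text that precedes the keyword, plus the keyword
            lo, hi = pos - width, end
        else:                       # default: text on both sides of the keyword
            lo, hi = pos - width, end + width
        parts.append(document[max(lo, 0):min(hi, len(document))])
    return parts
```

A character window is only one possible reading of "a portion around and including the keyword"; a sentence- or paragraph-based span would fit the claim language equally well.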
Abstract
In a document classification device 100, a sample document extraction condition storage unit 160 stores sample document extraction conditions 160-1 set for each of classification categories for extracting partial text according to the classification categories from an input document 301 input by a document input unit 110. A document matching unit 120 matches the input document 301 against the sample document extraction conditions 160-1. Based on a result of matching by the document matching unit 120, a document extraction unit 130 extracts the partial text from the input document 301 according to the classification categories. A learning unit 140 performs predetermined machine learning using as a sample document the partial text extracted by the document extraction unit 130, and thereby generates classification rules 150-1.
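The flow described in the abstract (store per-category extraction conditions, match, extract partial text, learn) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the abstract leaves the "predetermined machine learning" unspecified, so per-category word counts stand in for it here, and the categories, keywords, function names, and `width` parameter are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical extraction conditions: one keyword list per classification category.
EXTRACTION_CONDITIONS = {
    "billing": ["invoice"],
    "support": ["error"],
}


def extract_by_category(document, conditions, width=30):
    """Match the document against each category's keywords; collect text around each hit."""
    extracted = defaultdict(list)
    for category, keywords in conditions.items():
        for keyword in keywords:
            pos = document.find(keyword)
            while pos != -1:
                lo = max(pos - width, 0)
                hi = min(pos + len(keyword) + width, len(document))
                extracted[category].append(document[lo:hi])
                pos = document.find(keyword, pos + 1)
    return dict(extracted)


def learn_rules(samples):
    """Stand-in learner (an assumption): count word occurrences per category."""
    rules = {}
    for category, texts in samples.items():
        counts = Counter()
        for text in texts:
            counts.update(text.lower().split())
        rules[category] = counts
    return rules


def classify(document, rules):
    """Score a new document against each category's word counts; return the best category."""
    words = document.lower().split()
    scores = {cat: sum(counts[w] for w in words) for cat, counts in rules.items()}
    return max(scores, key=scores.get)
```

Because only the extracted partial text reaches the learner, one input document can yield training samples for several categories without being labeled as a whole.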
12 Citations
22 Claims
1. A classification rule generation device comprising:
an input circuit that inputs a document as a sample target document;
a storage circuit that stores extraction conditions for extracting partial text which is a portion of the sample target document and which is used for generating classification rules for classifying a classification target document to be classified into one of classification categories, the partial text being extracted from the sample target document according to the classification categories, the extraction conditions being set for each of the classification categories;
a matching circuit that matches the sample target document input by the input circuit against the extraction conditions stored in the storage circuit;
an extraction circuit that performs partial text extraction to extract the partial text from the sample target document according to the classification categories, based on a result of matching by the matching circuit; and
a learning circuit that, when the partial text corresponding to one of the classification categories is extracted by the partial text extraction by the extraction circuit, performs predetermined machine learning using the partial text extracted, and generates the classification rules,
wherein the extraction conditions set for each of the classification categories include a keyword that corresponds to each of the classification categories,
the matching circuit includes a position identification circuit that identifies an existing position of the keyword for each of the classification categories in the sample target document,
the extraction circuit extracts a portion around and including the keyword as the partial text from the sample target document, based on the existing position of the keyword identified by the position identification circuit,
the extraction conditions set for each of the classification categories are set such that type information indicating a type of the keyword is set for at least one of the keywords, and
the extraction circuit, when extracting the partial text corresponding to each of the classification categories from the sample target document, performs the partial text extraction based on the type information indicated by the keyword identified by the position identification circuit.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A classification rule generation device comprising:
an input circuit that inputs a document as a sample target document;
a storage circuit that stores extraction conditions for extracting partial text which is a portion of the sample target document and which is used for generating classification rules for classifying a classification target document to be classified into one of plural classification categories, the partial text being extracted from the sample target document according to the plural classification categories, the extraction conditions being set for each of the plural classification categories;
a matching circuit that matches the sample target document input by the input circuit against the extraction conditions stored in the storage circuit;
an extraction circuit that performs partial text extraction to extract the partial text from the sample target document according to the plural classification categories, based on a result of matching by the matching circuit;
a learning circuit that, when the partial text corresponding to one of the plural classification categories is extracted by the partial text extraction by the extraction circuit, performs predetermined machine learning using the partial text extracted, and generates the classification rules,
wherein the extraction conditions set for each of the plural classification categories include a keyword that corresponds to each of the plural classification categories,
the matching circuit includes a position identification circuit that identifies an existing position of the keyword for each of the plural classification categories in the sample target document,
the extraction conditions set for each of the plural classification categories are set such that type information indicating a type of the appropriate keyword is set for at least one of the appropriate keywords, and
the extraction circuit, when extracting the partial text corresponding to each of the plural classification categories from the sample target document, performs the partial text extraction based on the type information indicated by the keyword identified by the position identification circuit.
Dependent claims: 20
21. A classification rule generation method that is executed by a classification rule generation device including an input circuit, a storage circuit, a matching circuit, an extraction circuit, and a learning circuit, the classification rule generation method comprising:
using the input circuit, inputting a sample target document;
using the storage circuit, storing extraction conditions for extracting partial text which is a portion of the sample target document and which is used for generating classification rules for classifying a classification target document to be classified into one of classification categories, the partial text being extracted from the sample target document according to the classification categories, the extraction conditions being set for each of the classification categories;
by the matching circuit, matching the sample target document that is input by the input circuit against the extraction conditions stored in the storage circuit;
by the extraction circuit, performing partial text extraction to extract the partial text from the sample target document according to the classification categories;
by the learning circuit, when the partial text corresponding to one of the classification categories is extracted by the partial text extraction by the extraction circuit, performing predetermined machine learning using the partial text extracted, and generating the classification rules,
by a position identification circuit, identifying an existing position of the keyword for each of the classification categories in the sample target document; and
by the extraction circuit, extracting a portion around and including the keyword as the partial text from the sample target document, based on the existing position of the keyword identified by the position identification circuit,
the extraction conditions set for each of the classification categories including a keyword that corresponds to each of the classification categories,
the extraction conditions set for each of the classification categories are set such that type information indicating a type of the keyword is set for at least one of the keywords, and
the extraction circuit, when extracting the partial text corresponding to each of the classification categories from the sample target document, performs the partial text extraction based on the type information indicated by the keyword identified by the position identification circuit.
Dependent claims: 22
Specification