Data processing device, data processing method, and data processing program
Abstract
[PROBLEMS] To provide a data processing device, such as a text mining device, capable of properly extracting characteristic structures even when the input data contains a plurality of words indicating identical content or a plurality of semantically associated words. [MEANS FOR SOLVING PROBLEMS] An association node extraction unit (22) of a text mining device (10) extracts association nodes containing semantically associated words from a graph obtained as a result of syntax analysis. An association node joint unit (23) transforms the graph by joining some or all of the association nodes. A characteristic structure extraction unit (24) extracts a characteristic structure from the graph transformed by the association node joint unit.
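As an illustration only, the first two units described in the abstract (association node extraction and association node joining) might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in the specification: the `SYNONYMS` table, the `SentenceGraph` class, the `"dep"`/`"sem"` edge labels, and the function names are all hypothetical, and a real system would obtain dependency branches from a syntax analyzer.

```python
from dataclasses import dataclass, field
from itertools import combinations

# Hypothetical synonym table (an assumption): maps a word to a concept key.
SYNONYMS = {"car": "vehicle", "automobile": "vehicle", "vehicle": "vehicle"}


@dataclass
class SentenceGraph:
    """One sentence structure: node i holds words[i]."""
    words: list[str]
    # Each edge is (src, dst, kind); kind is "dep" or "sem" (assumed labels).
    edges: set[tuple[int, int, str]] = field(default_factory=set)

    def add_dependency(self, src: int, dst: int) -> None:
        self.edges.add((src, dst, "dep"))


def extract_association_nodes(graph: SentenceGraph) -> list[list[int]]:
    """Group node indices whose words map to the same concept key."""
    groups: dict[str, list[int]] = {}
    for index, word in enumerate(graph.words):
        key = SYNONYMS.get(word)
        if key is not None:
            groups.setdefault(key, []).append(index)
    # Only groups with at least two nodes can be joined.
    return [nodes for nodes in groups.values() if len(nodes) > 1]


def join_association_nodes(graph: SentenceGraph,
                           groups: list[list[int]]) -> SentenceGraph:
    """Return a copy of the graph with a "sem" branch between associated nodes."""
    joined = SentenceGraph(list(graph.words), set(graph.edges))
    for nodes in groups:
        for a, b in combinations(nodes, 2):
            joined.edges.add((a, b, "sem"))
    return joined


# Toy sentence structure with two dependency branches.
g = SentenceGraph(["car", "fast", "automobile", "expensive"])
g.add_dependency(0, 1)
g.add_dependency(2, 3)

groups = extract_association_nodes(g)
joined = join_association_nodes(g, groups)
```

In this toy input, "car" and "automobile" end up connected by a semantic association branch, so the two dependency pairs become part of one connected structure while the original graph `g` is left unchanged.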
15 Claims
1. A data processing device generating a graph which expresses an input data structure by a plurality of nodes having a single word as content thereof and by a dependency branch connecting two nodes in a dependent relationship within the plurality of nodes, and extracting a characteristic structure characterizing the input data from the graph, the device comprising:
an association node extraction unit for extracting nodes semantically associated with each other, which are nodes corresponding to words representing same or similar content, from each of sentence structures in a given sentence structure collection, and outputting information on the sentence collection and association nodes in each of the sentence structures;
an association node joint unit for joining the nodes semantically associated with each other in each of the sentence structures by a semantic association branch based on the information on the sentence structure collection to newly generate a structure that expresses a concept which is not present, but implied, in each original sentence structure, and the association nodes in each of the sentence structures received from the association node extraction unit so as to transform each of the sentence structures in the sentence structure collection, and outputting a structure collection obtained by the transformation; and
a characteristic structure extraction unit for extracting a characteristic partial structure based on the sentence structure collection transformed by joining the semantic association branch received from the association node joint unit, wherein the characteristic structure extraction unit performs characteristic structure extraction processing by distinguishing a branch indicating a dependent relationship in the graph structure from the semantic association branch.
13. A data processing means for generating a graph which expresses an input data structure by a plurality of nodes having a single word as content thereof and by a dependency branch connecting two nodes in a dependent relationship within the plurality of nodes, and extracting a characteristic structure characterizing the input data from the graph, the means comprising:
an association node extraction means for extracting nodes semantically associated with each other, which are nodes corresponding to words representing same or similar content, from each of sentence structures in a given sentence structure collection, and outputting information on the sentence structure collection and association nodes in each of the sentence structures;
an association node joint means for joining the nodes semantically associated with each other in each of the sentence structures by a semantic association branch based on the information on the sentence structure collection and the association nodes in each of the sentence structures received from the association node extraction means so as to transform each of the sentence structures in the sentence structure collection to newly generate a structure that expresses a concept which is not present, but implied, in each original sentence structure, and outputting a structure collection obtained by the transformation; and
a characteristic structure extraction means for extracting a characteristic partial structure based on the sentence structure collection transformed by joining the semantic association branch received from the association node joint means, wherein the characteristic structure extraction means performs characteristic structure extraction processing by distinguishing a branch indicating a dependent relationship in the graph structure from the semantic association branch.
14. A data processing method generating a graph which expresses an input data structure by a plurality of nodes having a single word as content thereof and by a dependency branch which connects two nodes in a dependent relationship within the plurality of nodes, and extracting a characteristic structure characterizing the input data from the graph, the method comprising:
extracting nodes semantically associated with each other, which are nodes corresponding to words representing same or similar content, from each of sentence structures in a given sentence structure collection, and acquiring information on the sentence structure collection and association nodes in each of the sentence structures;
joining the nodes associated with each other in each of the sentence structures by a semantic association branch based on the acquired information on the sentence structure collection and the association nodes in each of the sentence structures so as to transform each of the sentence structures in the sentence structure collection to newly generate a structure that expresses a concept which is not present, but implied, in each original sentence structure, and acquiring the structure collection obtained by the transformation; and
extracting a characteristic partial structure based on the acquired sentence structure collection transformed by joining the semantic association branch; and
performing characteristic structure extraction processing by distinguishing a branch indicating a dependent relationship in the graph structure from the semantic association branch during the characteristic structure extraction processing.
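For illustration, the final step of the method (extraction that keeps dependency branches distinct from semantic association branches) could be approximated as below. This is a simplified stand-in, not the claimed extraction processing: it counts single labeled branches that recur across sentence structures rather than mining general partial structures. The function name, the `"dep"`/`"sem"` labels, the `min_support` threshold, and the choice to treat semantic association branches as undirected are all assumptions made for this sketch.

```python
from collections import Counter


def characteristic_edges(sentences, min_support=2):
    """Count labeled branches that occur in at least min_support sentences.

    sentences: list of (words, edges) pairs, where edges is a set of
    (src, dst, kind) index triples and kind is "dep" or "sem".
    """
    support = Counter()
    for words, edges in sentences:
        patterns = set()  # each pattern counts at most once per sentence
        for src, dst, kind in edges:
            if kind == "sem":
                # Assumption: semantic association has no direction,
                # so the endpoint words are put in a canonical order.
                a, b = sorted((words[src], words[dst]))
            else:
                # Dependency branches keep their direction.
                a, b = words[src], words[dst]
            # The branch kind is part of the pattern, so a dependency
            # branch is never merged with a semantic association branch.
            patterns.add((a, kind, b))
        support.update(patterns)
    return {p: n for p, n in support.items() if n >= min_support}


# Two toy sentence structures that were already joined by "sem" branches.
s1 = (["car", "fast", "automobile", "expensive"],
      {(0, 1, "dep"), (2, 3, "dep"), (0, 2, "sem")})
s2 = (["automobile", "expensive", "car", "small"],
      {(0, 1, "dep"), (2, 3, "dep"), (2, 0, "sem")})

result = characteristic_edges([s1, s2])
```

Because the branch kind is part of each counted pattern, a dependency branch and a semantic association branch between the same two words would be reported separately.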
15. A non-transitory computer readable recording medium storing a data processing program which controls processing for generating a graph which expresses an input data structure by a plurality of nodes having a single word as content thereof and by a dependency branch connecting two nodes in a dependent relationship within the plurality of nodes, and extracting a characteristic structure characterizing the input data from the graph, the program making a computer execute the functions of:
extracting nodes semantically associated with each other, which are nodes corresponding to words representing same or similar content, from each of sentence structures in a given sentence structure collection, and acquiring information on the sentence structure collection and association nodes in each of the sentence structures;
joining the nodes semantically associated with each other in each of the sentence structures by a semantic association branch based on the acquired information on the sentence structure collection and the association nodes in each of the sentence structures so as to transform each of the sentence structures in the sentence structure collection to newly generate a structure that expresses a concept which is not present, but implied, in each original sentence structure, and acquiring the structure collection obtained by the transformation; and
extracting a characteristic partial structure based on the acquired sentence structure collection transformed by joining the semantic association branch; and
performing characteristic structure extraction processing by distinguishing a branch indicating a dependent relationship in the graph structure from the semantic association branch during the characteristic structure extraction processing.
Specification