Text mining device, method thereof, and program
Abstract
A language analysis means 21 analyzes texts read from a text DB 11 and generates a sentence structure as the analysis result. A similar-structure generation adjustment means 25 generates, from an input to an input device 6, a determination item that specifies, for each type of difference between sentence structures, whether the structures are treated as identical. A similar-structure determination adjustment means 26 generates, from an input to the input device 6, a determination item that specifies, for each type of attribute value, whether a difference between attribute values is ignored. A similar-structure generating means 22 generates similar structures of each partial structure of the sentence structure obtained by the language analysis means 21, in accordance with the determination item from the similar-structure generation adjustment means 25, and sets the generated similar structures as an equivalent class of the partial structure from which they were generated. A frequent-similar-pattern detection means 24 ignores attribute values in accordance with the determination item from the similar-structure determination adjustment means 26, detects frequent patterns on the basis of the set of equivalent classes from the similar-structure generating means 22, and outputs the frequent patterns to an output device 3.
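The detection step described in the abstract can be illustrated with a minimal sketch. It assumes each partial structure is written as a string and each equivalent class is the set of strings generated for one partial structure, and that equivalent partial structures yield the same set. The function name `detect_frequent_patterns` and the `min_support` threshold are hypothetical and are not taken from the patent.

```python
from collections import Counter


def detect_frequent_patterns(equivalent_classes, min_support):
    """Count identical equivalent classes and keep the frequent ones.

    equivalent_classes: iterable of sets of pattern strings (assumed form).
    min_support: hypothetical minimum number of occurrences.
    """
    # Members of one class are treated as the same pattern, so the whole
    # class (as a frozenset) serves as the counting key.
    counts = Counter(frozenset(members) for members in equivalent_classes)
    return {cls: n for cls, n in counts.items() if n >= min_support}


# Two sentences differ only by a synonym; a third is unrelated.
classes = [
    {"buy->car", "purchase->car"},
    {"purchase->car", "buy->car"},
    {"sell->car"},
]
frequent = detect_frequent_patterns(classes, min_support=2)
```

Counting whole classes rather than individual strings is one simple way to make "similar meaning" patterns contribute to the same frequency.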
16 Claims

1. A text mining apparatus comprising:
means for generating a sentence structure from an input document, the sentence structure representing a dependency among words;
means for generating a similar structure of patterns having a similar meaning of a partial structure of the sentence structure by performing predetermined conversion operation, including at least change in connection of branches in a graph structure, of the partial structure; and
means for determining the patterns having the similar meaning as the identical pattern and detecting the patterns,
wherein the means for generating the similar structure comprises:
means for performing parallel modification of the sentence structure, the parallel modification being structure modification including new branch generation for a particular one of nodes corresponding to the words put in a parallel relationship in the sentence structure so that the particular one is connected to each node connected by a branch from the node put in the parallel relationship for the particular one, said means for performing parallel modification of the sentence structure generating the similar structure;
means for generating a plurality of new partial structures of the sentence structure from the partial structure and the similar structure;
means for performing non-directional branching of a directional branch of the sentence structure and the plurality of new partial structures to produce new similar structures;
means for replacing a synonym in the sentence structure and the plurality of new partial structures by referring to a synonym dictionary to produce new similar structures; and
means for performing non-ordering of ordering trees of the sentence structure and the plurality of new partial structures to produce new similar structures, and
wherein the means for generating the similar structure uses the new similar structures as an equivalent class of the plurality of new partial structures of the sentence structure.
Dependent claims: 2
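The parallel modification recited in claim 1 can be illustrated with a small sketch. It assumes the sentence structure is a set of directed `(source, destination)` word pairs and that parallel relationships are supplied as `(particular, partner)` pairs. This representation and the function name `parallel_modification` are assumptions for illustration only, not the patented implementation.

```python
def parallel_modification(edges, parallel_pairs):
    """Add a branch from `particular` to every node its parallel partner
    points to, keeping the original branches.

    edges: set of (source, destination) word pairs (assumed form).
    parallel_pairs: iterable of (particular, partner) word pairs.
    """
    new_edges = set(edges)
    for particular, partner in parallel_pairs:
        for src, dst in edges:
            # Copy each branch leaving the partner, skipping a branch
            # that would point back at the particular node itself.
            if src == partner and dst != particular:
                new_edges.add((particular, dst))
    return new_edges


# "light and cheap car": "light" is parallel to "cheap", "cheap" modifies "car".
edges = {("light", "cheap"), ("cheap", "car")}
modified = parallel_modification(edges, [("light", "cheap")])
```

After the modification, "light" is connected to "car" directly, so a pattern such as "light car" can be matched in this sentence.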

3. A text mining apparatus comprising:
a storage unit that stores a set of documents as a text mining object;
an analyzing unit that reads and analyzes the document from the storage unit and obtains a sentence structure representing a dependency among words;
a similar-structure generating unit that performs predetermined modification operation, including at least change in connection of branches in a graph structure, of the partial structure of the sentence structure obtained by the analysis of the analyzing unit, and generates a similar structure of patterns having a similar meaning; and
a pattern detecting unit that uses the similar structure generated by the similar-structure generating unit as an equivalent class of the partial structure on the generation source, and detects the pattern,
wherein the similar-structure generating unit comprises:
means for performing parallel modification of the sentence structure, the parallel modification being structure modification including new branch generation for a particular one of nodes corresponding to the words put in a parallel relationship in the sentence structure so that the particular one is connected to each node connected by a branch from the node put in the parallel relationship for the particular one, said means for performing parallel modification of the sentence structure generating the similar structure;
means for generating a plurality of new partial structures of the sentence structure from the partial structure and the similar structure;
means for performing non-directional branching of a directional branch of the sentence structure and the plurality of new partial structures to produce new similar structures;
means for replacing a synonym in the sentence structure and the plurality of new partial structures by referring to a synonym dictionary to produce new similar structures; and
means for performing non-ordering of ordering trees in the sentence structure and the plurality of new partial structures to produce new similar structures, and
wherein the similar-structure generating unit generates the new similar structures of the sentence structure and sets the new similar structures as an equivalent class of the plurality of new partial structures of the sentence structure.
Dependent claims: 4, 5
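Two of the operations recited in claim 3, non-directional branching and non-ordering of ordering trees, can be sketched as canonicalization steps. The sketch assumes directed branches are `(source, destination)` pairs and ordered trees are `(label, children)` tuples. Both representations and the names `undirect` and `unorder` are hypothetical.

```python
def undirect(edges):
    """Drop branch direction: (a, b) and (b, a) map to the same frozenset."""
    return {frozenset(edge) for edge in edges}


def unorder(tree):
    """Drop child order by sorting the children recursively.

    tree: (label, children), where children is a list or tuple of trees.
    Returns a nested tuple that is identical for any ordering of siblings.
    """
    label, children = tree
    return (label, tuple(sorted(unorder(child) for child in children)))


# The same dependents listed in a different order.
tree_a = ("buy", [("car", []), ("I", [])])
tree_b = ("buy", [("I", []), ("car", [])])
same_tree = unorder(tree_a) == unorder(tree_b)

# The same branch written in opposite directions.
same_edges = undirect({("car", "buy")}) == undirect({("buy", "car")})
```

Mapping each variant to one canonical form is only one way to obtain an equivalent class; the claims describe generating the similar structures themselves.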

6. A text mining apparatus comprising:
a storage unit that stores a set of documents as a text mining object;
an analyzing unit that reads and analyzes the document from the storage unit and obtains a sentence structure representing a dependency among words;
a similar-structure generation adjustment unit that generates a first determination item for determining, from a user input, whether or not the structures are identical ones for every type of differences between the sentence structures;
a similar-structure determination adjustment unit that generates a second determination item for determining, from a user input, whether or not the structures are identical ones for every type of differences between attribute values;
a similar-structure generating unit that performs predetermined conversion operation of a partial structure of the sentence structure obtained by the analyzing unit in accordance with the first determination item generated by the similar-structure generation adjustment unit and generates similar structures having a similar meaning of the partial structure; and
a similar-pattern detecting unit that uses the similar structure generated by the similar-structure generating unit as an equivalent class of the partial structure on the generation source and detects the frequent pattern by ignoring the difference between the attribute values in accordance with the second determination item of the similar-structure determination adjustment unit,
wherein the similar-structure generating unit comprises:
means for performing parallel modification of the sentence structure when the first determination item determines the parallel modification, the parallel modification being structure modification including new branch generation for a particular one of nodes corresponding to the words put in a parallel relationship in the sentence structure so that the particular one is connected to each node connected by a branch from the node put in the parallel relationship for the particular one, said means for performing parallel modification of the sentence structure generating the similar structure;
means for generating a plurality of new partial structures of the sentence structure from the partial structure and the similar structure;
means for performing non-directional branching of a directional branch of the sentence structure and the plurality of new partial structures when the first determination item determines the non-directional branching of the directional branch to produce new similar structures;
means for replacing a synonym in the sentence structure and the plurality of new partial structures by referring to a synonym dictionary when the first determination item includes replacement of the synonym to produce new similar structures; and
means for performing non-ordering of ordering trees of the sentence structure and the plurality of new partial structures when the first determination item determines the non-ordering of the ordering trees to produce new similar structures, and
wherein the similar-structure generating unit generates the new similar structures of the sentence structure and sets the new similar structures as the equivalent class of the plurality of new partial structures of the sentence structure.
Dependent claims: 7, 8
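The two user-controlled determination items of claim 6 can be pictured as configuration objects. The sketch below assumes the first item is a set of on/off switches, one per conversion operation, and the second is a set of attribute types to ignore when comparing words. The class name, field names, and the example attribute `tense` are all hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FirstDeterminationItem:
    """Which structural differences are treated as 'identical' (assumed form)."""
    parallel_modification: bool = True
    non_directional_branching: bool = True
    synonym_replacement: bool = True
    non_ordering: bool = True


def enabled_operations(item):
    """List the conversion operations the user has switched on."""
    return [name for name, on in asdict(item).items() if on]


def comparison_key(word, attributes, ignored_types):
    """Build a key for a word that ignores the attribute types named in the
    second determination item (modeled here as a set of type names)."""
    kept = {k: v for k, v in attributes.items() if k not in ignored_types}
    return (word, tuple(sorted(kept.items())))


item = FirstDeterminationItem(synonym_replacement=False)
operations = enabled_operations(item)

# With "tense" ignored, past and present forms compare as equal.
past = comparison_key("buy", {"tense": "past"}, ignored_types={"tense"})
present = comparison_key("buy", {"tense": "present"}, ignored_types={"tense"})
```

Keeping the two items separate mirrors the claim: one governs how similar structures are generated, the other governs how attribute values are compared during detection.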

9. A text mining method comprising:
a step of generating, using a computer, a sentence structure from an input document, the sentence structure representing a dependency among words;
a step of generating, using the computer, a similar structure of patterns having a similar meaning of a partial structure of the sentence structure by performing predetermined conversion operation, including at least change in connection of branches in a graph structure, of the partial structure; and
a step of determining the patterns having the similar meaning as the identical pattern and detecting the patterns,
wherein the step of generating the similar structure comprises:
a step of performing parallel modification of the sentence structure, the parallel modification being structure modification including new branch generation for a particular one of nodes corresponding to the words put in a parallel relationship in the sentence structure so that the particular one is connected to each node connected by a branch from the node put in the parallel relationship for the particular one, said step of performing parallel modification of the sentence structure generating the similar structure;
a step of generating a plurality of new partial structures of the sentence structure from the partial structure and the similar structure;
a step of performing non-directional branching of a directional branch of the sentence structure and the plurality of new partial structures to produce new similar structures;
a step of replacing a synonym in the sentence structure and the plurality of new partial structures by referring to a synonym dictionary to produce new similar structures; and
a step of performing non-ordering of ordering trees in the sentence structure and the plurality of new partial structures to produce new similar structures, and
thereby the step of generating the similar structure setting new similar structures as an equivalent class of the plurality of new partial structures.
Dependent claims: 10
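The synonym replacement step of claim 9 can be sketched as a dictionary lookup over the words of a structure. The sketch assumes the synonym dictionary maps each word to one representative word and that a structure is a set of `(source, destination)` word pairs. The function name and the sample dictionary are hypothetical.

```python
def replace_synonyms(edges, synonym_dictionary):
    """Return a similar structure in which every word found in the
    dictionary is replaced by its representative; other words are kept.

    edges: set of (source, destination) word pairs (assumed form).
    synonym_dictionary: dict mapping a word to its representative.
    """
    def canonical(word):
        return synonym_dictionary.get(word, word)

    return {(canonical(src), canonical(dst)) for src, dst in edges}


synonyms = {"purchase": "buy", "automobile": "car"}
first = replace_synonyms({("automobile", "purchase")}, synonyms)
second = replace_synonyms({("car", "buy")}, synonyms)
```

Both inputs produce the same structure, so they would fall into the same equivalent class under this assumed representation.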

11. A text mining method comprising:
a step of analyzing a document from a storage unit that stores a set of documents as a text mining object and obtaining a sentence structure representing a dependency among words;
a step of performing predetermined modification operation, including at least change in connection of branches in a graph structure, of a partial structure of the sentence structure and generating, using a computer, a similar structure having patterns with a similar meaning;
a step of using the generated similar structures as an equivalent class of the partial structure on the generation source and detecting the pattern,
wherein the step of generating the similar structure comprises:
a step of performing parallel modification of the sentence structure, the parallel modification being structure modification including new branch generation for a particular one of nodes corresponding to the words put in a parallel relationship in the sentence structure so that the particular one is connected to each node connected by a branch from the node put in the parallel relationship for the particular one, said step of performing parallel modification of the sentence structure generating the similar structure;
a step of generating a plurality of new partial structures of the sentence structure from the partial structure and the similar structure;
a step of performing non-directional branching of the directional branch of the sentence structure and the plurality of new partial structures to produce new similar structures;
a step of replacing a synonym in the sentence structure and the plurality of new partial structures by referring to a synonym dictionary to produce new similar structures; and
a step of performing non-ordering of ordering trees in the sentence structure and the plurality of new partial structures to produce new similar structures, and
thereby the step of generating the similar structure generating the new similar structures of the sentence structure and setting the new similar structures as an equivalent class of the plurality of new partial structures.
Dependent claims: 12, 13
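The step of generating a plurality of new partial structures can be illustrated by enumerating connected groups of branches. The sketch assumes a structure is a set of `(source, destination)` word pairs and that a partial structure is any connected subset of branches up to a size limit. The limit `max_edges` and both function names are hypothetical, and the claims do not specify an enumeration strategy.

```python
from itertools import combinations


def is_connected(edge_subset):
    """True if the branches form one connected piece (direction ignored)."""
    remaining = list(edge_subset)
    reached = set(remaining.pop(0))
    changed = True
    while changed:
        changed = False
        for edge in list(remaining):
            if reached & set(edge):
                reached |= set(edge)
                remaining.remove(edge)
                changed = True
    return not remaining


def partial_structures(edges, max_edges=2):
    """Enumerate connected subsets of branches with at most max_edges branches."""
    ordered = sorted(edges)
    result = []
    for size in range(1, max_edges + 1):
        for subset in combinations(ordered, size):
            if is_connected(subset):
                result.append(frozenset(subset))
    return result


edges = {("light", "car"), ("cheap", "car"), ("I", "buy")}
parts = partial_structures(edges, max_edges=2)
```

This brute-force enumeration grows quickly with sentence size and is meant only to show what a partial structure is, not how one would enumerate them efficiently.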

14. A text mining method comprising:
a step of analyzing a document from a storage unit that stores a set of documents as a text mining object and obtaining a sentence structure representing a dependency among words;
a step of generating, from a user input, a first determination item for determining whether or not the structures are identical ones for every type of differences between sentence structures;
a step of generating, from a user input, a second determination item for determining whether or not the structures are identical ones for every type of differences between attribute values;
a step of performing predetermined modification operation of the partial structure of the sentence structure obtained by the analyzing unit and generating, using a computer, a similar structure having a similar meaning of the partial structure in accordance with the generated first determination item; and
a step of using the generated similar structure as an equivalent class of the partial structure on the generation source and detecting the pattern by ignoring the difference between the attribute values in accordance with the second determination item,
wherein the step of generating the similar structure comprises:
a step of performing parallel modification of the sentence structure when the first determination item determines the parallel modification, the parallel modification being structure modification including new branch generation for a particular one of nodes corresponding to the words put in a parallel relationship in the sentence structure so that the particular one is connected to each node connected by a branch from the node put in the parallel relationship for the particular one, said step of performing parallel modification of the sentence structure generating the similar structure;
a step of generating a plurality of new partial structures of the sentence structure from the partial structure and the similar structure;
a step of performing non-directional branching of a directional branch of the sentence structure and the plurality of new partial structures when the first determination item determines the non-directional branching of the directional branch to produce new similar structures;
a step of replacing a synonym of the sentence structure and the plurality of new partial structures by referring to a synonym dictionary when the first determination item determines the synonym replacement to produce new similar structures; and
a step of performing non-directional branching of ordering trees of the sentence structure and the plurality of new partial structures when the first determination item determines the non-directional branching of the ordering trees to produce new similar structures, and
thereby the step of generating the similar structure generating the new similar structures of the sentence structure and setting the new similar structures as an equivalent class of the plurality of new partial structures.
Dependent claims: 15, 16
Specification