Method and system to analyze data
First Claim
1. A method of mining a collection of data, comprising:
receiving the collection of data, the collection of data comprising key words, wherein a key word comprises a coherent character string;
converting the collection of data into labeled data by grouping various types of data into a same format and assigning a label indicating a category of item contents, such that the labeled data is in analyzable condition for concept extraction, and wherein the labeled data comprises the label and a clause comprising the item contents;
assigning a category to the key words, wherein the category references a concept so that the key words can be handled as concepts with a meaning;
separating the clauses into pairs comprising an independent word and an attached word;
assigning categories to the separated clauses using syntactic patterns and a category dictionary;
generating, by syntactic analysis, a syntactic tree of a sentence comprising the separated clauses;
receiving a syntactically analyzed sentence as input and identifying mutually dependent relationships between or among the categorized key words according to at least one rule defining mutually dependent relationships between or among categorized key words;
grouping the identified mutually dependent relationships into groups of related mutually dependent relationships;
extracting the key words with mutually dependent relationships in the same sentence as labeled data with concepts, wherein the step of extracting key words comprises using a mutually dependent relationship extraction rule comprising a string of categories of arbitrary length to be extracted;
searching for unique concepts, a unique concept being a concept whose statistical characteristic is distinguished beyond a threshold with respect to the set to which it belongs;
creating and keeping statistical information;
visually displaying the statistical information; and
presenting a distribution of differences of the unique concepts.
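The claim does not say how an extraction rule, a "string of categories of arbitrary length", is matched against the syntactic tree. The sketch below is one possible reading, assuming each analyzed clause carries its independent word, a category from a category dictionary (or none), and the index of the clause it depends on. A rule is a tuple of categories of any length, and a match is a chain of key words that follows dependency links and carries exactly those categories in order. `Clause` and `extract_by_rule` are hypothetical names, and the path-following strategy is an assumption, not the patented procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clause:
    word: str                # independent word of the clause (the key word)
    category: Optional[str]  # category from a category dictionary, or None
    head: int                # index of the clause this one depends on; -1 for the root


def extract_by_rule(clauses: list[Clause], rule: tuple[str, ...]) -> list[tuple[str, ...]]:
    """Return key-word chains whose categories match `rule` along dependency links.

    Starting from every clause, follow head links one step per rule element;
    record the chain only if every category in the rule matched in order.
    """
    matches = []
    for start in range(len(clauses)):
        chain = []
        idx = start
        for cat in rule:
            if idx < 0 or clauses[idx].category != cat:
                break  # path ended or category mismatch: no match from this start
            chain.append(clauses[idx].word)
            idx = clauses[idx].head
        else:
            matches.append(tuple(chain))  # loop finished without a mismatch
    return matches


# Hypothetical analyzed sentence: "printer" and "often" both depend on "jams".
clauses = [
    Clause("printer", "PRODUCT", 2),
    Clause("often", None, 2),
    Clause("jams", "PROBLEM", -1),
]
print(extract_by_rule(clauses, ("PRODUCT", "PROBLEM")))  # [('printer', 'jams')]
```

Because a rule is just a tuple, rules of length one, two, or more are handled by the same loop, which mirrors the claim's "arbitrary length" wording.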
Abstract
Useful knowledge is acquired from a large amount of data by extracting concepts with unique characteristics. The present invention comprises a concept extractor and a unique concept extractor. The concept extractor extracts categorized concepts from the data. The unique concept extractor extracts unique concepts from those extracted concepts: among the categorized concepts belonging to the same category, it extracts any concept whose statistical characteristic is distinguished beyond a threshold with respect to the set to which it belongs.
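The abstract leaves the "statistical characteristic" and the "threshold" unspecified. As a minimal sketch, assuming the characteristic is a concept's occurrence count and that "distinguished beyond a threshold" means its z-score within its own category exceeds the threshold in absolute value, the unique concept extractor could look like the following. The function name `unique_concepts`, the z-score test, and the sample data are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def unique_concepts(concepts, threshold):
    """Flag concepts whose count stands out within their own category.

    `concepts` is an iterable of (category, concept) pairs, one per occurrence.
    Returns {category: [(concept, z_score), ...]} for concepts whose count
    differs from the category mean by more than `threshold` standard deviations.
    """
    counts = Counter(concepts)
    by_category = defaultdict(dict)
    for (category, concept), n in counts.items():
        by_category[category][concept] = n

    result = {}
    for category, freq in by_category.items():
        values = list(freq.values())
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # all concepts equally frequent: nothing stands out
        flagged = [
            (concept, (n - mu) / sigma)
            for concept, n in freq.items()
            if abs(n - mu) / sigma > threshold
        ]
        if flagged:
            result[category] = flagged
    return result


# Illustrative occurrences only; not data from the patent.
data = (
    [("PROBLEM", "jam")] * 9
    + [("PROBLEM", w) for w in ("noise", "leak", "smell", "crack")]
    + [("PRODUCT", "printer")] * 2
    + [("PRODUCT", "scanner")] * 2
)
print(unique_concepts(data, threshold=1.5))
```

Comparing each concept only with others in the same category follows the abstract's wording; the signed z-scores also give a simple basis for "presenting a distribution of differences" of the unique concepts.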
19 Claims
1. A method of mining a collection of data, comprising the steps recited in full under First Claim above.
Dependent claims: 2 to 12 and 18.
13. An article of manufacture embodying logic to perform a method of mining a collection of data, comprising:
a user interface for:
receiving a collection of data, the collection of data comprising key words, wherein the key words comprise coherent character strings;
visually displaying statistical information; and
presenting a distribution of differences of unique concepts; and
a processor embodying logic for:
converting the collection of data into labeled data by grouping various types of data into a same format and assigning a label indicating a category of item contents, such that the labeled data is in analyzable condition for concept extraction, and wherein the labeled data comprises the label and a clause comprising the item contents;
assigning a category to the key words, wherein the category references a concept so that the key words can be handled as concepts with a meaning;
separating the clauses into pairs comprising an independent word and an attached word;
assigning categories to the separated clauses using syntactic patterns and a category dictionary;
generating, by syntactic analysis, a syntactic tree of a sentence comprising the separated clauses;
receiving a syntactically analyzed sentence as input and identifying mutually dependent relationships between or among the categorized words within each of the clauses;
grouping the identified mutually dependent relationships into groups of related mutually dependent relationships;
extracting the key words with mutually dependent relationships in the same sentence as labeled data with concepts, wherein the step of extracting key words comprises using a mutually dependent relationship extraction rule comprising a string of categories of arbitrary length to be extracted;
searching for the unique concepts, the unique concept being a concept whose statistical characteristic is distinguished beyond a threshold with respect to the set to which it belongs; and
creating and keeping the statistical information.
Dependent claims: 14 to 17 and 19.
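The first two processor steps, converting heterogeneous data into labeled data and assigning categories to key words, can be illustrated with a small sketch. It assumes input records are dictionaries whose field names vary by source, that a hypothetical `field_labels` mapping assigns each field a label for its item contents, and that a hypothetical `category_dictionary` maps key words to categories. `LabeledRecord`, `to_labeled_data`, and `categorize` are illustrative names; whitespace tokenization stands in for the clause separation the claim actually recites.

```python
import re
from dataclasses import dataclass


@dataclass
class LabeledRecord:
    label: str   # category of the item contents, e.g. "summary"
    clause: str  # the item contents, normalized to plain text


def to_labeled_data(records, field_labels):
    """Flatten records with differing field names into LabeledRecord items.

    Fields without a label mapping, or with no value, are skipped.
    """
    out = []
    for rec in records:
        for field, value in rec.items():
            label = field_labels.get(field)
            if label is None or value is None:
                continue
            text = re.sub(r"\s+", " ", str(value)).strip()  # collapse whitespace
            if text:
                out.append(LabeledRecord(label, text))
    return out


def categorize(words, category_dictionary):
    """Attach a category to each key word that appears in the dictionary."""
    return [(w, category_dictionary[w]) for w in words if w in category_dictionary]


# Two hypothetical sources that name their fields differently.
records = [
    {"subject": "Printer jams", "body": "The  printer\njams often"},
    {"title": "Scanner noise", "note": None},
]
field_labels = {"subject": "summary", "title": "summary", "body": "details"}
dictionary = {"printer": "PRODUCT", "jams": "PROBLEM"}

labeled = to_labeled_data(records, field_labels)
print(labeled)
print(categorize("the printer jams often".split(), dictionary))
# [('printer', 'PRODUCT'), ('jams', 'PROBLEM')]
```

Mapping source fields to a shared label set is what lets records from different sources be analyzed together; the category dictionary is what turns bare strings into concepts with a meaning.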
Specification