System and method for the automatic mining of new relationships
First Claim
1. A system of automatically mining a new relationship for an extracted pair pi and a relation ri within a document di, comprising:
a relations database for storing previously identified sets of pairs Pi−1 and relations Ri−1;
a relationship database for storing the relation ri under a corresponding relationship category;
wherein if there is no relationship category corresponding to the relation ri in the relationship database, a knowledge module receives the extracted pair pi and the relation ri and applies a linguistic technique to classify the relation ri;
wherein if the linguistic technique fails to identify a corresponding relationship category in the relationship database, a statistics module receives the relation ri and applies a statistical technique to classify the relation ri; and
wherein if the statistical technique fails to identify a corresponding relationship category in the relationship database, defining a new relationship category and storing the relation ri under the new relationship category in the relationship database.
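The fallback order recited in the claim (existing category, then the linguistic technique, then the statistical technique, then a new category) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the function `mine_relationship`, the dictionary layout of the relationship database, the classifier signature, and the choice to name a new category after the relation itself.

```python
from typing import Callable, Dict, List, Optional, Tuple

Pair = Tuple[str, str]
# Hypothetical classifier signature: returns an existing category name,
# or None when the technique cannot identify one.
Classifier = Callable[[Pair, str, Dict[str, List[str]]], Optional[str]]


def mine_relationship(
    pair: Pair,
    relation: str,
    relationship_db: Dict[str, List[str]],
    linguistic: Classifier,
    statistical: Classifier,
) -> str:
    """Store `relation` under a category and return that category's name."""
    # The relation is already filed under a known category.
    for category, relations in relationship_db.items():
        if relation in relations:
            return category
    # Knowledge module first, statistics module only as a fallback.
    for classify in (linguistic, statistical):
        category = classify(pair, relation, relationship_db)
        if category is not None and category in relationship_db:
            relationship_db[category].append(relation)
            return category
    # Neither technique identified a category, so define a new one.
    # Naming it after the relation is an assumption of this sketch.
    relationship_db[relation] = [relation]
    return relation


if __name__ == "__main__":
    db = {"acquisition": ["acquired"]}
    never = lambda pair, relation, db: None
    by_prefix = lambda pair, relation, db: (
        "acquisition" if relation.startswith("acquir") else None
    )
    print(mine_relationship(("Acme", "Widgetco"), "acquires", db, by_prefix, never))
    print(mine_relationship(("Acme", "Widgetco"), "sued", db, by_prefix, never))
```

Passing the two techniques in as callables keeps the cascade separate from how either module classifies, which the claim leaves open.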
Abstract
An automatic mining system identifies, with a high degree of confidence, a set of relevant terms from a large text database of unstructured information such as the World Wide Web. The system includes a software program that discovers new relationships through association mining and refinement of co-occurrences. It automatically and iteratively recognizes new binary relations through phrases that embody related pairs, applies lexicographic and statistical techniques to classify the relations, and applies a minimal amount of domain knowledge about the relevance of the terms and relations. The automatic mining system includes a knowledge module and a statistics module. The knowledge module comprises a stemming unit, a synonym check unit, and a domain knowledge check unit. The stemming unit determines whether the relation being analyzed shares a common root with a previously mined relation. The synonym check unit identifies synonyms of the relation, and the domain knowledge check unit considers extrinsic factors for indications that would further clarify the relationship being mined. The statistics module optimizes the confidence level in the relationship.
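As a rough illustration of the stemming unit and synonym check unit described in the abstract, the following Python sketch matches a relation against previously mined relations by shared root or by synonym. It is a sketch under stated assumptions, not the patented implementation: the suffix list, the three-character minimum root, the `synonyms` lookup table, and the function names `stem` and `knowledge_match` are all hypothetical, and the crude suffix stripping stands in for whatever stemmer the system actually uses. The domain knowledge check unit is not modeled.

```python
from typing import Dict, Iterable, Optional, Set

# Illustrative suffix list (an assumption); longer suffixes are tried first.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters of root."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def knowledge_match(
    relation: str,
    mined_relations: Iterable[str],
    synonyms: Dict[str, Set[str]],
) -> Optional[str]:
    """Return a previously mined relation that matches by root or synonym."""
    mined = list(mined_relations)
    root = stem(relation)
    # Stemming unit: same root as a previously mined relation.
    for known in mined:
        if stem(known) == root:
            return known
    # Synonym check unit: a listed synonym shares a root with a mined relation.
    for synonym in synonyms.get(relation, set()):
        for known in mined:
            if stem(known) == stem(synonym):
                return known
    return None


if __name__ == "__main__":
    mined = ["acquires"]
    table = {"purchased": {"bought", "acquired"}}  # hypothetical synonym table
    print(knowledge_match("acquired", mined, {}))
    print(knowledge_match("purchased", mined, table))
    print(knowledge_match("sued", mined, table))
```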
189 Citations
21 Claims
1. A system of automatically mining a new relationship for an extracted pair pi and a relation ri within a document di, comprising:
a relations database for storing previously identified sets of pairs Pi−1 and relations Ri−1;
a relationship database for storing the relation ri under a corresponding relationship category;
wherein if there is no relationship category corresponding to the relation ri in the relationship database, a knowledge module receives the extracted pair pi and the relation ri and applies a linguistic technique to classify the relation ri;
wherein if the linguistic technique fails to identify a corresponding relationship category in the relationship database, a statistics module receives the relation ri and applies a statistical technique to classify the relation ri; and
wherein if the statistical technique fails to identify a corresponding relationship category in the relationship database, defining a new relationship category and storing the relation ri under the new relationship category in the relationship database.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
a stemming unit that determines if the relation ri shares a common root with a previously mined relation ri−1 in the database;
a synonym check unit that identifies one or more synonyms of the relation ri;
a domain knowledge check unit that considers one or more extrinsic indications to clarify the new relationship.
6. The system according to claim 1, wherein the knowledge module includes a stemming unit that determines if the relation ri shares a common root with a previously mined relation ri−1 in the database, and removes common suffixes, if any.
7. The system according to claim 1, wherein the linguistic technique comprises stemming.
8. The system according to claim 7, wherein the linguistic technique further comprises synonym checking.
9. The system according to claim 1, wherein the knowledge module further applies domain knowledge to classify the relation ri.
10. A computer program product of automatically mining a new relationship for an extracted pair pi and a relation ri within a document di, comprising:
a relations database for storing previously identified sets of pairs Pi−1 and relations Ri−1;
a relationship database for storing the relation ri under a corresponding relationship category;
wherein if there is no relationship category corresponding to the relation ri in the relationship database, a knowledge module receives the extracted pair pi and the relation ri and applies a linguistic technique to classify the relation ri;
wherein if the linguistic technique fails to identify a corresponding relationship category in the relationship database, a statistics module receives the relation ri and applies a statistical technique to classify the relation ri; and
wherein if the statistical technique fails to identify a corresponding relationship category in the relationship database, defining a new relationship category and storing the relation ri under the new relationship category in the relationship database.
Dependent claims: 11, 12, 13, 14, 15
a stemming unit that determines if the relation ri shares a common root with a previously mined relation ri−1 in the database;
a synonym check unit that identifies one or more synonyms of the relation ri;
a domain knowledge check unit that considers one or more extrinsic indications to clarify the new relationship.
15. The computer program product according to claim 10, wherein the knowledge module includes a stemming unit that determines if the relation ri shares a common root with a previously mined relation ri−1 in the database, and removes common suffixes, if any.
16. A method of automatically mining a new relationship for an extracted pair pi and a relation ri within a document di, comprising:
a relations database storing previously identified sets of pairs Pi−1 and relations Ri−1;
a relationship database storing the relation ri under a corresponding relationship category;
wherein if there is no relationship category corresponding to the relation ri in the relationship database, a knowledge module receiving the extracted pair pi and the relation ri and applying a linguistic technique to classify the relation ri;
wherein if the linguistic technique fails to identify a corresponding relationship category in the relationship database, a statistics module receiving the relation ri and applying a statistical technique to classify the relation ri; and
wherein if the statistical technique fails to identify a corresponding relationship category in the relationship database, defining a new relationship category and storing the relation ri under the new relationship category in the relationship database.
Dependent claims: 17, 18, 19, 20, 21
determining if the relation ri shares a common root with a previously mined relation ri−1 in the relations database;
identifying one or more synonyms of the relation ri; and
considering one or more extrinsic indications to clarify the new relationship.
21. The method according to claim 16, further including determining if the relation ri shares a common root with a previously mined
Specification