System and method for the automatic mining of new relationships

US 6,539,376 B1
Filed: 11/15/1999
Issued: 03/25/2003
Est. Priority Date: 11/15/1999
Status: Expired due to Fees

First Claim


1. A system of automatically mining a new relationship for an extracted pair p_i and a relation r_i within a document d_i, comprising:

a relations database for storing previously identified sets of pairs P_{i−1} and relations R_{i−1};

a relationship database for storing the relation r_i under a corresponding relationship category;

wherein if there is no relationship category corresponding to the relation r_i in the relationship database, a knowledge module receives the extracted pair p_i and the relation r_i and applies a linguistic technique to classify the relation r_i;

wherein if the linguistic technique fails to identify a corresponding relationship category in the relationship database, a statistics module receives the relation r_i and applies a statistical technique to classify the relation r_i; and

wherein if the statistical technique fails to identify a corresponding relationship category in the relationship database, defining a new relationship category and storing the relation r_i under the new relationship category in the relationship database.
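The fallback order in claim 1 (existing category, then linguistic technique, then statistical technique, then a new category) can be illustrated with a minimal Python sketch. It assumes both databases are in-memory structures and that the two techniques are supplied as callables returning a category name or None. The names `MiningState` and `mine_relationship`, the callable signatures, and the policy of naming a new category after the relation itself are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]
Categories = Dict[str, Set[str]]

# Hypothetical signatures: each technique returns a category name, or None on failure.
LinguisticTechnique = Callable[[Pair, str, Categories], Optional[str]]
StatisticalTechnique = Callable[[str, Categories], Optional[str]]


@dataclass
class MiningState:
    # Relations database: previously identified pairs P_{i-1} and relations R_{i-1}.
    history: List[Tuple[Pair, str]] = field(default_factory=list)
    # Relationship database: category name -> relations stored under it.
    categories: Categories = field(default_factory=dict)

    def lookup(self, relation: str) -> Optional[str]:
        """Return the category already holding this exact relation, if any."""
        for name in sorted(self.categories):
            if relation in self.categories[name]:
                return name
        return None


def mine_relationship(pair: Pair, relation: str, state: MiningState,
                      linguistic: LinguisticTechnique,
                      statistical: StatisticalTechnique) -> str:
    """Classify relation r_i for pair p_i, falling back in the order of claim 1."""
    category = state.lookup(relation)
    if category is None:
        # Knowledge module: receives both the pair and the relation.
        category = linguistic(pair, relation, state.categories)
    if category is None:
        # Statistics module: receives only the relation.
        category = statistical(relation, state.categories)
    if category is None:
        # Define a new category; naming it after the relation is an assumption.
        category = relation
    state.categories.setdefault(category, set()).add(relation)
    state.history.append((pair, relation))
    return category
```

Both techniques are left pluggable because the claim fixes only the order of the fallbacks, not their internals; `linguistic` could wrap stemming and synonym checks, and `statistical` a co-occurrence scorer.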


1 Assignment


0 Petitions


Abstract

An automatic mining system identifies, with a high degree of confidence, a set of relevant terms from a large text database of unstructured information, such as the World Wide Web. The system includes a software program that discovers new relationships by association mining and refinement of co-occurrences: it automatically and iteratively recognizes new binary relations through phrases that embody related pairs, classifies the relations by applying lexicographic and statistical techniques, and further applies a minimal amount of domain knowledge about the relevance of the terms and relations. The automatic mining system includes a knowledge module and a statistics module. The knowledge module comprises a stemming unit, a synonym check unit, and a domain knowledge check unit. The stemming unit determines whether the relation being analyzed shares a common root with a previously mined relation. The synonym check unit identifies synonyms of the relation, and the domain knowledge check unit considers extrinsic factors that could further clarify the relationship being mined. The statistics module optimizes the confidence level in the relationship.
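The text reproduced here does not specify the statistical technique. As one possible reading of "association mining and refinement of co-occurrences," the sketch below scores each existing category by the fraction of pairs extracted with the new relation that were also extracted with some relation already filed under that category (an association-rule style confidence), and accepts the best category only if it clears a threshold. The function name `classify_by_cooccurrence`, the data layout, and the 0.5 default threshold are all assumptions, not the patent's method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Pair = Tuple[str, str]


def classify_by_cooccurrence(relation: str,
                             observations: Iterable[Tuple[Pair, str]],
                             categories: Dict[str, Set[str]],
                             min_confidence: float = 0.5) -> Optional[str]:
    """Pick the category whose relations co-occur most with `relation`'s pairs."""
    # Index each relation string by the set of pairs it was extracted with.
    pairs_by_relation: Dict[str, Set[Pair]] = defaultdict(set)
    for pair, rel in observations:
        pairs_by_relation[rel].add(pair)

    target = pairs_by_relation.get(relation, set())
    if not target:
        return None  # no evidence at all for this relation

    best, best_conf = None, 0.0
    for name in sorted(categories):  # sorted for deterministic tie-breaking
        # All pairs seen with any relation already stored under this category.
        cat_pairs: Set[Pair] = set().union(
            *(pairs_by_relation.get(r, set()) for r in categories[name]))
        # Confidence of the rule "relation => category" over the target pairs.
        conf = len(target & cat_pairs) / len(target)
        if conf > best_conf:
            best, best_conf = name, conf
    return best if best_conf >= min_confidence else None
```

Returning None below the threshold mirrors the claim's failure branch, leaving the caller to define a new relationship category.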

189 Citations

21 Claims

1. A system of automatically mining a new relationship for an extracted pair p_i and a relation r_i within a document d_i, comprising:

a relations database for storing previously identified sets of pairs P_{i−1} and relations R_{i−1};

a relationship database for storing the relation r_i under a corresponding relationship category;

wherein if there is no relationship category corresponding to the relation r_i in the relationship database, a knowledge module receives the extracted pair p_i and the relation r_i and applies a linguistic technique to classify the relation r_i;

wherein if the linguistic technique fails to identify a corresponding relationship category in the relationship database, a statistics module receives the relation r_i and applies a statistical technique to classify the relation r_i; and

wherein if the statistical technique fails to identify a corresponding relationship category in the relationship database, defining a new relationship category and storing the relation r_i under the new relationship category in the relationship database.

2. The system according to claim 1, wherein the knowledge module includes a stemming unit that determines if the relation r_i shares a common root with a previously mined relation r_{i−1} in the relations database.

3. The system according to claim 2, wherein the knowledge module includes a synonym check unit that identifies one or more synonyms of the relation r_i.

4. The system according to claim 2, wherein the knowledge module includes a domain knowledge check unit that considers one or more extrinsic indications to clarify the new relationship.

5. The system according to claim 1, wherein the knowledge module includes:

6. The system according to claim 1, wherein the knowledge module includes a stemming unit that determines if the relation r_i shares a common root with a previously mined relation r_{i−1} in the database, and removes common suffixes, if any.

7. The system according to claim 1, wherein the linguistic technique comprises stemming.

8. The system according to claim 7, wherein the linguistic technique further comprises synonym checking.

9. The system according to claim 1, wherein the knowledge module further applies domain knowledge to classify the relation r_i.
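Claims 2 through 9 name three units inside the knowledge module: a stemming unit that removes common suffixes, a synonym check unit, and a domain knowledge check unit. A minimal sketch of how they might be chained, assuming a naive suffix-stripping stemmer (a production system would more likely use an established stemmer such as Porter's), a caller-supplied synonym table keyed by stem, and an optional domain-knowledge callback. The names `stem`, `stem_phrase`, `category_with_root`, `knowledge_classify`, the suffix list, and the callback signature are hypothetical.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Pair = Tuple[str, str]
Categories = Dict[str, Set[str]]
DomainCheck = Callable[[Pair, str, Categories], Optional[str]]

# Hypothetical suffix list, tried in order (longest first).
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip one common suffix, keeping a root of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_phrase(phrase: str) -> str:
    """Stem every token of a (possibly multi-word) relation phrase."""
    return " ".join(stem(token) for token in phrase.lower().split())


def category_with_root(root: str, categories: Categories) -> Optional[str]:
    """Return the first category holding a relation whose stem equals root."""
    for name in sorted(categories):
        if any(stem_phrase(r) == root for r in categories[name]):
            return name
    return None


def knowledge_classify(pair: Pair, relation: str, categories: Categories,
                       synonyms: Dict[str, Set[str]],
                       domain_check: Optional[DomainCheck] = None) -> Optional[str]:
    root = stem_phrase(relation)
    # Stemming unit: does r_i share a root with a previously mined relation?
    category = category_with_root(root, categories)
    if category is not None:
        return category
    # Synonym check unit: try each synonym listed for the root.
    for synonym in sorted(synonyms.get(root, set())):
        category = category_with_root(stem_phrase(synonym), categories)
        if category is not None:
            return category
    # Domain knowledge check unit: defer to extrinsic indications, if provided.
    if domain_check is not None:
        return domain_check(pair, relation, categories)
    return None
```

For example, with a category `{"employment": {"employs"}}`, the relation "employed" matches through its stem "employ", while "recruits" matches only if the synonym table maps "recruit" to "employ".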

10. A computer program product of automatically mining a new relationship for an extracted pair p_i and a relation r_i within a document d_i, comprising:

a relations database for storing previously identified sets of pairs P_{i−1} and relations R_{i−1};

a relationship database for storing the relation r_i under a corresponding relationship category;

wherein if there is no relationship category corresponding to the relation r_i in the relationship database, a knowledge module receives the extracted pair p_i and the relation r_i and applies a linguistic technique to classify the relation r_i;

wherein if the linguistic technique fails to identify a corresponding relationship category in the relationship database, a statistics module receives the relation r_i and applies a statistical technique to classify the relation r_i; and

wherein if the statistical technique fails to identify a corresponding relationship category in the relationship database, defining a new relationship category and storing the relation r_i under the new relationship category in the relationship database.

11. The computer program product according to claim 10, wherein the knowledge module includes a stemming unit that determines if the relation r_i shares a common root with a previously mined relation r_{i−1} in the relations database.

12. The computer program product according to claim 11, wherein the knowledge module includes a synonym check unit that identifies one or more synonyms of the relation r_i.

13. The computer program product according to claim 11, wherein the knowledge module includes a domain knowledge check unit that considers one or more extrinsic indications to clarify the new relationship.

14. The computer program product according to claim 10, wherein the knowledge module includes:

15. The computer program product according to claim 10, wherein the knowledge module includes a stemming unit that determines if the relation r_i shares a common root with a previously mined relation r_{i−1} in the database, and removes common suffixes, if any.

16. A method of automatically mining a new relationship for an extracted pair p_i and a relation r_i within a document d_i, comprising:

a relations database storing previously identified sets of pairs P_{i−1} and relations R_{i−1};

a relationship database storing the relation r_i under a corresponding relationship category;

wherein if there is no relationship category corresponding to the relation r_i in the relationship database, a knowledge module receiving the extracted pair p_i and the relation r_i and applying a linguistic technique to classify the relation r_i;

wherein if the linguistic technique fails to identify a corresponding relationship category in the relationship database, a statistics module receiving the relation r_i and applying a statistical technique to classify the relation r_i; and

wherein if the statistical technique fails to identify a corresponding relationship category in the relationship database, defining a new relationship category and storing the relation r_i under the new relationship category in the relationship database.

17. The method according to claim 16, further including determining if the relation r_i shares a common root with a previously mined relation r_{i−1} in the relations database.

18. The method according to claim 17, further including identifying one or more synonyms of the relation r_i.

19. The method according to claim 17, further including considering one or more extrinsic indications to clarify the new relationship.

20. The method according to claim 16, further including:

21. The method according to claim 16, further including determining if the relation r_i shares a common root with a previously mined


Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Sundaresan, Neelakantan; Yi, Jeonghee
Primary Examiner(s)
Homere, Jean R.
Assistant Examiner(s)
Wassum, Luke S

Application Number

US09/440,626
Time in Patent Office

1,226 Days
Field of Search

707/1-5, 707/10, 704/9
US Class Current

1/1
CPC Class Codes

G06F 16/313   Selection or weighting of t...

G06F 16/353   into predefined classes

G06F 16/95   Retrieval from the web

Y10S 707/99935   Query augmenting and refini...
