Information extraction system and method using concept-relation-concept (CRC) triples

US 6,263,335 B1
Filed: 03/29/1999
Issued: 07/17/2001
Est. Priority Date: 02/09/1996
Status: Expired due to Term

Abstract

An information extraction system allows users to ask questions about documents in a database, and responds to queries by returning possibly relevant information extracted from the documents. The system is domain-independent and automatically builds its own subject knowledge base. It can be applied to any new corpus of text with quick results and without lengthy manual input. For this reason, it is also a dynamic system that can acquire new knowledge and add it to the knowledge base immediately by automatically identifying new names, events, or concepts.

20 Claims

1. A computer program product for preparing a set of documents to support information extraction, the computer program product comprising:
- code for defining a set of concepts;
- code for defining relations between the concepts, the relations including monadic relations associated with single concepts and dyadic relations between concepts;
- code for defining a set of rules that allow extraction of relations;
- code for receiving a plurality of documents;
- code for parsing the documents to identify concepts;
- code for extracting, by applying the set of rules to the parsed documents, concept-relation-concept triples (CRCs) and relation-concept pairs (RCs) from the parsed documents;
- code for incorporating the CRCs and RCs into a data organization; and
- a computer readable medium for storing the codes.
2. The computer program product of claim 1 wherein:
3. The computer program product of claim 1 further comprising code for indexing the CRCs based on features of the CRCs including their concepts and their relations.
4. The computer program product of claim 1 wherein said set of rules includes a set of category-specific syntactic constructs.
5. The computer program product of claim 4 wherein said set of category-specific syntactic constructs includes coreferential proper names.
6. The computer program product of claim 1 wherein said set of rules includes a set of lexical constructs that imply a particular relation.
7. The computer program product of claim 1 wherein said code for parsing the documents to identify concepts includes code selectable from codes for determining phrase boundaries, determining parts of speech, identifying numeric concepts, identifying phrasal verbs, identifying idioms, and identifying proper names.
8. The computer program product of claim 1 wherein each CRC that is incorporated into the data organization includes an indication of the date, if any, of the document from which the CRC was extracted.
9. The computer program product of claim 1 further comprising code, executed in response to a user request, for extracting time-related information from a set of CRCs to create a timeline which describes the history of any concept over a specified period.
10. The computer program product of claim 1 wherein at least some of said documents are labeled by at least one of the group consisting of source reliability, source credibility, and source reputation.
11. The computer program product of claim 1 wherein at least some of said CRCs include at least one embedded CRC to provide a chain.
12. The computer product of claim 1, further comprising:
- code for accepting a query;
- code for parsing the query to identify concepts;
- code for applying the set of rules to the parsed query to extract CRCs; and
- code for extracting from the data organization a set of CRCs that match in at least one regard the CRCs extracted from the query.
13. The computer program product of claim 12 wherein said query is a "Who-What-Where-When-Why-How" question.
14. The computer program product of claim 12 further comprising code for displaying extracted CRCs as a knowledge representation.
15. The computer program product of claim 14 wherein the knowledge representation is one of the group consisting of a conceptual graph, a semantic network, and a frame.
16. The computer program product of claim 12 further comprising code for filtering the set of retrieved CRCs according to user input.
17. The computer program product of claim 12 wherein frequency and/or recency of a CRC is used to filter or limit the number of documents reported.
18. The computer program product of claim 1 further comprising:
- code for accepting a browsing request specifying a class of CRCs to browse;
- in response to the browsing request, code for extracting from the data organization a set of CRCs that match the class of CRCs; and
- code for displaying the results in a hypertext display of active information nodes to allow user to explore a broad idea rather than create a W-H query.
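
Claims 1, 3, and 12 together describe storing CRCs and RCs in a data organization, indexing the CRCs by their concepts and relations, and retrieving CRCs that match a query CRC "in at least one regard". The following is a minimal sketch of that idea, not the patented implementation. The class names `CRC`, `RC`, and `CRCStore`, the `WILDCARD` placeholder for an unknown query slot, the relation labels, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

WILDCARD = "?"  # hypothetical marker for the unknown slot of a query CRC


@dataclass(frozen=True)
class CRC:
    """Concept-relation-concept triple (dyadic relation)."""
    left: str
    relation: str
    right: str
    doc_date: Optional[str] = None  # date of the source document, if any


@dataclass(frozen=True)
class RC:
    """Relation-concept pair (monadic relation on a single concept)."""
    relation: str
    concept: str


class CRCStore:
    """Toy 'data organization' with an inverted index over CRC features."""

    def __init__(self):
        self.crcs = []
        self.rcs = []
        self.index = defaultdict(set)  # feature -> positions in self.crcs

    @staticmethod
    def _features(crc):
        return [("concept", crc.left), ("relation", crc.relation), ("concept", crc.right)]

    def add_crc(self, crc):
        position = len(self.crcs)
        self.crcs.append(crc)
        for feature in self._features(crc):
            self.index[feature].add(position)

    def add_rc(self, rc):
        self.rcs.append(rc)

    def match(self, query):
        """Return stored CRCs sharing at least one non-wildcard feature with the query."""
        hits = set()
        for kind, value in self._features(query):
            if value != WILDCARD:
                hits |= self.index.get((kind, value), set())
        return [self.crcs[i] for i in sorted(hits)]


store = CRCStore()
store.add_crc(CRC("Acme", "acquired", "Widget Co", "1996-02-09"))
store.add_crc(CRC("Widget Co", "located_in", "Syracuse"))
store.add_crc(CRC("Bolt Inc", "hired", "Jane Doe"))
store.add_rc(RC("negated", "merger"))

# A "who acquired Widget Co?" style query leaves the first concept open.
who_acquired = store.match(CRC(WILDCARD, "acquired", "Widget Co"))
```

Because matching is on any shared feature, the query above returns both the acquisition CRC and the location CRC that mentions Widget Co. A stricter system could rank or filter these hits, for example by the number of shared features.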

19. A computer program product for preparing a set of documents to support information extraction, the computer program product comprising:
- code for defining a set of concept categories;
- code for defining a set of dyadic relations between concepts;
- code for defining a set of rules that allow extraction of relations between concepts;
- code for receiving a corpus containing documents;
- code for parsing the documents to identify concepts;
- code for extracting, by applying the set of rules to the parsed documents, concept-relation-concept triples, referred to as CRCs, from the parsed documents, said extracting includes mapping syntactic relations to semantic relations using said set of rules;
- code for incorporating the CRCs into a data organization; and
- a computer readable medium for storing the codes.
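
Claim 19 adds that extraction maps syntactic relations to semantic relations using the rule set. One way to picture this is a lookup table keyed by a syntactic relation and an optional cue word. The sketch below assumes a parser has already produced `(head, syntactic_relation, cue, dependent)` tuples; that tuple layout, the `RULES` table, and the semantic relation names are hypothetical and are not taken from the patent.

```python
# Hypothetical rule table: (syntactic relation, cue word or None) -> semantic relation.
RULES = {
    ("subject", None): "agent",
    ("object", None): "patient",
    ("prep", "in"): "location",
    ("prep", "with"): "instrument",
}


def extract_crcs(parsed, rules=RULES):
    """Map parsed syntactic links to (concept, semantic relation, concept) triples.

    Links with no matching rule are skipped rather than guessed.
    """
    crcs = []
    for head, syntactic_relation, cue, dependent in parsed:
        semantic_relation = rules.get((syntactic_relation, cue))
        if semantic_relation is not None:
            crcs.append((head, semantic_relation, dependent))
    return crcs


# Assumed parser output for "Acme opened a plant in Syracuse despite protests".
parsed_sentence = [
    ("open", "subject", None, "Acme"),
    ("open", "object", None, "plant"),
    ("open", "prep", "in", "Syracuse"),
    ("open", "prep", "despite", "protests"),  # no rule, so no CRC is produced
]

crcs = extract_crcs(parsed_sentence)
```

A table lookup keeps the rules declarative, so new constructs can be covered by adding entries rather than changing the extraction code.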

20. A computer program product for preparing a set of documents to support information extraction, the computer program product comprising:
- code for defining a set of concept categories;
- code for defining a set of dyadic relations between concepts;
- code for defining a set of rules that allow extraction of relations between concepts;
- code for receiving a corpus containing documents;
- code for parsing the documents to identify concepts;
- code for extracting, by applying the set of rules to the parsed documents, concept-relation-concept triples, referred to as CRCs, from the parsed documents;
- code for incorporating the CRCs into a data organization;
- code for extracting time-related information from a set of CRCs to create a timeline which describes the history of any concept over a specified period; and
- a computer readable medium for storing the codes.
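
The timeline element of claims 9 and 20 can be illustrated with a short sketch. It assumes that the time-related information is simply the date of the source document carried on each CRC (as in claim 8); the patent may draw on other temporal cues. The `DatedCRC` type, the `build_timeline` function, and the sample data are hypothetical.

```python
from datetime import date
from typing import NamedTuple, Optional


class DatedCRC(NamedTuple):
    left: str
    relation: str
    right: str
    doc_date: Optional[date]  # None when the source document has no date


def build_timeline(crcs, concept, start, end):
    """Return CRCs mentioning `concept` dated within [start, end], oldest first."""
    selected = [
        crc for crc in crcs
        if crc.doc_date is not None
        and start <= crc.doc_date <= end
        and concept in (crc.left, crc.right)
    ]
    return sorted(selected, key=lambda crc: crc.doc_date)


sample_crcs = [
    DatedCRC("Acme", "acquired", "Widget Co", date(1996, 2, 9)),
    DatedCRC("Acme", "opened", "plant", date(1995, 6, 1)),
    DatedCRC("Acme", "hired", "Jane Doe", None),
    DatedCRC("Bolt Inc", "sued", "Widget Co", date(1996, 1, 15)),
    DatedCRC("Acme", "sold", "plant", date(1999, 3, 29)),
]

acme_history = build_timeline(sample_crcs, "Acme", date(1995, 1, 1), date(1996, 12, 31))
```

Undated CRCs and CRCs outside the requested period are left out, so the result reads as a chronological history of the chosen concept.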

Current Assignee: TextWise Company, LLC
Original Assignee: TextWise Company, LLC
Inventors: Paik, Woojin; Liddy, Elizabeth D.; Liddy, Jennifer Heverin; Niles, Ian Harcourt; Allen, Eileen E.
Primary Examiner(s): Choules, Jack M.
Application Number: US09/280,228
Time in Patent Office: 841 Days
Field of Search: 707/3, 707/4, 707/5, 707/6
US Class Current: 1/1
CPC Class Codes:
G06F 16/353   into predefined classes
G06F 16/367   Ontology
Y10S 707/99935   Query augmenting and refini...
