Information extraction system and method using concept relation concept (CRC) triples
Abstract
An information extraction system that allows users to ask questions about documents in a database, and responds to queries by returning possibly relevant information which is extracted from the documents. The system is domain-independent, and automatically builds its own subject knowledge base. It can be applied to any new corpus of text with quick results, and no requirement for lengthy manual input. For this reason, it is also a dynamic system which can acquire new knowledge and add it to the knowledge base immediately by automatically identifying new names, events, or concepts.
43 Claims
1. A computer-implemented method of preparing a set of documents to support information extraction, the method comprising:
defining a set of concept categories;
defining a set of monadic relations associated with single concepts;
defining a set of dyadic relations between concepts;
defining a set of rules that allow extraction of monadic relations associated with concepts and dyadic relations between concepts;
receiving a corpus containing documents;
parsing the documents to identify concepts;
extracting, by applying the set of rules to the parsed documents, concept-relation-concept triples, referred to as CRCs, and relation-concept pairs, referred to as RCs from the parsed documents; and
incorporating the CRCs and the RCs into a data organization.
Dependent claims: 2-29.
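As an illustration only, the data shapes named in claim 1 can be sketched in Python. Everything here is an assumption made for the sketch and not the patent's implementation: the `Concept`, `CRC`, and `RC` types, the cue-word rule tables, the `past_status` monadic relation, and the dictionary used as the "data organization" are all hypothetical. The "parsed" input is a hand-built list of tuples standing in for real parser output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    text: str
    category: str  # e.g. "person" or "company"


@dataclass(frozen=True)
class CRC:  # concept-relation-concept triple
    left: Concept
    relation: str
    right: Concept


@dataclass(frozen=True)
class RC:  # relation-concept pair (monadic relation on one concept)
    relation: str
    concept: Concept


# Hypothetical rules: a cue phrase implies a relation.
DYADIC_RULES = {"works for": "affiliation", "is based in": "location"}
MONADIC_RULES = {"former": "past_status"}


def extract(parsed_items):
    """Apply the rules to (concept, cue, concept) and (cue, concept) tuples."""
    crcs, rcs = [], []
    for item in parsed_items:
        if len(item) == 3:
            left, cue, right = item
            if cue in DYADIC_RULES:
                crcs.append(CRC(left, DYADIC_RULES[cue], right))
        elif len(item) == 2:
            cue, concept = item
            if cue in MONADIC_RULES:
                rcs.append(RC(MONADIC_RULES[cue], concept))
    return crcs, rcs


def build_index(crcs, rcs):
    """Toy 'data organization': map each concept's text to its CRCs and RCs."""
    index = defaultdict(list)
    for crc in crcs:
        index[crc.left.text].append(crc)
        index[crc.right.text].append(crc)
    for rc in rcs:
        index[rc.concept.text].append(rc)
    return index


alice = Concept("Alice Smith", "person")
acme = Concept("Acme Corp", "company")
parsed = [(alice, "works for", acme), ("former", alice)]
crcs, rcs = extract(parsed)
index = build_index(crcs, rcs)
```

Indexing by concept text is only one possible choice; the claim itself leaves the form of the data organization open.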
30. A computer-implemented method of preparing a set of documents to support information extraction, the method comprising:
defining a set of concept categories, said concept categories including at least one of person, company, and geographical location;
defining a set of dyadic relations between concepts, said relations including at least one of affiliation, agent, location, and object;
defining a set of rules that allow extraction of relations between concepts, said set of rules including a set of category-specific syntactic constructs and a set of lexical constructs that imply a particular relation;
receiving a corpus containing documents;
parsing the documents to identify concepts by determining phrase boundaries, determining parts of speech, identifying numeric concepts, identifying phrasal verbs, identifying idioms, and identifying proper names in the documents;
extracting, by applying the set of rules to the parsed documents, concept-relation-concept triples, referred to as CRCs, from the parsed documents; and
incorporating the CRCs into a data organization.
Dependent claims: 31-36.
37. A computer-implemented method of preparing a set of documents to support information extraction, the method comprising:
defining a set of concept categories;
defining a set of dyadic relations between concepts;
defining a set of rules that allow extraction of relations between concepts;
receiving a corpus containing documents;
parsing the documents to identify concepts;
extracting, by applying the set of rules to the parsed documents, concept-relation-concept triples, referred to as CRCs, from the parsed documents, said extracting includes mapping syntactic relations to semantic relations using said set of rules; and
incorporating the CRCs into a data organization.
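The mapping step in claim 37 (syntactic relations to semantic relations) might look roughly like the table lookup below. This is a sketch under stated assumptions: the syntactic labels (`subject`, `direct_object`, `prep_in`), the rule table, and the function names are hypothetical. Only the semantic relation names (agent, object, location) and the category names are borrowed from claim 30.

```python
# Hypothetical rule table:
# (syntactic relation, category of the dependent concept) -> semantic relation.
# A category of None marks a rule that applies to any category.
SYNTAX_TO_SEMANTICS = {
    ("subject", "person"): "agent",
    ("direct_object", None): "object",
    ("prep_in", "geographical location"): "location",
}


def map_relation(syntactic_relation, category):
    """Try a category-specific rule first, then a category-independent one."""
    specific = SYNTAX_TO_SEMANTICS.get((syntactic_relation, category))
    if specific is not None:
        return specific
    return SYNTAX_TO_SEMANTICS.get((syntactic_relation, None))


def to_crcs(head, dependents):
    """dependents: list of (syntactic_relation, concept_text, category)."""
    triples = []
    for syntactic_relation, text, category in dependents:
        semantic = map_relation(syntactic_relation, category)
        if semantic is not None:  # no rule matched: emit nothing
            triples.append((head, semantic, text))
    return triples


triples = to_crcs(
    "acquire",
    [
        ("subject", "Alice Smith", "person"),
        ("direct_object", "Acme Corp", "company"),
        ("prep_in", "Boston", "geographical location"),
        ("adverb", "quickly", None),
    ],
)
```

Keying the table on the concept category as well as the syntactic relation is one way to picture the "category-specific syntactic constructs" mentioned in claims 30 and 38.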
38. A computer-implemented method of preparing a set of documents to support information extraction, the method comprising:
defining a set of concept categories;
defining a set of dyadic relations between concepts;
defining a set of rules that allow extraction of relations between concepts, the set of rules including a set of category-specific syntactic constructs;
receiving a corpus containing documents;
parsing the documents to identify concepts;
extracting, by applying the set of rules to the parsed documents, concept-relation-concept triples, referred to as CRCs, from the parsed documents; and
incorporating the CRCs into a data organization.
39. A computer-implemented method of preparing a set of documents to support information extraction, the method comprising:
defining a set of concept categories;
defining a set of dyadic relations between concepts;
defining a set of rules that allow extraction of relations between concepts;
receiving a corpus containing documents;
parsing the documents to identify concepts;
extracting, by applying the set of rules to the parsed documents, concept-relation-concept triples, referred to as CRCs, from the parsed documents;
incorporating the CRCs into a data organization; and
in response to a user request, extracting time-related information from a set of CRCs to create a timeline which describes the history of any concept over a specified period.
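The timeline step of claim 39 can be pictured as a filter and sort over dated CRCs. The sketch below assumes, purely for illustration, that each stored CRC already carries a single date; the tuple layout, the sample data, and the `timeline` function are hypothetical and say nothing about how the patent attaches time information to a CRC.

```python
from datetime import date

# Hypothetical store: (left concept, relation, right concept, event date).
DATED_CRCS = [
    ("Alice Smith", "affiliation", "Acme Corp", date(1996, 3, 1)),
    ("Acme Corp", "location", "Boston", date(1995, 6, 15)),
    ("Alice Smith", "affiliation", "Widget Inc", date(1998, 9, 30)),
    ("Alice Smith", "location", "Chicago", date(2001, 1, 10)),
]


def timeline(crcs, concept, start, end):
    """Return the CRCs that mention `concept` within [start, end], oldest first."""
    hits = [
        crc
        for crc in crcs
        if concept in (crc[0], crc[2]) and start <= crc[3] <= end
    ]
    return sorted(hits, key=lambda crc: crc[3])


history = timeline(
    DATED_CRCS, "Alice Smith", date(1996, 1, 1), date(1999, 12, 31)
)
```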
40. A computer-implemented method of preparing a set of documents to support information extraction, the method comprising:
defining a set of concept categories;
defining a set of dyadic relations between concepts;
defining a set of rules that allow extraction of relations between concepts;
receiving a corpus containing documents;
parsing the documents to identify concepts;
extracting, by applying the set of rules to the parsed documents, concept-relation-concept triples, referred to as CRCs, from the parsed documents; and
incorporating the CRCs into a data organization;
wherein at least some of said documents are labelled by at least one of the group consisting of source reliability, source credibility, and source reputation.
41. A computer-implemented method of preparing a set of documents to support information extraction, the method comprising:
defining a set of concept categories;
defining a set of dyadic relations between concepts;
defining a set of rules that allow extraction of relations between concepts;
receiving a corpus containing documents;
parsing the documents to identify concepts;
extracting, by applying the set of rules to the parsed documents, concept-relation-concept triples, referred to as CRCs, from the parsed documents, said CRCs including at least one embedded CRC to provide a chain; and
incorporating the CRCs into a data organization.
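One way to read the "embedded CRC" of claim 41 is a triple whose slot holds another triple, which yields a chain. The recursive type below is a minimal sketch of that reading; the `CRC` class, the helper functions, and the sample relations are assumptions for illustration, and the patent may represent chains differently.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class CRC:
    # Either slot may hold a plain concept (a string here) or another CRC.
    left: Union[str, "CRC"]
    relation: str
    right: Union[str, "CRC"]


def chain_depth(node):
    """Number of nested CRC levels; a bare concept has depth 0."""
    if isinstance(node, str):
        return 0
    return 1 + max(chain_depth(node.left), chain_depth(node.right))


def flatten(node):
    """Yield every CRC in the chain, outermost first."""
    if isinstance(node, CRC):
        yield node
        yield from flatten(node.left)
        yield from flatten(node.right)


# Hypothetical example: an inner triple embedded as the right slot of an outer one.
inner = CRC("Acme Corp", "location", "Boston")
outer = CRC("Alice Smith", "affiliation", inner)
```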
42. A computer-implemented method of preparing a set of documents to support information extraction, the method comprising:
defining a set of concept categories;
defining a set of dyadic relations between concepts;
defining a set of rules that allow extraction of relations between concepts;
receiving a corpus containing documents;
parsing the documents to identify concepts;
extracting, by applying the set of rules to the parsed documents, concept-relation-concept triples, referred to as CRCs, from the parsed documents;
incorporating the CRCs into a data organization;
accepting a browsing request specifying a class of CRCs to browse;
in response to the browsing request, extracting from the data organization a set of CRCs that match the class of CRCs; and
displaying the results in a hypertext display of active information nodes to allow user to explore a broad idea rather than create a W-H query.
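The browsing steps of claim 42 could be approximated by a wildcard match over stored triples followed by rendering each concept as a link. In this sketch a "class of CRCs" is assumed to be a pattern in which `None` matches anything. The `browse`, `link`, and `render` functions, the `?concept=` URL scheme, and the sample data are all hypothetical.

```python
from html import escape
from urllib.parse import quote

CRCS = [
    ("Alice Smith", "affiliation", "Acme Corp"),
    ("Bob Jones", "affiliation", "Acme Corp"),
    ("Acme Corp", "location", "Boston"),
]


def browse(crcs, left=None, relation=None, right=None):
    """Return the CRCs matching the pattern; None is a wildcard slot."""
    pattern = (left, relation, right)
    return [
        crc
        for crc in crcs
        if all(p is None or p == value for p, value in zip(pattern, crc))
    ]


def link(concept):
    """Render a concept as an active node that leads to a new browse."""
    return '<a href="?concept={}">{}</a>'.format(quote(concept), escape(concept))


def render(crcs):
    """Render matching CRCs as an HTML list of linked concepts."""
    items = [
        "<li>{} {} {}</li>".format(link(left), escape(relation), link(right))
        for left, relation, right in crcs
    ]
    return "<ul>" + "".join(items) + "</ul>"


matches = browse(CRCS, relation="affiliation", right="Acme Corp")
page = render(matches)
```

Because every concept in the output is itself a link, a reader can move from one node to the next without formulating a question, which is the contrast with a W-H query that the claim draws.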
43. A computer-implemented method of preparing a set of documents to support information extraction, the method comprising:
defining a set of concept categories;
defining a set of dyadic relations between concepts;
defining a set of rules that allow extraction of relations between concepts;
receiving a corpus containing documents;
parsing the documents to identify concepts, including identifying idioms;
extracting, by applying the set of rules to the parsed documents, concept-relation-concept triples, referred to as CRCs, from the parsed documents; and
incorporating the CRCs into a data organization.
Specification