LEARNING SYNTACTIC PATTERNS FOR AUTOMATIC DISCOVERY OF CAUSAL RELATIONS FROM TEXT
Abstract
The present invention provides a method for extracting relationships between words in textual data. Initially, training relationship data, such as word triplets describing a cause-effect relationship, is received and used to collect additional textual data including the training relationship data. Distributed data collection is used to receive the training data and collect the additional textual data, allowing a broad range of data to be acquired from multiple sources. Syntactic patterns are extracted from the additional textual data and a distributed data source is scanned to extract additional relationship data describing one or more causal relationships using the extracted syntactic patterns. The extracted additional relationship data is then stored, and can be validated by a supervised learning algorithm before storage and used to train a classifier for automatic validation of additional relationship data.
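The learn-then-scan flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: it substitutes flat word-sequence templates for true syntactic patterns, and the function names (`learn_patterns`, `apply_patterns`), the slot notation (`<0>`, `<1>`, `<2>`), and the sample sentences are all hypothetical.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def learn_patterns(triplets, corpus):
    """Turn sentences that contain a whole training triplet into templates.

    Assumption: a surface template stands in for a syntactic pattern.
    Each triplet word is replaced by a numbered slot such as <0>.
    """
    patterns = Counter()
    for triplet in triplets:
        slots = {word: f"<{i}>" for i, word in enumerate(triplet)}
        for sentence in corpus:
            tokens = tokenize(sentence)
            # Require each triplet word exactly once so slots stay unambiguous.
            if all(tokens.count(word) == 1 for word in triplet):
                patterns[" ".join(slots.get(t, t) for t in tokens)] += 1
    return patterns


def apply_patterns(patterns, corpus):
    """Scan a second corpus and return word tuples matched by any template."""
    found = set()
    for pattern in patterns:
        parts = []
        n_slots = 0
        for token in pattern.split():
            slot = re.fullmatch(r"<(\d+)>", token)
            if slot:
                parts.append(rf"(?P<s{slot.group(1)}>\w+)")
                n_slots += 1
            else:
                parts.append(re.escape(token))
        regex = re.compile(" ".join(parts))
        for sentence in corpus:
            match = regex.fullmatch(" ".join(tokenize(sentence)))
            if match:
                found.add(tuple(match[f"s{i}"] for i in range(n_slots)))
    return found


if __name__ == "__main__":
    # Hypothetical training triplet and two tiny stand-in "data sources".
    seeds = [("heat", "melts", "ice")]
    first_source = ["Heat quickly melts the ice."]
    second_source = ["Pressure quickly cracks the pipe.", "The pipe is old."]
    learned = learn_patterns(seeds, first_source)
    print(apply_patterns(learned, second_source))
```

Whole-sentence templates like these are brittle; the claims instead derive patterns from a dependency tree, which generalizes across word order and intervening modifiers.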
15 Claims
1. A computer-based method for extracting relationships from textual data, comprising the steps of:
receiving, from a first distributed data source, training data comprising three or more words describing relationships between an action and an object;
collecting textual data including the received training data;
extracting a syntactic pattern from the collected textual data;
scanning a second distributed data source;
extracting target relationships from the second distributed data source using the extracted syntactic pattern; and
storing the target relationships in a computer storage media.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system for extracting relationships from textual data, the system comprising:
an input device for receiving training data and textual data;
a data store, adapted to communicate with the input device, the data store for storing representations of the received training data;
an extraction module, adapted to communicate with the data store and the input device for extracting a syntactic pattern from the textual data using the received training data;
a classifier adapted to communicate with the extraction module and the data store for classifying relationships described by the extracted syntactic pattern.
Dependent claims: 9, 10, 11, 12
13. A computer program product, comprising a computer readable medium storing computer executable code for extracting relationships from textual data, the computer executable code performing the steps of:
receiving, from a first distributed data source, training data comprising three or more words describing relationships between an action and an object;
collecting textual data including the received training data;
extracting a syntactic pattern from the collected textual data;
scanning a second distributed data source;
extracting target relationships from the second distributed data source using the extracted syntactic pattern; and
storing the target relationships in a computer storage media.
14. The computer program product of claim 13, wherein the step of extracting a syntactic pattern comprises the steps of:
generating a dependency tree, the dependency tree describing relationships between words of the textual data; and
15. The computer program product of claim 14, wherein the step of extracting a syntactic pattern further comprises the steps of:
preprocessing the textual data to resolve pronouns; and
inserting satellite links to the dependency tree.
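One way to picture a syntactic pattern taken from a dependency tree is as the chain of relations connecting two words of interest. The sketch below is illustrative only and is not drawn from the specification: the parses are written by hand rather than produced by a parser, the relation labels (`nsubj`, `dobj`, `prep`, `pobj`) and the helper names are assumptions, each word is assumed to occur once per sentence, and pronoun resolution and satellite links are not modeled.

```python
def path_to_root(word, heads):
    """Return the chain of words from `word` up to the root of the tree."""
    path = [word]
    while heads[path[-1]][0] is not None:
        path.append(heads[path[-1]][0])
    return path


def dependency_pattern(heads, source, target):
    """Describe the tree path from `source` to `target` as a pattern string.

    `heads` maps each word to (head word, relation label); the root has
    head None. The end points are abstracted to the placeholders X and Y.
    Arrows point from a dependent word toward its head.
    """
    up = path_to_root(source, heads)
    down = path_to_root(target, heads)
    # Lowest common ancestor: first node on the source chain also on the target chain.
    common = next(node for node in up if node in down)

    def name(node):
        if node == source:
            return "X"
        if node == target:
            return "Y"
        return node

    parts = []
    for node in up[:up.index(common)]:
        parts += [name(node), f"-{heads[node][1]}->"]
    parts.append(name(common))
    for node in reversed(down[:down.index(common)]):
        parts += [f"<-{heads[node][1]}-", name(node)]
    return " ".join(parts)


if __name__ == "__main__":
    # Hand-written parse of "Heavy rain leads to flooding" (hypothetical labels).
    parse = {
        "leads": (None, "root"),
        "rain": ("leads", "nsubj"),
        "heavy": ("rain", "amod"),
        "to": ("leads", "prep"),
        "flooding": ("to", "pobj"),
    }
    print(dependency_pattern(parse, "rain", "flooding"))
```

Because the modifier "heavy" lies off the path between the two end points, it does not appear in the resulting pattern, which is the main advantage of a tree path over a flat word template.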
Specification