HIGH-PRECISION LIMITED SUPERVISION RELATIONSHIP EXTRACTOR
Abstract
Automatic relationship extraction is provided. A machine learning approach using statistical entity-type prediction and relationship predication models built from large unlabeled datasets is interactively combined with minimal human intervention and a light pattern-based approach to extract relationships from unstructured, semi-structured, and structured documents. Training data is collected from a collection of unlabeled documents by matching ground truths for a known entity from existing fact databases with text in the documents describing the known entity, and corresponding models are built for one or more relationship types. For a modeled relationship type, text chunks of interest are found in a document. A machine learning classifier predicts the probability that one of the text chunks is the entity being sought. The combined machine learning and light pattern-based approach provides both improved recall and high precision through filtering and allows constraining and normalization of the extracted relationships.
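The automatic labeling step described in the abstract can be illustrated with a minimal sketch. It assumes a single known fact and candidate text chunks that have already been located; the names `LabeledExample` and `label_sentences`, the example fact, and the exact-match comparison are hypothetical and are not taken from the patent.

```python
from typing import NamedTuple


class LabeledExample(NamedTuple):
    sentence: str
    candidate: str
    label: int  # 1 = candidate matches the known fact, 0 = it does not


def label_sentences(sentences, candidates_per_sentence, known_object):
    """Label each candidate chunk by comparing it with the known ground truth."""
    target = known_object.strip().lower()
    examples = []
    for sentence, candidates in zip(sentences, candidates_per_sentence):
        for candidate in candidates:
            # Assumption: a case-insensitive exact match counts as a positive.
            label = int(candidate.strip().lower() == target)
            examples.append(LabeledExample(sentence, candidate, label))
    return examples


# Hypothetical known fact: (subject="Ada Lovelace", relationship="born_in", object="London")
sentences = ["Ada Lovelace was born in London.", "She later visited Paris."]
candidates = [["London"], ["Paris"]]
examples = label_sentences(sentences, candidates, known_object="London")
```

No human labels the sentences here: the known fact alone decides which candidates become positive and which become negative training examples.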
20 Claims
1. A method for automatically extracting relationships from unstructured text, the method comprising:
selecting a relationship type describing a relationship between a subject having an entity type and an object having an object type;
locating mentions of the object type in a selected document;
for each mention located in the selected document, predicting a probability that the mention satisfies the relationship type using a statistical model built using automatically labeled training data; and
extracting one or more relationships satisfying the relationship type from the selected document.
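The steps of claim 1 can be sketched as follows. This is only an illustration under stated assumptions: the object type is taken to be a four-digit year, the statistical model is replaced by a one-feature logistic score, and the weights in `WEIGHTS`, the 30-character context window, and the 0.5 threshold are all hypothetical stand-ins for a model actually trained on automatically labeled data.

```python
import math
import re

# Assumed object type: a four-digit year.
YEAR = re.compile(r"\b(?:1\d{3}|20\d{2})\b")

# Hypothetical weights standing in for a trained statistical model.
WEIGHTS = {"bias": -2.0, "cue_born_nearby": 4.0}


def predict_probability(text, start):
    """Logistic score for one mention, using a single context feature."""
    window = text[max(0, start - 30):start].lower()
    score = WEIGHTS["bias"] + WEIGHTS["cue_born_nearby"] * ("born" in window)
    return 1.0 / (1.0 + math.exp(-score))


def extract(subject, text, threshold=0.5):
    """Return (subject, relationship, object, probability) for mentions above threshold."""
    results = []
    for match in YEAR.finditer(text):  # locate mentions of the object type
        p = predict_probability(text, match.start())
        if p >= threshold:  # keep only mentions likely to satisfy the relationship
            results.append((subject, "birth_year", match.group(), p))
    return results


doc = "Ada Lovelace was born in 1815. Her notes on the Analytical Engine appeared in 1843."
relations = extract("Ada Lovelace", doc)
```

In this example both years are located as mentions, but only the one preceded by the cue word scores above the threshold and is extracted.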
14. A relationship extractor implemented using a computer, the relationship extractor comprising:
a natural language processor operable to identify mentions of a subject of a selected subject type or objects of a selected object type specified in a selected relationship type appearing in a document describing the subject;
a classifier operable to predict a probability that each object identified by the natural language processor satisfies the selected relationship type with the subject using a statistical model built from a large set of automatically labeled training data; and
a post processor operable to aggregate objects associated with the selected relationship type, apply a pattern-based model to the aggregated objects, select one or more objects from the aggregated objects meeting selected criteria as participants in relationships of the selected relationship type with the subject, and produce a final set of one or more relationships of the selected relationship type.
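The post processor of claim 14 might look like the sketch below. The claim does not specify the aggregation rule, the pattern-based model, or the selection criteria, so the choices here (keep the maximum probability per object, filter with a regular expression, keep the top-scoring object above a minimum probability) are assumptions, as is the function name `post_process`.

```python
import re
from collections import defaultdict


def post_process(subject, relationship, scored_objects, pattern,
                 min_probability=0.6, top_k=1):
    """Aggregate duplicate objects, filter by a pattern, and keep the best candidates."""
    best = defaultdict(float)
    for text, probability in scored_objects:
        key = text.strip()
        # Assumed aggregation rule: keep the highest probability seen per object.
        best[key] = max(best[key], probability)
    # Assumed pattern-based model: the object must fully match the given regex.
    kept = [(obj, p) for obj, p in best.items()
            if pattern.fullmatch(obj) and p >= min_probability]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [(subject, relationship, obj) for obj, _ in kept[:top_k]]


scored = [("1815", 0.88), ("1815 ", 0.71), ("1843", 0.12), ("the 1800s", 0.65)]
final = post_process("Ada Lovelace", "birth_year", scored, re.compile(r"\d{4}"))
```

The pattern filter removes a candidate that the classifier scored reasonably well ("the 1800s") but that is not a well-formed object, which is the kind of precision gain the abstract attributes to the light pattern-based approach.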
20. A computer readable medium containing computer executable instructions which, when executed by a computer, perform a method of extracting facts from free and semi-structured text using distant supervision, the method comprising:
collecting known facts from an existing knowledge graph corresponding to a relationship type describing a relationship between a subject having an entity type and an object having an object type;
automatically labeling training data extracted from documents corresponding to the known facts;
training a statistical model with a large quantity of automatically labeled training data;
displaying a small number of classification predictions generated using the automatically labeled training data for annotation by a user;
retraining the statistical model based on the annotations received from the user;
locating mentions of the object type in a selected document;
predicting a probability that each mention satisfies the relationship type using the statistical model; and
extracting one or more relationships satisfying the relationship type from the selected document.
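The train, annotate, and retrain loop of claim 20 can be sketched with a toy model. Everything below is an assumption made for illustration: the model is a Laplace-smoothed positive rate per context cue, the predictions shown to the user are chosen by lowest certainty (the claim only says "a small number"), and the user annotations are simulated.

```python
from collections import defaultdict


def train(examples):
    """Fit a toy model: Laplace-smoothed positive rate per context cue."""
    counts = defaultdict(lambda: [0, 0])  # cue -> [positives, total]
    for cue, label in examples:
        counts[cue][0] += label
        counts[cue][1] += 1
    return {cue: (pos + 1) / (total + 2) for cue, (pos, total) in counts.items()}


def predict(model, cue):
    """Probability that a mention with this cue satisfies the relationship type."""
    return model.get(cue, 0.5)


def pick_for_annotation(model, examples, k):
    """Choose the k examples whose predictions are least certain (an assumed policy)."""
    return sorted(range(len(examples)),
                  key=lambda i: abs(predict(model, examples[i][0]) - 0.5))[:k]


# Automatically labeled (cue, label) pairs; the "died in" labels are noisy.
auto = [("born in", 1), ("born in", 1), ("born in", 1),
        ("died in", 1), ("died in", 0),
        ("visited", 0), ("visited", 0), ("visited", 0)]
model = train(auto)
to_review = pick_for_annotation(model, auto, k=2)

# Simulated user annotations: both reviewed examples are marked negative.
corrected = list(auto)
for i in to_review:
    corrected[i] = (auto[i][0], 0)
model = train(corrected)  # retrain on the corrected labels
```

Only two of the eight examples are reviewed, yet the retrained model lowers its estimate for the noisy cue while the large automatically labeled remainder is left untouched.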
Specification