Method and system for extracting and visualizing graph-structured relations from unstructured text
Abstract
The present invention is directed to a system, method and computer program for automatically extracting and mining relations and related entities from unstructured text. A method in accordance with an embodiment of the invention includes extracting relations and related entities from unstructured text data, representing the extracted information as a graph, and manipulating the resulting graph to gain further insight into the information it contains. Relations and related entities are extracted in two stages: patterns are first induced automatically and are then applied to unstructured text data. For each relation and entity, several features are extracted in order to build a graph whose nodes are entities and whose edges are relations.
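The two-stage extraction summarized above (first inducing patterns, then applying them) can be illustrated with a minimal Python sketch. It assumes each word carries three hypothetical tag types (`word`, `pos`, `ne`), defines templates as three-tag windows that recur in a training corpus, and generates patterns by assigning the roles `ENT1`, `REL` and `ENT2` to the slots of each template. The tag scheme, window length, frequency threshold and role names are illustrative assumptions, not details taken from the patent.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical tag types attached to every word: "word", "pos" and "ne"
# ("O" marks a word that is not part of a named entity).
Tagged = dict[str, str]


def tw(word: str, pos: str, ne: str = "O") -> Tagged:
    """Build a tagged word carrying one tag of each type."""
    return {"word": word, "pos": pos, "ne": ne}


@dataclass(frozen=True)
class Pattern:
    template: tuple[tuple[str, str], ...]  # (tag type, tag value) per slot
    roles: tuple[str, ...]                 # "ENT1", "REL" or "ENT2" per slot


def induce_templates(sentences, length=3, min_count=2):
    """Define templates as tag sequences that recur at least min_count times."""
    counts = Counter()
    for sent in sentences:
        for i in range(len(sent) - length + 1):
            # Assumption: a slot is described by its NE tag if it has one,
            # otherwise by its part-of-speech tag.
            template = tuple(
                ("ne", w["ne"]) if w["ne"] != "O" else ("pos", w["pos"])
                for w in sent[i:i + length]
            )
            counts[template] += 1
    return [t for t, c in counts.items() if c >= min_count]


def patterns_from_template(template):
    """Generate patterns by giving each tag a role: one ENT1, one ENT2, rest REL."""
    n = len(template)
    for i in range(n):
        for j in range(i + 1, n):
            # Only named-entity slots may take an entity role.
            if template[i][0] != "ne" or template[j][0] != "ne":
                continue
            roles = tuple("ENT1" if k == i else "ENT2" if k == j else "REL" for k in range(n))
            yield Pattern(template, roles)


def apply_pattern(pattern, sent):
    """Yield (entity1, relation, entity2) wherever the pattern's template matches."""
    n = len(pattern.template)
    for i in range(len(sent) - n + 1):
        window = sent[i:i + n]
        if all(w[t] == v for w, (t, v) in zip(window, pattern.template)):
            parts = {"ENT1": [], "REL": [], "ENT2": []}
            for w, role in zip(window, pattern.roles):
                parts[role].append(w["word"])
            yield " ".join(parts["ENT1"]), " ".join(parts["REL"]), " ".join(parts["ENT2"])


corpus = [
    [tw("Alice", "NNP", "PER"), tw("joined", "VBD"), tw("Acme", "NNP", "ORG")],
    [tw("Bob", "NNP", "PER"), tw("joined", "VBD"), tw("Initech", "NNP", "ORG")],
]
patterns = [p for t in induce_templates(corpus) for p in patterns_from_template(t)]
new_sentence = [tw("Carol", "NNP", "PER"), tw("founded", "VBD"), tw("Globex", "NNP", "ORG")]
print([r for p in patterns for r in apply_pattern(p, new_sentence)])
# prints [('Carol', 'founded', 'Globex')]
```

Because the template is built from tags rather than literal words, the single pattern induced from the two training sentences also matches an unseen sentence with a different subject, verb and object.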
16 Claims
1. A method for automatically extracting and mining relations and related entities from unstructured text, comprising:
receiving a query specifying a main entity; and
extracting, using a computer processing system, from unstructured text, relations and related entities related to the main entity specified in the query, the extracting further comprising:
searching and selecting, in the unstructured text, documents containing the main entity;
attaching to each word of the selected documents at least one tag, each tag being of a different type;
extracting relations and related entities by applying patterns to the tagged documents, wherein the patterns are induced from unstructured text, the inducing comprising:
attaching to each word of an unstructured text at least one tag of a different type;
defining at least one template, each template being based on a sequence of tags; and
generating from each template at least one pattern, each pattern specifying a role for each tag in the template;
extracting from the selected documents features characterizing each entity and relation; and
building a graph based on the extracted features, whose nodes represent the entities related to the specified main entity and whose edges represent the relations between the entities, wherein building the graph based on the extracted features further comprises expanding the graph based on distances between nodes, the expanding further comprising selecting at least one expanded entity close to the main entity and extracting, from the unstructured text, relations and related entities related to each expanded entity;
wherein extracting features characterizing each relation from the selected documents further comprises associating each relation with a relation class, a relation strength, and temporal information, and wherein building the graph further comprises indicating on the graph which of the entities are related at a given time based on the temporal information for each relation.
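The feature-extraction and graph-building steps of claim 1 can be sketched as follows. The claim requires each relation to carry a relation class, a relation strength and temporal information, but does not say how strength or time are measured; this minimal Python sketch assumes that strength is the number of supporting mentions and that the temporal information is the span of years in which a relation is mentioned. The function names `build_graph` and `related_at` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    relation_class: str
    strength: int  # assumption: number of mentions supporting the relation
    start: int     # assumption: earliest year the relation is mentioned
    end: int       # assumption: latest year the relation is mentioned


def build_graph(mentions):
    """Aggregate (entity1, relation_class, entity2, year) mentions into nodes and edges."""
    edges = {}  # (entity1, entity2, relation_class) -> Edge
    for e1, cls, e2, year in mentions:
        key = (e1, e2, cls)
        edge = edges.get(key)
        if edge is None:
            edges[key] = Edge(cls, 1, year, year)
        else:
            edge.strength += 1
            edge.start = min(edge.start, year)
            edge.end = max(edge.end, year)
    nodes = {e for e1, e2, _ in edges for e in (e1, e2)}
    return nodes, edges


def related_at(edges, year):
    """List the entity pairs whose relation interval covers the given year."""
    return sorted(k for k, edge in edges.items() if edge.start <= year <= edge.end)


mentions = [
    ("Alice", "works_for", "Acme", 2015),
    ("Alice", "works_for", "Acme", 2018),
    ("Alice", "works_for", "Initech", 2020),
    ("Bob", "founded", "Acme", 2010),
]
nodes, edges = build_graph(mentions)
print(related_at(edges, 2016))  # prints [('Alice', 'Acme', 'works_for')]
```

Storing an interval per edge keeps the "related at a given time" query a simple filter; a system with finer-grained temporal information could store several intervals per edge instead.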
15. A system for automatically extracting and mining relations and related entities from unstructured text, comprising:
a computer processing system, including:
a system for receiving a query specifying a main entity; and
a system for extracting, from unstructured text, relations and related entities related to the main entity specified in the query, the system for extracting further comprising:
a system for searching and selecting, in the unstructured text, documents containing the main entity;
a system for attaching to each word of the selected documents at least one tag, each tag being of a different type;
a system for extracting relations and related entities by applying patterns to the tagged documents, the system for extracting relations and related entities comprising a system for inducing the patterns from unstructured text, the inducing comprising:
attaching to each word of an unstructured text at least one tag of a different type;
defining at least one template, each template being based on a sequence of tags; and
generating from each template at least one pattern, each pattern specifying a role for each tag in the template;
a system for extracting from the selected documents features characterizing each entity and relation; and
a system for building a graph based on the extracted features, whose nodes represent the entities related to the specified main entity and whose edges represent the relations between the entities, wherein the system for building the graph based on the extracted features further comprises a system for expanding the graph based on distances between nodes, the expanding of the graph based on distances between nodes further comprising selecting at least one expanded entity close to the main entity and extracting, from the unstructured text, relations and related entities related to each expanded entity;
wherein the system for extracting features characterizing each relation from the selected documents is further configured to associate each relation with a relation class, a relation strength, and temporal information, and wherein the system for building the graph is configured to indicate on the graph which of the entities are related at a given time based on the temporal information for each relation.
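The independent claims expand the graph "based on distances between nodes" by selecting entities close to the main entity and running extraction again for each of them. The minimal Python sketch below assumes that distance means the number of hops between nodes and that "close" means at most `max_distance` hops; `extract_for` is a hypothetical callback standing in for the whole search, tag and apply-patterns pipeline.

```python
from collections import deque


def distances_from(adjacency, source):
    """Breadth-first hop distances from source over an undirected adjacency map."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def expand(adjacency, main_entity, extract_for, max_distance=1):
    """Run extraction again for entities within max_distance hops of the main entity."""
    dist = distances_from(adjacency, main_entity)
    expanded = sorted(n for n, d in dist.items() if 0 < d <= max_distance)
    for entity in expanded:
        # extract_for is hypothetical: it returns (entity1, entity2) relations
        # found when the pipeline is queried with `entity` as the main entity.
        for e1, e2 in extract_for(entity):
            adjacency.setdefault(e1, set()).add(e2)
            adjacency.setdefault(e2, set()).add(e1)
    return expanded


graph = {"Alice": {"Acme"}, "Acme": {"Alice", "Bob"}, "Bob": {"Acme"}}
new_relations = {"Acme": [("Acme", "Globex")], "Bob": [("Bob", "Initech")]}
print(expand(graph, "Alice", lambda e: new_relations.get(e, [])))  # prints ['Acme']
```

Distances are computed once, before any new relations are merged, so a single call expands only the entities that were already close to the main entity; here `Bob` is two hops away and is not expanded.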
16. A computer program stored on a computer readable medium for automatically extracting and mining relations and related entities from unstructured text, when the computer program is executed on a computer, the computer program comprising program code for:
receiving a query specifying a main entity; and
extracting, from unstructured text, relations and related entities related to the main entity specified in the query, the extracting further comprising:
searching and selecting, in the unstructured text, documents containing the main entity;
attaching to each word of the selected documents at least one tag, each tag being of a different type;
extracting relations and related entities by applying patterns to the tagged documents, wherein the patterns are induced from unstructured text, the inducing comprising:
attaching to each word of an unstructured text at least one tag of a different type;
defining at least one template, each template being based on a sequence of tags; and
generating from each template at least one pattern, each pattern specifying a role for each tag in the template;
extracting from the selected documents features characterizing each entity and relation; and
building a graph based on the extracted features, whose nodes represent the entities related to the specified main entity and whose edges represent the relations between the entities, wherein building the graph based on the extracted features further comprises expanding the graph based on distances between nodes, the expanding further comprising selecting at least one expanded entity close to the main entity and extracting, from the unstructured text, relations and related entities related to each expanded entity;
wherein extracting features characterizing each relation from the selected documents further comprises associating each relation with a relation class, a relation strength, and temporal information, and wherein building the graph further comprises indicating on the graph which of the entities are related at a given time based on the temporal information for each relation.
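The opening steps shared by every independent claim, selecting the documents that contain the main entity and attaching several tags of different types to each word, are sketched below. A real system would likely use part-of-speech and named-entity taggers; to stay self-contained, this Python sketch assumes three toy tag types (`word`, `lower`, `shape`) and plain whole-word matching for document selection. Both function names are hypothetical.

```python
import re


def select_documents(corpus, main_entity):
    """Keep documents that mention the main entity as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(main_entity) + r"\b", re.IGNORECASE)
    return [doc for doc in corpus if pattern.search(doc)]


def tag_words(doc):
    """Attach three tags of different types to each word (toy, rule-based tagger)."""
    tagged = []
    for word in re.findall(r"\w+", doc):
        if word[:1].isupper():
            shape = "Cap"
        elif word.isdigit():
            shape = "digit"
        else:
            shape = "low"
        tagged.append({"word": word, "lower": word.lower(), "shape": shape})
    return tagged


docs = ["Alice joined Acme in 2015.", "Bob likes tea.", "acme hired Carol."]
for doc in select_documents(docs, "Acme"):
    print([t["shape"] for t in tag_words(doc)])
```

Swapping `tag_words` for a statistical tagger would not change the rest of the pipeline, since later steps only read tags by type name.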
Specification