Querying and integrating structured and unstructured data
First Claim
1. A computer-implemented method of querying and integrating structured and unstructured data, the method comprising:
receiving entity information that is extracted from a first set of unstructured data using an open domain information extraction system, wherein the entity information comprises relationship information between first entities and second entities of the first set of unstructured data, and the relationship information stores semantics of verbs connecting the first entities and the second entities;
recognizing a pattern based on the relationship information by identifying repeated occasions of a first entity, a second entity and a verb connecting the first entity and the second entity in the first set of unstructured data;
creating a schema for the first set of unstructured data based on the pattern, wherein the schema comprises the first entity, the second entity and the verb as elements; and
associating an element of the created schema with (i) an entity of a second set of unstructured data or (ii) a schema element of an existing set of structured data if there is sufficient overall similarity between the created schema element and either the second unstructured data entity or the schema element of the existing structured data, thereby creating a link between the created schema element and either the second unstructured data entity or the schema element of the existing set of structured data.
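The claim does not say how repeated occurrences are counted or how many repetitions make a pattern. The following is a minimal sketch of the pattern-recognition and schema-creation steps. It assumes the extractor already emits (first entity, verb, second entity) triples with entities normalized to type labels such as "company". The names `recognize_patterns`, `create_schema`, `SchemaElement`, `Schema`, and the `min_support` threshold are hypothetical illustrations, not taken from the patent.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical input format: (first entity, verb, second entity) from an open IE system.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class SchemaElement:
    role: str  # "first_entity", "verb", or "second_entity"
    name: str


@dataclass(frozen=True)
class Schema:
    first_entity: SchemaElement
    verb: SchemaElement
    second_entity: SchemaElement


def recognize_patterns(triples: list[Triple], min_support: int = 2) -> list[Triple]:
    """Return triples that repeat at least `min_support` times (assumed threshold),
    most frequent first. Matching is case-insensitive."""
    counts = Counter((s.lower(), v.lower(), o.lower()) for s, v, o in triples)
    return [triple for triple, n in counts.most_common() if n >= min_support]


def create_schema(pattern: Triple) -> Schema:
    """Turn one recognized pattern into a schema whose elements are the
    first entity, the verb, and the second entity."""
    first, verb, second = pattern
    return Schema(
        SchemaElement("first_entity", first),
        SchemaElement("verb", verb),
        SchemaElement("second_entity", second),
    )


if __name__ == "__main__":
    triples = [
        ("company", "acquired", "company"),
        ("Company", "acquired", "company"),
        ("person", "founded", "company"),
        ("company", "acquired", "company"),
    ]
    patterns = recognize_patterns(triples)
    print(patterns)  # [('company', 'acquired', 'company')]
    print(create_schema(patterns[0]))
```

A plain frequency threshold is the simplest reading of "repeated occasions". A real system might also weight patterns by verb semantics, which this sketch ignores.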
Abstract
A computer-implemented method, system, and article of manufacture for querying and integrating structured and unstructured data. The method includes: receiving entity information that is extracted from a first set of unstructured data using an open domain information extraction system, wherein the entity information comprises relationship information between a first entity and a second entity of the first set of unstructured data; recognizing a pattern based on the relationship information and creating a schema for the first set of unstructured data based on the pattern; and associating an element of the created schema with (i) an entity of a second set of unstructured data or (ii) a schema element of an existing set of structured data if there is sufficient overall similarity between the created schema element and either the second unstructured data entity or the schema element of the existing structured data.
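The abstract requires "sufficient overall similarity" but does not define the measure. The following sketch swaps in one plausible choice and should not be read as the patented method. It combines name similarity (`difflib.SequenceMatcher.ratio`) with Jaccard overlap of sample values into a weighted average, then links each created schema element to its best-scoring structured column if the score clears a threshold. The 0.5 weight, the 0.6 threshold, and the names `overall_similarity`, `associate`, and `Link` are all assumptions. The sketch covers option (ii), structured schema elements. The same scoring could in principle be applied to entities of a second unstructured set.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Link:
    created_element: str
    target_element: str
    score: float


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (stand-in for the unspecified measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_overlap(a_values: set[str], b_values: set[str]) -> float:
    """Jaccard overlap of sample values; 0.0 when both sets are empty."""
    union = a_values | b_values
    return len(a_values & b_values) / len(union) if union else 0.0


def overall_similarity(name_a: str, values_a: set[str], name_b: str,
                       values_b: set[str], name_weight: float = 0.5) -> float:
    """Assumed aggregation: weighted average of name and value similarity."""
    return (name_weight * name_similarity(name_a, name_b)
            + (1 - name_weight) * value_overlap(values_a, values_b))


def associate(created: dict[str, set[str]], structured: dict[str, set[str]],
              threshold: float = 0.6) -> list[Link]:
    """Link each created schema element to its most similar structured
    schema element when the overall similarity meets the threshold."""
    links = []
    for c_name, c_vals in created.items():
        best = max(
            ((s_name, overall_similarity(c_name, c_vals, s_name, s_vals))
             for s_name, s_vals in structured.items()),
            key=lambda pair: pair[1],
            default=None,
        )
        if best is not None and best[1] >= threshold:
            links.append(Link(c_name, best[0], round(best[1], 3)))
    return links


if __name__ == "__main__":
    created = {"company": {"IBM", "Google", "Oracle"}, "acquired": set()}
    structured = {"company_name": {"IBM", "Oracle", "SAP"}, "founded_year": {"1911", "1998"}}
    print(associate(created, structured))
    # [Link(created_element='company', target_element='company_name', score=0.618)]
```

Mixing a name signal with a value signal is one way to make the similarity "overall" rather than purely lexical. Here "acquired" has no sample values, so its score cannot exceed 0.5 and it stays unlinked.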
19 Claims
7. A computer-implemented system for querying and integrating structured and unstructured data, the system comprising:
a receiving device configured to receive entity information that is extracted from a first set of unstructured data using an open domain information extraction system, wherein the entity information comprises relationship information between first entities and second entities of the first set of unstructured data, and the relationship information stores semantics of verbs connecting the first entities and the second entities;
a pattern recognition device configured to recognize a pattern based on the relationship information by identifying repeated occasions of a first entity, a second entity and a verb connecting the first entity and the second entity in the first set of unstructured data;
creating a schema for the first set of unstructured data based on the pattern, wherein the schema comprises the first entity, the second entity and the verb as elements; and
an element association device configured to associate an element of the created schema with (i) an entity of a second set of unstructured data or (ii) a schema element of an existing set of structured data if there is sufficient overall similarity between the created schema element and either the second unstructured data entity or the schema element of the existing structured data, thereby creating a link between the created schema element and either the second unstructured data entity or the schema element of the existing set of structured data.
14. A computer program product for querying and integrating structured and unstructured data, the computer program product comprising:
receiving entity information that is extracted from a first set of unstructured data using an open domain information extraction system, wherein the entity information comprises relationship information between first entities and second entities of the first set of unstructured data, and the relationship information stores semantics of verbs connecting the first entities and the second entities;
recognizing a pattern based on the relationship information by identifying repeated occasions of a first entity, a second entity and a verb connecting the first entity and the second entity in the first set of unstructured data;
creating a schema for the first set of unstructured data based on the pattern, wherein the schema comprises the first entity, the second entity and the verb as elements; and
associating an element of the created schema with (i) an entity of a second set of unstructured data or (ii) a schema element of an existing set of structured data if there is sufficient overall similarity between the created schema element and either the second unstructured data entity or the schema element of the existing structured data, thereby creating a link between the created schema element and either the second unstructured data entity or the schema element of the existing set of structured data.
Specification