Knowledge discovery tool extraction and integration
First Claim
1. A computer system comprising:
a processor; and
memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to perform operations comprising:
providing, in a database, a plurality of entity table types relevant to a domain type, wherein each entity table type includes a respective entity table name field and at least one respective data item field;
populating the respective entity table name fields and the respective data item fields by automatically:
extracting a plurality of data items from a plurality of data sources;
determining respective confidence values for the extracted data items based on a plurality of pre-determined confidence values, wherein a respective predetermined confidence value is provided for each respective entity table name and each respective data item according to which data source of the plurality of data sources provided the respective entity table name and the respective data item; and
integrating the extracted data items into respective entity table name fields and data item fields based on the respective confidence values for the extracted data items;
determining relationship types between at least some data items of the plurality of data items that are integrated into a plurality of tables in the database, wherein the relationship types are based on respective data sources from which the data items have been extracted, the relationship types including at least two of the following:
a first relationship type, a second relationship type, a third relationship type, and a fourth relationship type, wherein:
the first relationship type is between first and second data items extracted from a first structured data source,
the second relationship type is between data items extracted from different structured data sources, wherein the second relationship type is based on a plurality of respective first relationship types,
the third relationship type is between a data item extracted from a second structured data source, and an entity table including, as entity table data items, attributes about a first unstructured data source, and wherein the data item extracted from the second structured data source is mentioned in the first unstructured data source, and
the fourth relationship type is between two data items that:
have been integrated from different structured data sources, and
are both mentioned in a second unstructured data source; and
storing the relationship types in a computer system memory.
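The confidence-based integration step of this claim can be illustrated with a minimal Python sketch. It assumes that each data source carries one pre-determined confidence value and that a newly extracted data item replaces a stored one only when its source has strictly higher confidence. The claim does not state how the confidence values are applied, so that rule, the source names, and the numeric values are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical pre-determined confidence per data source (values are made up).
SOURCE_CONFIDENCE = {"curated_db": 0.9, "web_scrape": 0.4}


@dataclass
class EntityTable:
    name: str
    # Maps a data item field name to (value, confidence of the supplying source).
    fields: dict = field(default_factory=dict)

    def integrate(self, field_name, value, source):
        """Store value if its source outranks the stored one; return True if stored."""
        confidence = SOURCE_CONFIDENCE[source]
        current = self.fields.get(field_name)
        # Assumed rule: an empty field always accepts; otherwise higher confidence wins.
        if current is None or confidence > current[1]:
            self.fields[field_name] = (value, confidence)
            return True
        return False


def integrate_all(table, extracted):
    """Integrate a sequence of (field_name, value, source) triples into one table."""
    for field_name, value, source in extracted:
        table.integrate(field_name, value, source)
    return table
```

Under these assumptions, a value from "curated_db" overwrites an earlier value from "web_scrape", while a later "web_scrape" value for the same field is rejected.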
2 Assignments
0 Petitions
Abstract
A method for integrating a data item into a knowledge model is provided. The method may include retrieving the data item from a data source, determining if the data item has been previously integrated into the knowledge model, and integrating the data item into the knowledge model if the data item has not been previously integrated.
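The retrieve, check, and integrate flow described in the abstract might look like the following Python sketch. It assumes that "previously integrated" can be decided by an identifying key per data item; the abstract does not say how that determination is made, and the class and function names here are hypothetical.

```python
class KnowledgeModel:
    """Toy knowledge model that remembers which data items it already holds."""

    def __init__(self):
        self._items = {}  # identifying key -> data item

    def integrate(self, key, item):
        """Integrate item unless an item with the same key is already present."""
        if key in self._items:
            return False  # previously integrated, so skip it
        self._items[key] = item
        return True


def ingest(model, data_source):
    """Retrieve (key, item) pairs from a source; return the keys newly integrated."""
    return [key for key, item in data_source if model.integrate(key, item)]
```

Calling `ingest` twice with overlapping sources integrates each key once and leaves the first stored item untouched.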
159 Citations
20 Claims
1. A computer system comprising a processor and memory, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
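The four relationship types recited in claim 1 can be sketched as simple predicates in Python. This is only an illustration under stated assumptions: `sources` is a hypothetical mapping from each data item to the set of structured sources it was extracted from, an unstructured source is represented as plain text, and "mentioned in" is approximated by substring matching. None of these representations come from the claim itself.

```python
def first_type(a, b, sources):
    """Both items were extracted from at least one common structured source."""
    return bool(sources[a] & sources[b])


def second_type(a, b, sources):
    """Items from different structured sources, linked through a third item
    that has a first-type relationship with each of them."""
    if sources[a] & sources[b]:
        return False
    return any(
        c not in (a, b) and sources[a] & sources[c] and sources[c] & sources[b]
        for c in sources
    )


def third_type(item, document_text):
    """A structured-source item is mentioned in an unstructured document."""
    return item in document_text


def fourth_type(a, b, sources, document_text):
    """Items from different structured sources are both mentioned in one document."""
    return (
        not (sources[a] & sources[b])
        and a in document_text
        and b in document_text
    )


# Made-up example data: "beta" appears in both structured sources.
sources = {"alpha": {"src1"}, "beta": {"src1", "src2"}, "gamma": {"src2"}}
document = "a report that mentions alpha and gamma"
```

With this example data, "alpha" and "beta" have a first-type relationship, "alpha" and "gamma" have a second-type relationship through "beta", and "alpha" and "gamma" also have a fourth-type relationship because the document mentions both.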
11. A computer system comprising:
a processor; and
memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to perform operations comprising:
providing, in a database, a plurality of entity table types relevant to a domain type, wherein each entity table type includes a respective entity table name field and at least one respective data item field;
populating the respective entity table name fields and the respective data item fields by automatically:
extracting a plurality of data items from a plurality of data sources;
determining respective confidence values for the extracted data items based on a plurality of pre-determined confidence values, wherein a respective predetermined confidence value is provided for each respective entity table name and each respective data item according to which data source of the plurality of data sources provided the respective entity table name and the respective data item;
integrating the extracted data items into respective entity table name fields and data item fields based on the respective confidence values for the extracted data items;
storing a first data item extracted from a first data source in a first data item field and storing a second data item extracted from the first data source in a second data item field;
populating a first row of a direct relationship table having a plurality of rows with a first direct relationship definition indicating that the first data item field and the second data item field store respective data items that have been extracted from the first data source;
searching the direct relationship table for a second row having a second direct relationship definition indicating that a third data item field stores a third data item that has been extracted from a second data source from which the first data item has also been extracted, wherein the second data source is different than the first data source;
determining a transitive relationship definition, based on the first direct and second direct relationship definitions from the direct relationship table, indicating that the second data item field is related to the third data item field, wherein the transitive relationship definition is based on at least two separate relationships between data item fields; and
storing the third relationship definition in a transitive relationship table.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
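The derivation of a transitive relationship from two rows of the direct relationship table in claim 11 can be sketched in Python as follows. The row layout `(field_a, field_b, source)` and the function name are assumptions for illustration; the claim does not prescribe a table schema.

```python
def derive_transitive(direct_rows):
    """Derive transitive rows from direct relationship rows.

    direct_rows: list of (field_a, field_b, source) tuples, each saying that
    the two fields hold data items extracted from the same source.
    Returns a list of (field_x, field_y, (source_1, source_2)) tuples.
    """
    transitive = []
    for i, (a1, b1, s1) in enumerate(direct_rows):
        for a2, b2, s2 in direct_rows[i + 1:]:
            if s1 == s2:
                continue  # the second row must come from a different source
            shared = {a1, b1} & {a2, b2}
            if len(shared) != 1:
                continue  # need exactly one field in common to chain through
            (pivot,) = shared
            end1 = b1 if a1 == pivot else a1
            end2 = b2 if a2 == pivot else a2
            transitive.append((end1, end2, (s1, s2)))
    return transitive


# Made-up example: field_1 links field_2 (via src1) and field_3 (via src2).
rows = [("field_1", "field_2", "src1"), ("field_1", "field_3", "src2")]
transitive_table = derive_transitive(rows)
```

Here the shared first data item field acts as the pivot, so the second and third data item fields end up related through two separate direct relationships.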
Specification