Automated database schema annotation
First Claim
1. A device comprising:
a processor; and
a computer-readable medium including modules, the modules, when executed by the processor, configure the device to generate annotations, the modules comprising:
a column discovery module configured to retrieve a table; and
a column annotation module configured to annotate a target column of a target table from a target database by:
calculating a value-related score between the target column of the target table and a column of the table, the value-related score based at least in part on similarities between one or more values in the target column of the target table and one or more column values extracted from the column of the table, the value-related score being a numerical value-related score;
calculating a context-related score between the target column of the target table and the column of the table, the context-related score based at least in part on similarities between identities of one or more columns of the target table and column identities of one or more columns of the table, the context-related score being a numerical context-related score;
calculating a similarity score based on a numerical value comprising a numerical combination of the value-related score and the context-related score, the similarity score being a numerical similarity score; and
annotating, based at least in part on the similarity score, the target column of the target table using a column identity of the column of the table.
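The three scores recited in the claim can be illustrated with a minimal Python sketch. The claim does not say how the similarities are measured or how the two scores are combined, so everything here is an assumption made for illustration: Jaccard overlap stands in for "similarity", a weighted sum stands in for the "numerical combination", and the function names and the `weight` parameter are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections treated as sets (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def _normalize(items):
    # Case- and whitespace-insensitive comparison (an assumption).
    return {str(item).strip().lower() for item in items}


def value_score(target_values, source_values):
    # Value-related score: overlap between the cell values of the two columns.
    return jaccard(_normalize(target_values), _normalize(source_values))


def context_score(target_headers, source_headers):
    # Context-related score: overlap between the column identities (headers)
    # of the target table and those of the retrieved table.
    return jaccard(_normalize(target_headers), _normalize(source_headers))


def similarity_score(v_score, c_score, weight=0.5):
    # Numerical combination of the two scores; a weighted sum is assumed here.
    return weight * v_score + (1 - weight) * c_score


# Target column has an uninformative name ("c1"); the retrieved table has a
# "City" column with overlapping values and similar neighbouring headers.
v = value_score(["Seattle", "Boston", "Austin"], ["seattle", "boston", "Denver"])
c = context_score(["c1", "State", "Population"], ["City", "State", "Population"])
s = similarity_score(v, c)
```

In this example two of the four distinct values overlap and two of the four distinct headers overlap, so both component scores and the combined score come out to 0.5. Whether that is high enough to annotate `c1` as "City" depends on a cutoff that the claim leaves open.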
Abstract
Techniques and constructs are described that improve the annotation of target columns of a target database by automatically annotating those columns using sources. The techniques include calculating a similarity score between a target column and columns extracted from a table that is included in a source. The similarity score is based at least in part on two similarities: one between a value in the target column of the target database and a column value of the extracted column, and one between an identity of the target column and the column identities of the extracted columns. In some examples, the techniques calculate similarity scores for one or more extracted columns and annotate the target column based on those scores.
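The last step of the abstract, annotating a target column from the similarity scores of several extracted columns, might look like the sketch below. It is illustrative only: the `Candidate` record, the equal weighting, and the `threshold` cutoff are assumptions, since the abstract does not state how a winning column is selected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One extracted column with its precomputed scores (hypothetical record)."""
    identity: str         # header of the extracted column
    value_score: float    # similarity of cell values, in [0, 1]
    context_score: float  # similarity of surrounding column identities, in [0, 1]


def combined(candidate: Candidate, weight: float = 0.5) -> float:
    # Assumed combination: weighted sum of the two component scores.
    return weight * candidate.value_score + (1 - weight) * candidate.context_score


def pick_annotation(candidates: List[Candidate],
                    threshold: float = 0.4) -> Optional[str]:
    """Return the identity of the best-scoring candidate, or None if no
    candidate reaches the (assumed) threshold."""
    if not candidates:
        return None
    best = max(candidates, key=combined)
    return best.identity if combined(best) >= threshold else None


candidates = [
    Candidate("City", value_score=0.5, context_score=0.5),
    Candidate("Country", value_score=0.1, context_score=0.5),
]
annotation = pick_annotation(candidates)
```

Here "City" wins because its combined score is higher than that of "Country". Raising `threshold` above the best score makes the function return `None`, which leaves the target column unannotated rather than accepting a weak match.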
20 Claims
1. A device comprising:
a processor; and
a computer-readable medium including modules, the modules, when executed by the processor, configure the device to generate annotations, the modules comprising:
a column discovery module configured to retrieve a table; and
a column annotation module configured to annotate a target column of a target table from a target database by:
calculating a value-related score between the target column of the target table and a column of the table, the value-related score based at least in part on similarities between one or more values in the target column of the target table and one or more column values extracted from the column of the table, the value-related score being a numerical value-related score;
calculating a context-related score between the target column of the target table and the column of the table, the context-related score based at least in part on similarities between identities of one or more columns of the target table and column identities of one or more columns of the table, the context-related score being a numerical context-related score;
calculating a similarity score based on a numerical value comprising a numerical combination of the value-related score and the context-related score, the similarity score being a numerical similarity score; and
annotating, based at least in part on the similarity score, the target column of the target table using a column identity of the column of the table.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A processor implemented method comprising:
retrieving a table, under control of one or more processors;
calculating, using the one or more processors, a value-related score between a target column of a target table from a target database and a column of the table, the value-related score based at least in part on similarities between one or more values in the target column of the target table and one or more column values extracted from the column of the table, the value-related score being a numerical value-related score;
calculating, using the one or more processors, a context-related score between the target column of the target table and the column of the table, the context-related score based at least in part on similarities between identities of one or more columns of the target table and column identities of one or more columns of the table, the context-related score being a numerical context-related score;
calculating, using the one or more processors, a similarity score based on a numerical value comprising a numerical combination of the value-related score and the context-related score, the similarity score being a numerical similarity score;
annotating, using the one or more processors and based at least in part on the similarity score, the target column of the target table using a column identity of the column of the table; and
storing, using the one or more processors, the annotated target column.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A non-transitory computer storage medium having computer-executable instructions to program a computer to perform operations comprising:
receiving a table;
identifying a column included in the table;
identifying a target column in a target table from a target database;
calculating a value-related score between the target column of the target table and the column of the table, the value-related score based at least in part on similarities between one or more values in the target column of the target table and one or more column values extracted from the column of the table, the value-related score being a numerical value-related score;
calculating a context-related score between the target column of the target table and the column of the table, the context-related score based at least in part on similarities between identities of one or more columns of the target table and column identities of one or more columns of the table, the context-related score being a numerical context-related score;
calculating a similarity score based on a numerical value comprising a numerical combination of the value-related score and the context-related score, the similarity score being a numerical similarity score; and
annotating, based at least in part on the similarity score, the target column included in the target table using an identity of the column included in the table.
Dependent claims: 18, 19, 20
Specification