Iterative generation of partial column schema
Abstract
Systems and methods for iteratively generating a partial column schema indicative of semantic relationships in a corpus of key-value data are disclosed. A set of textual values is extracted from a pre-existing corpus of key-value data, and potential column names are generated. Value reassignment and pruning of potential columns then proceed based on semantic fit quality, potential column utilization, and random factors influenced by a decreasing system temperature.
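The role of the decreasing system temperature can be illustrated with a small sketch. The abstract does not say how the random factor is computed, so the Metropolis-style acceptance rule and the geometric cooling schedule below are assumptions borrowed from conventional simulated annealing, and the names `accept_move` and `cooling_schedule` are hypothetical.

```python
import math
import random


def accept_move(current_fit, prospective_fit, temperature, rng):
    """Decide whether to reassign a value to a prospective column.

    Assumption: a Metropolis-style rule. Improvements are always accepted;
    a worse fit is accepted with a probability that shrinks as the
    temperature falls. The source does not specify this rule.
    """
    if prospective_fit >= current_fit:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp((prospective_fit - current_fit) / temperature)


def cooling_schedule(start=1.0, factor=0.5, steps=5):
    """Yield a geometrically decreasing temperature (an assumed schedule)."""
    temperature = start
    for _ in range(steps):
        yield temperature
        temperature *= factor


if __name__ == "__main__":
    rng = random.Random(0)
    for t in cooling_schedule():
        # A move that lowers the fit from 0.6 to 0.5 becomes less likely as t drops.
        print(t, accept_move(0.6, 0.5, t, rng))
```

Under this assumed rule, early iterations tolerate reassignments that lower the fit, which lets the search leave poor local arrangements, while late iterations accept almost only improvements.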
28 Claims
1. A system comprising:
one or more storage devices comprising one or more database files configured to maintain a set of items comprising a key and one or more values associated with the key; and
one or more memories having stored thereon computer-readable instructions that upon execution cause the system at least to:
extract a set of values from the set of items;
generate a set of column names from the set of values;
assign a plurality of values from the set of items to a plurality of column names in the set of column names;
recursively reassign a first value to other column names in the plurality of column names based at least in part on a semantic fit quality and a utilization quality;
wherein the semantic fit quality is based at least in part on a solution constraint and semantic similarity of the first value to a column name to which the first value is assigned and the other values in the plurality of values assigned to the column name, the solution constraint based at least in part on the first value sharing a common key with a value from the set of items; and
wherein the utilization quality is based at least in part on a number of values currently assigned to the column name to which the first value is assigned and a comparison of the semantic fit quality to a prospective semantic fit quality.
Dependent claims: 2, 3, 4
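The two qualities named in claim 1 can be sketched as follows. This is one illustrative reading, not the claimed implementation: token-overlap (Jaccard) similarity stands in for a real semantic measure, the solution constraint is assumed to mean that a column should not hold two values sharing a key, and the utilization sketch covers only the count of assigned values, not the comparison to a prospective fit. All function names are hypothetical.

```python
def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in for a semantic measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def semantic_fit(value, key, column_name, members):
    """Score how well `value` fits a column.

    `members` is a list of (key, value) pairs already assigned to the
    column, excluding `value` itself.
    """
    # Assumed solution constraint: at most one value per key in a column.
    if any(member_key == key for member_key, _ in members):
        return 0.0
    # Average similarity to the column name and to the other assigned values.
    scores = [jaccard(value, column_name)]
    scores += [jaccard(value, member) for _, member in members]
    return sum(scores) / len(scores)


def utilization(members):
    """Number of values currently assigned to the column."""
    return len(members)


if __name__ == "__main__":
    members = [("item2", "blue color")]
    print(semantic_fit("red color", "item1", "color", members))
    print(utilization(members))
```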
5. A non-transitory computer-readable storage medium having stored thereon instructions that, upon execution by a computing device, cause the computing device at least to:
construct a first association between a first value extracted from a set of items and a first column name selected from a set of textual values that are ontologically related to one or more values in the set of items, the set of items comprising a key and one or more values associated with the key;
select the first value for association with a different column name based at least in part on a semantic fit quality for the first value; and
construct a second association between the first value and a second column name, the second column name selected by traversing a graph representative of ontological relationships between the first column name and one or more additional column names.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
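The graph traversal recited in claim 5 might look like the sketch below. The claim does not fix a traversal order or depth, so the breadth-first search, the `max_depth` limit, and the tiny `ONTOLOGY` adjacency map are all assumptions made for illustration.

```python
from collections import deque


def related_column_names(graph, start, max_depth=2):
    """Return names reachable from `start` within `max_depth` hops, nearest first.

    `graph` maps a column name to the names it is ontologically related to.
    Breadth-first order is an assumption; the claim only requires a traversal.
    """
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph.get(name, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, depth + 1))
    return found


# Hypothetical ontology fragment used only for this example.
ONTOLOGY = {
    "color": ["hue", "appearance"],
    "hue": ["shade"],
    "appearance": ["style"],
    "shade": ["tint"],
}

if __name__ == "__main__":
    print(related_column_names(ONTOLOGY, "color"))
```

A caller could take the first returned name that also appears in the current set of column names as the second column name for the value.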
16. A method for grouping semantically related data stored in a database management system, the method comprising:
constructing a first association between a first value extracted from a set of items and a first column name selected from a set of textual values that are ontologically related to at least one value from the set of items, the set of items comprising a key and one or more values associated with the key;
selecting the first value for association with a different column name based at least in part on a semantic fit quality for the first value;
constructing a second association between the first value and a second column name selected by traversing a graph representative of ontological relationships between the second column name and one or more additional column names; and
determining that a utilization quality for the first column name is less than a utilization quality corresponding to the second column name.
Dependent claims: 17, 18, 19, 20, 21, 22, 23
24. A system comprising:
one or more storage devices comprising one or more database files configured to maintain a set of items comprising a key and one or more values associated with the key; and
one or more memories having stored thereon computer-readable instructions that upon execution cause the system at least to:
assign a first value corresponding to a value in the set of items to a first column name from a set of column names;
select the first value for reassignment, the selection based at least in part on a first degree of semantic similarity between the first value and other values currently assigned to the first column name;
determine a second degree of semantic similarity between the first value and other values currently assigned to a second column name; and
assign the first value to the second column name, the second column name selected for assignment based at least in part on the second degree of semantic similarity.
Dependent claims: 25, 26, 27, 28
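The reassignment step of claim 24 can be sketched as a single function. The claim names no similarity measure, selection threshold, or tie-breaking rule, so the character-level `difflib.SequenceMatcher` ratio, the `threshold` parameter, and the function names here are assumptions chosen to keep the example self-contained.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mean_similarity(value, others, similarity=string_similarity):
    """Average similarity of `value` to the other values in a column."""
    others = [other for other in others if other != value]
    if not others:
        return 0.0
    return sum(similarity(value, other) for other in others) / len(others)


def reassign(value, first, columns, similarity=string_similarity, threshold=0.5):
    """Move `value` out of column `first` if another column fits it better.

    `columns` maps column names to lists of values and is modified in place.
    Returns the name of the column that holds `value` afterwards.
    """
    # First degree of similarity: value vs. its current column's other values.
    first_degree = mean_similarity(value, columns[first], similarity)
    if first_degree >= threshold:
        return first  # fits well enough; not selected for reassignment
    best, best_degree = first, first_degree
    for name, members in columns.items():
        if name == first:
            continue
        # Second degree of similarity: value vs. a candidate column's values.
        degree = mean_similarity(value, members, similarity)
        if degree > best_degree:
            best, best_degree = name, degree
    if best != first:
        columns[first].remove(value)
        columns[best].append(value)
    return best


if __name__ == "__main__":
    columns = {"size": ["large", "dark red"], "color": ["red", "light red"]}
    print(reassign("dark red", "size", columns), columns)
```

Passing a different `similarity` callable, for example one backed by word embeddings, changes the measure without touching the reassignment logic.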
Specification