Grouping interdependent fields
First Claim
1. A method for determining which fields, of a plurality of fields in a data set, are interdependent, comprising:
determining different levels of interdependence between different pairs of fields of the plurality of fields at least in part by, for each pair of the different pairs of fields;
for a respective sample of the data set in said each pair;
computing respective measured frequencies of co-occurrences of values in said each pair,
computing respective predicted frequencies of co-occurrences of values in said each pair based at least in part on measured frequencies of values in each separate field of said each pair;
computing a respective divergence score that measures how divergent values are in the separate fields of said each pair based at least in part on the respective measured frequencies of co-occurrences and the respective predicted frequencies of co-occurrences;
determining a maximum divergence score among respective divergence scores computed by said computing a respective divergence score for each pair of said different pairs;
scaling the respective divergence score based at least in part on the maximum divergence score to a respective scaled divergence score that is different from the respective divergence score; and
identifying a group of fields as interdependent based on said levels of interdependence between different pairs of fields.
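The scoring steps of the claim can be illustrated with a minimal sketch. The claim does not name a particular divergence measure, so the sketch assumes Kullback-Leibler divergence between the measured co-occurrence frequencies and the frequencies predicted from each field's separate value frequencies. It also assumes that scaling means dividing by the maximum score. The function names and the sample rows are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def divergence_score(rows, a, b):
    """Score one pair of fields over a sample of rows.

    Assumption: KL divergence of the measured joint frequencies from the
    frequencies predicted by multiplying the per-field frequencies.
    """
    n = len(rows)
    joint = Counter((r[a], r[b]) for r in rows)
    freq_a = Counter(r[a] for r in rows)
    freq_b = Counter(r[b] for r in rows)
    score = 0.0
    for (va, vb), count in joint.items():
        measured = count / n
        predicted = (freq_a[va] / n) * (freq_b[vb] / n)
        score += measured * log(measured / predicted)
    return score


def scaled_divergence_scores(rows, fields):
    """Score every pair of fields, then scale by the maximum score.

    Assumption: scaling is division by the maximum, giving values up to 1.
    """
    raw = {pair: divergence_score(rows, *pair)
           for pair in combinations(fields, 2)}
    top = max(raw.values(), default=0.0)
    if top <= 0:
        return {pair: 0.0 for pair in raw}
    return {pair: score / top for pair, score in raw.items()}


# Hypothetical sample: country follows from city, flag is unrelated to both.
rows = [
    {"city": city, "country": country, "flag": flag}
    for city, country in [("Paris", "FR"), ("Lyon", "FR"),
                          ("Berlin", "DE"), ("Bonn", "DE")]
    for flag in ("x", "y")
]
scores = scaled_divergence_scores(rows, ["city", "country", "flag"])
```

In this sample only the city and country pair has measured co-occurrences that differ from the predicted ones, so it receives the highest scaled score.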
Abstract
Processes, machines, and stored machine instructions are provided for grouping interdependent fields. Field grouping logic may include specially configured machines and/or stored instructions that identify group(s) of interdependent fields of a data set. The field grouping logic may receive, from a client on a customizable interface, a request for interdependent fields in a data set and, in response, cause generation of an output object that identifies the similar fields in the data set. The field grouping logic may exclude field(s) of the data set that are not interdependent, are not frequently accessed, or do not consume much space in storage, even though the request may not identify which fields are interdependent. The output object identifies the similar fields in set(s) or list(s) of fields, or in a hierarchy or hierarchies of groups and sub-groups.
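One way to picture the output described in the abstract is a flat list of field groups built from pairwise scores. The sketch below is an assumption, not the method of the patent: it links two fields when their scaled score meets a hypothetical threshold, merges linked fields with a union-find structure, and leaves out any field that is linked to nothing. The function name, the threshold value, and the sample scores are all hypothetical.

```python
def group_fields(scaled_scores, threshold=0.5):
    """Group fields whose pairwise scaled score meets the threshold.

    scaled_scores maps (field_a, field_b) tuples to a scaled score.
    Fields that never meet the threshold with any other field are excluded.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scaled_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for field in list(parent):
        groups.setdefault(find(field), set()).add(field)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical scaled scores for a handful of field pairs.
example_scores = {
    ("city", "country"): 1.0,
    ("zip", "city"): 0.8,
    ("city", "flag"): 0.0,
    ("country", "flag"): 0.0,
}
field_groups = group_fields(example_scores)
```

Running the same function at several thresholds would give nested groups, which is one possible route to the hierarchy of groups and sub-groups that the abstract mentions.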
28 Claims
1. A method for determining which fields, of a plurality of fields in a data set, are interdependent, comprising:
determining different levels of interdependence between different pairs of fields of the plurality of fields at least in part by, for each pair of the different pairs of fields;
for a respective sample of the data set in said each pair;
computing respective measured frequencies of co-occurrences of values in said each pair,
computing respective predicted frequencies of co-occurrences of values in said each pair based at least in part on measured frequencies of values in each separate field of said each pair;
computing a respective divergence score that measures how divergent values are in the separate fields of said each pair based at least in part on the respective measured frequencies of co-occurrences and the respective predicted frequencies of co-occurrences;
determining a maximum divergence score among respective divergence scores computed by said computing a respective divergence score for each pair of said different pairs;
scaling the respective divergence score based at least in part on the maximum divergence score to a respective scaled divergence score that is different from the respective divergence score; and
identifying a group of fields as interdependent based on said levels of interdependence between different pairs of fields.
Dependent Claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. One or more non-transitory storage media storing instructions which, when executed by one or more computing devices, cause determining which fields, of a plurality of fields in a data set, are interdependent and comprise instructions which, when executed by the one or more computing devices further cause performance of steps comprising:
determining different levels of interdependence between different pairs of fields of the plurality of fields at least in part by, for each pair of the different pairs of fields;
for a respective sample of the data set in said each pair;
computing respective measured frequencies of co-occurrences of values in said each pair,
computing respective predicted frequencies of co-occurrences of values in said each pair based at least in part on measured frequencies of values in each separate field of said each pair;
computing a respective divergence score that measures how divergent values are in the separate fields of said each pair based at least in part on the respective measured frequencies of co-occurrences and the respective predicted frequencies of co-occurrences;
determining a maximum divergence score among respective divergence scores computed by said computing a respective divergence score for each pair of said different pairs;
scaling the respective divergence score based at least in part on the maximum divergence score to a respective scaled divergence score that is different from the respective divergence score; and
identifying a group of fields as interdependent based on said levels of interdependence between different pairs of fields.
Dependent Claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
Specification