Joining semantically-related data using big table corpora
First Claim
1. A system for performing semantic join operations on data in different representations, said system comprising:
a memory area associated with a computing device, said memory area storing a plurality of tables having columns of values; and
a processor programmed to:
receive a request to perform a semantic join operation on at least two of the tables stored in the memory area;
in response to the received request, identify pairs of values from the at least two tables, the pairs of values including one value from a column in a first one of the tables and one value from a column in a second one of the tables;
determine, based on historical co-occurrence data, statistical co-occurrence scores for the identified pairs of values, wherein the statistical co-occurrence scores for the identified pairs of values are based on a row-level statistical co-occurrence score and a column-level statistical co-occurrence score;
infer a join relationship between the at least two tables using the statistical co-occurrence scores by generating a maximum aggregate correlation using the statistical co-occurrence scores; and
perform a semantic join operation between the at least two tables using the statistical co-occurrence scores.
Abstract
Examples of the disclosure enable performing semantic joins using a big table corpus. Pairs of values from at least two data sets are identified. The pairs of values include one value from a first one of the data sets and one value from a second one of the data sets. Statistical co-occurrence scores for the identified pairs of values are determined based on historical co-occurrence data. The determined statistical co-occurrence scores are used for predicting a semantic relationship between the at least two data sets. The predicted semantic relationship is used for joining the at least two data sets.
20 Claims
9. A method comprising:
identifying pairs of values from at least two data sets, the pairs of values including one value from a first one of the data sets and one value from a second one of the data sets;
determining, based on historical co-occurrence data, statistical co-occurrence scores for the identified pairs of values, wherein the statistical co-occurrence scores for the identified pairs of values are based on a row-level statistical co-occurrence score and a column-level statistical co-occurrence score;
predicting, by a processor associated with a computing device, a semantic relationship between the at least two data sets using the determined statistical co-occurrence scores by generating a maximum aggregate correlation using the statistical co-occurrence scores to enable a semantic join operation between the at least two data sets; and
performing a semantic join operation between the at least two data sets using the determined statistical co-occurrence scores.
18. One or more computer storage media embodying computer-executable components, said components comprising:
an identification component that on execution by at least one processor causes the at least one processor to identify pairs of values from at least two data sets, the pairs of values including one value from a first one of the data sets and one value from a second one of the data sets;
a statistics serving component that on execution by at least one processor causes the at least one processor to calculate statistical co-occurrence scores for one or more of the identified pairs of values based on their strength of correlation in a big table corpus, wherein the statistical co-occurrence scores for the identified pairs of values are based on a row-level statistical co-occurrence score and a column-level statistical co-occurrence score;
a join path calculation component that on execution by at least one processor causes the at least one processor to compute a join relationship between the one or more of the identified pairs of values using the statistical co-occurrence scores by generating a maximum aggregate correlation using the statistical co-occurrence scores; and
a user interface component that on execution by at least one processor causes the at least one processor to present the computed join relationship to a user for performing semantic join of the at least two data sets, wherein the join path calculation component that on execution by the at least one processor causes the at least one processor to perform a semantic join operation between the at least two data sets using the statistical co-occurrence scores.
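The independent claims each infer the join relationship "by generating a maximum aggregate correlation" and then perform the join, but this excerpt does not define how that maximum is computed. One possible reading, offered here only as an assumption, is a one-to-one mapping between the two key columns that maximizes the sum of pair scores. The sketch below brute-forces that mapping for tiny inputs and then applies it to join rows. The functions `best_mapping` and `semantic_join`, the `SCORES` table, and the sample rows are all hypothetical.

```python
from itertools import permutations


def best_mapping(left_values, right_values, score):
    """Return (mapping, total) for the one-to-one left -> right mapping
    with the highest summed score.

    Brute force over permutations: assumes len(left_values) <= len(right_values)
    and is only practical for very small inputs.
    """
    best, best_total = {}, float("-inf")
    for perm in permutations(right_values, len(left_values)):
        total = sum(score(l, r) for l, r in zip(left_values, perm))
        if total > best_total:
            best, best_total = dict(zip(left_values, perm)), total
    return best, best_total


def semantic_join(left_rows, right_rows, left_key, right_key, mapping):
    """Join dict rows whose keys are linked by the inferred mapping."""
    right_index = {}
    for row in right_rows:
        right_index.setdefault(row[right_key], []).append(row)
    joined = []
    for row in left_rows:
        for match in right_index.get(mapping.get(row[left_key]), []):
            joined.append({**row, **match})
    return joined


# Hypothetical pair scores, standing in for corpus-derived co-occurrence scores.
SCORES = {
    ("Washington", "WA"): 0.9, ("Oregon", "OR"): 0.8,
    ("Washington", "OR"): 0.1, ("Oregon", "WA"): 0.2,
}


def score(left, right):
    return SCORES.get((left, right), 0.0)


left_rows = [
    {"state": "Washington", "capital": "Olympia"},
    {"state": "Oregon", "capital": "Salem"},
]
right_rows = [  # "orders" values are made-up toy data
    {"abbr": "OR", "orders": 7},
    {"abbr": "WA", "orders": 12},
]

mapping, total = best_mapping(
    [r["state"] for r in left_rows], [r["abbr"] for r in right_rows], score
)
joined = semantic_join(left_rows, right_rows, "state", "abbr", mapping)
print(mapping)
print(joined)
```

Permutation search grows factorially, so anything beyond a handful of values would need a polynomial-time assignment solver, such as the Hungarian algorithm, or a heuristic. Whether the patent uses such a formulation is not stated in this excerpt.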
Specification