DETERMINING A DEGREE OF SIMILARITY OF A SUBSET OF TABULAR DATA ARRANGEMENTS TO SUBSETS OF GRAPH DATA ARRANGEMENTS AT INGESTION INTO A DATA-DRIVEN COLLABORATIVE DATASET PLATFORM
First Claim
1. A method comprising:
identifying subsets of data as columnar data associated with a data arrangement, the data arrangement being a tabular data arrangement including each of the subsets of data as a column of data;
generating a similarity matrix of data associated with a subset of data for each column of data, the similarity matrix of data being configured to determine a degree of similarity to other datasets with which to join;
accessing a plurality of similarity matrices each formed to identify an amount of relevant data associated with a dataset disposed in a graph data arrangement;
analyzing the similarity matrix of data in view of the plurality of similarity matrices;
identifying a subset of the plurality of similarity matrices to form a subset of relevant similarity matrices;
generating links among the column of data and a subset of the other datasets associated with the subset of relevant similarity matrices; and
forming a subset of the links between the column of data and at least one of the other datasets.
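The steps recited in the claim can be illustrated with a minimal sketch. The claim text here does not say how a "similarity matrix" is computed, so the sketch assumes a MinHash signature per column as a stand-in. Every name in it (`signature`, `similarity`, `candidate_links`, `form_links`, `NUM_HASHES`, the threshold and `top_n` values) is hypothetical and chosen for illustration only.

```python
import hashlib

NUM_HASHES = 64  # assumed signature length; not specified by the source


def _hash(value: str, seed: int) -> int:
    """Deterministic 64-bit hash of a value under a given seed."""
    digest = hashlib.sha1(f"{seed}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def signature(values):
    """MinHash signature of a column (assumed stand-in for a 'similarity matrix')."""
    distinct = {str(v) for v in values}
    if not distinct:
        raise ValueError("cannot build a signature for an empty column")
    return [min(_hash(v, seed) for v in distinct) for seed in range(NUM_HASHES)]


def similarity(sig_a, sig_b):
    """Fraction of matching signature slots; estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def candidate_links(table, graph_signatures, threshold=0.5):
    """Compare each column's signature with stored signatures of graph datasets.

    Returns (column, dataset_id, score) tuples for pairs at or above the
    threshold, highest score first.
    """
    links = []
    for column, values in table.items():
        col_sig = signature(values)
        for dataset_id, sig in graph_signatures.items():
            score = similarity(col_sig, sig)
            if score >= threshold:
                links.append((column, dataset_id, score))
    return sorted(links, key=lambda link: link[2], reverse=True)


def form_links(links, top_n=1):
    """Keep at most top_n of the best-scoring links per column."""
    kept = {}
    for column, dataset_id, score in links:
        bucket = kept.setdefault(column, [])
        if len(bucket) < top_n:
            bucket.append((dataset_id, score))
    return kept
```

In this sketch, `table` maps column names to lists of values and `graph_signatures` maps a graph dataset identifier to a signature computed ahead of time. Ranking by score before trimming is one simple way to express a preference among datasets with which to join; the source may use a different criterion.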
Abstract
Various embodiments relate generally to data science and data analysis, computer software and systems, and wired and wireless network communications to interface among repositories of disparate datasets and computing machine-based entities configured to access datasets. More specifically, they relate to a computing and data storage platform that determines degrees of similarity between at least a subset of data associated with an ingested dataset and one or more equivalent or similar subsets of data associated with one or more graph-based data arrangements. According to at least some examples, the degrees of similarity facilitate preferences or priorities in joining one or more graph-based data arrangements to the ingested dataset. For example, a method may include generating similarity matrices to join an ingested dataset (e.g., a tabular dataset) to one or more graph-based datasets in accordance with determining an indication of a degree of similarity of a dataset with which to join.
20 Claims
1. A method as recited in the First Claim above.
Dependent claims: 2 through 17.
18. An apparatus comprising:
a memory including executable instructions; and
a processor, responsive to executing the instructions, is configured to:
identify subsets of data as columnar data associated with a data arrangement, the data arrangement being a tabular data arrangement including each of the subsets of data as a column of data;
generate a similarity matrix of data associated with a subset of data for each column of data, the similarity matrix of data being configured to determine a degree of similarity to other datasets with which to join;
access a plurality of similarity matrices each formed to identify an amount of relevant data associated with a dataset disposed in a graph data arrangement;
analyze the similarity matrix of data in view of the plurality of similarity matrices;
identify a subset of the plurality of similarity matrices to form a subset of relevant similarity matrices;
generate links among the column of data and a subset of the other datasets associated with the subset of relevant similarity matrices; and
form a subset of the links between the column of data and at least one of the other datasets.
Dependent claims: 19 and 20.
Specification