Dataset analysis and dataset attribute inferencing to form collaborative datasets
First Claim
1. A method comprising:
receiving data representing a plurality of datasets, each dataset having a different data format, into a dataset ingestion controller configured to form a collaborative dataset, at least one dataset being associated with an identifier;
interpreting a subset of data of the dataset against one or more data classifications at an inference engine to derive at least an inferred attribute for the subset of data;
identifying that the subset of data of the at least one dataset omits an association to annotative data representing an annotative description;
deducing a data classification for the subset of data of the dataset to form a deduced data classification;
analyzing other annotative data associated with equivalently classified data in other datasets;
associating the subset of the data with the annotative data based on the deduced data classification to identify the inferred attribute;
converting the plurality of datasets from the different data formats at a format converter to form an atomized dataset having a specific format;
identifying a subset of datasets;
linking a subset of atomized data points in the atomized dataset to the subset of datasets;
forming an enriched graph data structure accessible in association with the identifier;
deriving a supplemental subset of data based on one or more subsets of the at least one dataset;
associating the supplemental subset of data to the at least one dataset; and
converting the supplemental subset of data to form a supplemental atomized dataset including the atomized dataset.
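The inference steps recited above (deducing a data classification for unannotated data, then borrowing annotative data from equivalently classified data in other datasets) can be illustrated with a minimal sketch. It is not drawn from the claims or the specification: the regular-expression classifiers, the 0.9 match threshold, the majority-vote label selection, and the names `CLASSIFIERS`, `deduce_classification`, and `infer_annotation` are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical classifiers (assumption): classification name -> predicate.
# Order matters: more specific patterns are tried before more general ones.
CLASSIFIERS = {
    "zip_code": lambda v: re.fullmatch(r"\d{5}", v) is not None,
    "date": lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
    "number": lambda v: re.fullmatch(r"-?\d+(\.\d+)?", v) is not None,
}


def deduce_classification(values, threshold=0.9):
    """Return the first classification matched by at least `threshold`
    of the non-empty values, or None if no classifier qualifies."""
    values = [v for v in values if v != ""]
    if not values:
        return None
    for name, matches in CLASSIFIERS.items():
        hits = sum(1 for v in values if matches(v))
        if hits / len(values) >= threshold:
            return name
    return None


def infer_annotation(values, other_annotations):
    """Deduce a classification for unannotated `values`, then pick the
    label most often attached to the same classification elsewhere.

    `other_annotations` is a list of (classification, label) pairs
    observed in other datasets (an assumed input shape).
    """
    classification = deduce_classification(values)
    if classification is None:
        return None, None
    labels = Counter(
        label for cls, label in other_annotations if cls == classification
    )
    label = labels.most_common(1)[0][0] if labels else None
    return classification, label
```

In this sketch, a column of five-digit strings with no header would be classified as `zip_code` and would inherit whichever label other datasets most commonly attach to zip-code columns.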
Abstract
Various embodiments relate generally to data science and data analysis, computer software and systems, and wired and wireless network communications to provide an interface between repositories of disparate datasets and computing machine-based entities that seek access to the datasets, and, more specifically, to a computing and data storage platform that facilitates consolidation of one or more datasets, whereby a collaborative data layer and associated logic facilitate, for example, efficient access to, and implementation of, collaborative datasets. In some examples, a method may include receiving a dataset having a data format into a dataset ingestion controller configured to form a collaborative dataset, interpreting data of the dataset against data classifications at an inference engine to derive at least an inferred attribute, associating the data with annotative data identifying the inferred attribute, and converting the dataset at a format converter to form an atomized dataset.
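The format-conversion step in the abstract can be sketched as follows. The abstract does not define the atomized format in this excerpt, so the sketch assumes, purely for illustration, that an atomized data point is a (subject, predicate, object) tuple. The identifier scheme (`<dataset_id>/row/<n>`, `<dataset_id>/column/<name>`) and the function names `rows_from_csv`, `rows_from_json`, and `atomize` are likewise hypothetical.

```python
import csv
import io
import json


def rows_from_csv(text):
    """Parse CSV text (first line is the header) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_json(text):
    """Parse JSON text holding an array of flat objects into a list of dicts."""
    return json.loads(text)


def atomize(rows, dataset_id):
    """Convert rows into (subject, predicate, object) tuples.

    Assumption: one subject per row, one predicate per column, and the
    cell value (as a string) as the object.
    """
    triples = []
    for row_number, row in enumerate(rows, start=1):
        subject = f"{dataset_id}/row/{row_number}"
        for column, value in row.items():
            predicate = f"{dataset_id}/column/{column}"
            triples.append((subject, predicate, str(value)))
    return triples
```

Under these assumptions, the same records supplied as CSV or as JSON yield the same list of tuples, which is the sense in which differently formatted inputs converge on one specific format.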
223 Citations
17 Claims
1. A method comprising:
receiving data representing a plurality of datasets, each dataset having a different data format, into a dataset ingestion controller configured to form a collaborative dataset, at least one dataset being associated with an identifier;
interpreting a subset of data of the dataset against one or more data classifications at an inference engine to derive at least an inferred attribute for the subset of data;
identifying that the subset of data of the at least one dataset omits an association to annotative data representing an annotative description;
deducing a data classification for the subset of data of the dataset to form a deduced data classification;
analyzing other annotative data associated with equivalently classified data in other datasets;
associating the subset of the data with the annotative data based on the deduced data classification to identify the inferred attribute;
converting the plurality of datasets from the different data formats at a format converter to form an atomized dataset having a specific format;
identifying a subset of datasets;
linking a subset of atomized data points in the atomized dataset to the subset of datasets;
forming an enriched graph data structure accessible in association with the identifier;
deriving a supplemental subset of data based on one or more subsets of the at least one dataset;
associating the supplemental subset of data to the at least one dataset; and
converting the supplemental subset of data to form a supplemental atomized dataset including the atomized dataset.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A method comprising:
receiving data representing a plurality of datasets, each dataset having a different data format, into a dataset ingestion controller configured to form a collaborative dataset, at least one dataset being associated with a user account identifier;
converting the plurality of datasets from the different data formats at a format converter to form an atomized dataset having a specific format;
identifying a subset of datasets that includes portions that are associated with at least a portion of the dataset;
identifying that the portion of the dataset omits an association to annotative data representing an annotative description;
deducing a data classification for the portion of the dataset to form a deduced data classification;
analyzing other annotative data associated with equivalently classified data in other datasets;
linking at a data enrichment manager a subset of atomized data points associated with the subset of datasets to atomized data points of the atomized dataset based on the deduced data classification;
converting the subset of datasets including the portion of the dataset to form a subset of atomized datasets having the specific format;
forming a graph data structure accessible in association with the user account identifier, the graph data structure including data from the atomized dataset and from the subset of atomized datasets;
identifying the subset of datasets;
linking the subset of atomized data points in the atomized dataset to the subset of datasets;
forming an enriched graph data structure accessible in association with the user account identifier;
deriving a supplemental subset of data based on one or more subsets of the at least one dataset;
associating the supplemental subset of data to the at least one dataset; and
converting the supplemental subset of data to form a supplemental atomized dataset including the atomized dataset.
Dependent claims: 16, 17
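The linking and graph-forming steps of claim 15 can be sketched in the same hedged spirit. The sketch assumes atomized data points are (subject, predicate, object) tuples, that a deduced classification is available per predicate in a plain dict, and that two data points are linked when they share both classification and value. The `linkedBy:` predicate prefix and the names `link_by_classification` and `build_graph` are hypothetical and do not appear in the claims.

```python
from collections import defaultdict


def link_by_classification(triples_a, triples_b, classification_of):
    """Return link triples between subjects of two atomized datasets.

    A link is emitted when a data point in `triples_a` and one in
    `triples_b` have the same deduced classification and the same value.
    `classification_of` maps predicate -> classification name (assumed).
    """
    # Index dataset B by (classification, value) for constant-time lookup.
    index = defaultdict(list)
    for subject, predicate, value in triples_b:
        cls = classification_of.get(predicate)
        if cls is not None:
            index[(cls, value)].append(subject)

    links = []
    for subject, predicate, value in triples_a:
        cls = classification_of.get(predicate)
        # Unclassified predicates never appear in the index, so they yield no links.
        for other in index.get((cls, value), []):
            links.append((subject, "linkedBy:" + cls, other))
    return links


def build_graph(*triple_sets):
    """Merge several triple lists into one adjacency map:
    subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for triples in triple_sets:
        for subject, predicate, obj in triples:
            graph[subject].append((predicate, obj))
    return dict(graph)
```

Passing the two atomized datasets together with the generated links to `build_graph` gives a single structure containing data from both sources plus the cross-dataset edges, which loosely mirrors the enriched graph described in the claim.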
Specification