Scalable analysis platform for semi-structured data
First Claim
1. A data transformation system comprising:
one or more computing devices comprising one or more hardware processors and memory and configured to implement:
a schema inference module configured to dynamically create a cumulative schema for objects retrieved from a first data source, wherein:
each of the retrieved objects includes (i) data and (ii) metadata describing the data; and
dynamically creating the cumulative schema includes, for each object of the retrieved objects, (i) inferring a schema from the object and (ii) selectively updating the cumulative schema to describe the object according to the inferred schema;
collect statistics on the data types of the retrieved objects; and
based on the statistics on the data types, determine whether the data of the retrieved objects is typed correctly; and
an export module configured to output the data of the retrieved objects to a data destination system according to the cumulative schema.
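The schema inference steps recited above (inferring a schema per object, selectively updating a cumulative schema, and collecting data-type statistics) can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: the names `infer_schema`, `CumulativeSchema`, and `suspect_paths`, the dotted-path representation, and the 0.9 dominance threshold are all assumptions introduced here, and arrays are treated as opaque values for brevity.

```python
from collections import Counter, defaultdict


def infer_schema(obj, prefix=""):
    """Infer a flat schema {dotted_path: type_name} from one JSON-like dict.

    Hypothetical helper: nested dicts are walked recursively; every other
    value (including lists) is recorded by its Python type name.
    """
    schema = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            schema.update(infer_schema(value, path + "."))
        else:
            schema[path] = type(value).__name__
    return schema


class CumulativeSchema:
    """Hypothetical cumulative schema that also tracks type statistics."""

    def __init__(self):
        self.types = defaultdict(set)      # path -> set of type names seen
        self.stats = defaultdict(Counter)  # path -> count per type name

    def update(self, obj):
        """Fold one object in; return True only if the schema had to change."""
        changed = False
        for path, type_name in infer_schema(obj).items():
            self.stats[path][type_name] += 1
            if type_name not in self.types[path]:
                # Selective update: extend the schema only for unseen types.
                self.types[path].add(type_name)
                changed = True
        return changed

    def suspect_paths(self, threshold=0.9):
        """Return {path: dominant_type} where one type dominates but is not
        the only one observed, hinting that some values may be mistyped.
        The threshold is an assumed heuristic, not taken from the claim."""
        suspects = {}
        for path, counts in self.stats.items():
            total = sum(counts.values())
            dominant, seen = counts.most_common(1)[0]
            if len(counts) > 1 and seen / total >= threshold:
                suspects[path] = dominant
        return suspects
```

In this sketch, feeding many objects whose `id` is an integer and one whose `id` is the string `"10"` leaves both types in the cumulative schema, while `suspect_paths()` reports `id` with dominant type `int` as a candidate for incorrect typing.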
Abstract
A data transformation system includes a schema inference module and an export module. The schema inference module is configured to dynamically create a cumulative schema for objects retrieved from a first data source. Each of the retrieved objects includes (i) data and (ii) metadata describing the data. Dynamically creating the cumulative schema includes, for each object of the retrieved objects, (i) inferring a schema from the object and (ii) selectively updating the cumulative schema to describe the object according to the inferred schema. The export module is configured to output the data of the retrieved objects to a data destination system according to the cumulative schema.
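To make the export step concrete, here is a small sketch of writing retrieved objects to a destination according to a cumulative schema. It assumes, purely for illustration, that the destination accepts CSV and that the cumulative schema has been reduced to an ordered list of dotted column paths; `flatten` and `export_csv` are hypothetical names, not terms from the patent.

```python
import csv
import io


def flatten(obj, prefix=""):
    """Flatten a nested JSON-like dict into {dotted_path: value}."""
    row = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, path + "."))
        else:
            row[path] = value
    return row


def export_csv(objects, columns, out):
    """Write each object as one CSV row laid out by the schema's columns.

    Fields an object lacks are left empty; fields absent from the schema
    are ignored, so every row has the same shape.
    """
    writer = csv.DictWriter(
        out, fieldnames=columns, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    for obj in objects:
        writer.writerow(flatten(obj))


# Example: the second object has no "user.name", so that cell stays empty.
buffer = io.StringIO()
export_csv(
    [{"id": 1, "user": {"name": "a"}}, {"id": 2}],
    ["id", "user.name"],
    buffer,
)
```

Because the column list comes from the schema rather than from any single object, objects with missing or extra fields still produce uniformly shaped rows.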
51 Citations
25 Claims
1. A data transformation system comprising:
one or more computing devices comprising one or more hardware processors and memory and configured to implement:
a schema inference module configured to dynamically create a cumulative schema for objects retrieved from a first data source, wherein:
each of the retrieved objects includes (i) data and (ii) metadata describing the data; and
dynamically creating the cumulative schema includes, for each object of the retrieved objects, (i) inferring a schema from the object and (ii) selectively updating the cumulative schema to describe the object according to the inferred schema;
collect statistics on the data types of the retrieved objects; and
based on the statistics on the data types, determine whether the data of the retrieved objects is typed correctly; and
an export module configured to output the data of the retrieved objects to a data destination system according to the cumulative schema.
Specification