Scalable analysis platform for semi-structured data
First Claim
1. A method of operating a data analysis system, the method comprising:
retrieving objects from a data source, wherein each of the retrieved objects includes (i) data and (ii) metadata describing the data;
dynamically updating a cumulative schema, wherein said dynamically updating comprises, for each object of the retrieved objects:
(i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, wherein, for at least one object of the objects, a structure of an inferred schema is different from another structure of another inferred schema for another object of the objects,
(ii) creating a unified schema based at least on a portion of the inferred schema for the object, wherein the unified schema describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema, and
(iii) storing the unified schema as the cumulative schema;
converting the cumulative schema into a relational schema; and
exporting, according to the relational schema, the data of each of the retrieved objects to a data warehouse.
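Steps (i) to (iii) of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes JSON-like objects whose key names act as the metadata, and the schema representation (a dict of type tags) and the names `infer_schema`, `merge_schemas`, and `update_cumulative` are hypothetical choices made for illustration.

```python
def infer_schema(value):
    """Infer a schema for one JSON-like value from its keys and runtime types.

    Hypothetical representation: a dict keyed by type tag. Scalar tags map to
    True, "object" maps to {field: schema}, "array" maps to the element schema.
    """
    if value is None:
        return {"null": True}
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return {"bool": True}
    if isinstance(value, (int, float)):
        return {"number": True}
    if isinstance(value, str):
        return {"string": True}
    if isinstance(value, dict):
        return {"object": {k: infer_schema(v) for k, v in value.items()}}
    if isinstance(value, list):
        elem = {}
        for item in value:
            elem = merge_schemas(elem, infer_schema(item))
        return {"array": elem}
    raise TypeError(f"unsupported value: {value!r}")


def merge_schemas(a, b):
    """Return a unified schema describing everything described by a or by b."""
    merged = {}
    for tag in a.keys() | b.keys():
        if tag in a and tag in b:
            if tag == "object":
                fa, fb = a[tag], b[tag]
                merged[tag] = {
                    k: merge_schemas(fa.get(k, {}), fb.get(k, {}))
                    for k in fa.keys() | fb.keys()
                }
            elif tag == "array":
                merged[tag] = merge_schemas(a[tag], b[tag])
            else:
                merged[tag] = True
        else:
            merged[tag] = a[tag] if tag in a else b[tag]
    return merged


def update_cumulative(cumulative, obj):
    """Steps (i)-(iii): infer, unify, and return the new cumulative schema."""
    return merge_schemas(cumulative, infer_schema(obj))


# Two objects with different structures, as the claim requires for at least one pair.
cumulative = {}
for obj in [{"id": 1, "name": "a"}, {"id": "x7", "tags": ["p", "q"]}]:
    cumulative = update_cumulative(cumulative, obj)
```

After the loop, `cumulative` records that `id` has been seen as both a number and a string, and that `name` and `tags` each appear in only some objects. The unified schema therefore describes both the newest object and every object processed before it.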
Abstract
A method of operating a data analysis system includes retrieving objects from a data source. Each of the retrieved objects includes (i) data and (ii) metadata describing the data. The method further includes dynamically creating a cumulative schema by, for each object of the retrieved objects: (i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, (ii) creating a unified schema, wherein the unified schema describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema, and (iii) storing the unified schema as the cumulative schema. The method further includes exporting the data of each of the retrieved objects to a data warehouse.
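The export step can be sketched as follows, again only as an illustration under stated assumptions. The claims recite converting the cumulative schema into a relational schema before export; here an in-memory SQLite database stands in for the data warehouse. The schema representation, the one-column-per-observed-type naming (`id__number`, `id__string`), and the function names are all hypothetical. Arrays and nulls are left out to keep the sketch short.

```python
import sqlite3

# Hypothetical mapping from schema type tags to SQL column types.
SQL_TYPES = {"bool": "INTEGER", "number": "REAL", "string": "TEXT"}


def type_tag(value):
    """Return the scalar type tag for a value, or None if it is not a scalar."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return None


def relational_columns(schema, prefix=""):
    """Flatten a cumulative schema into {column_name: sql_type}."""
    columns = {}
    for field, types in sorted(schema.get("object", {}).items()):
        path = prefix + field
        for tag in sorted(types):
            if tag == "object":
                nested = {"object": types[tag]}
                columns.update(relational_columns(nested, path + "."))
            elif tag in SQL_TYPES:
                # One column per observed scalar type of the field.
                columns[f"{path}__{tag}"] = SQL_TYPES[tag]
    return columns


def flatten(obj, prefix=""):
    """Map one object onto the column names produced by relational_columns."""
    row = {}
    for field, value in obj.items():
        path = prefix + field
        if isinstance(value, dict):
            row.update(flatten(value, path + "."))
        else:
            tag = type_tag(value)
            if tag is not None:
                row[f"{path}__{tag}"] = value
    return row


def export(objects, schema, conn, table="objects"):
    """Create a table from the relational schema and insert one row per object."""
    columns = relational_columns(schema)
    names = list(columns)
    col_defs = ", ".join(f'"{n}" {columns[n]}' for n in names)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in names)
    for obj in objects:
        row = flatten(obj)
        conn.execute(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            [row.get(n) for n in names],
        )
    return names


cumulative = {"object": {
    "id": {"number": True, "string": True},
    "user": {"object": {"name": {"string": True}}},
}}
objects = [{"id": 1, "user": {"name": "ann"}}, {"id": "x7"}]
conn = sqlite3.connect(":memory:")
names = export(objects, cumulative, conn)
rows = conn.execute('SELECT * FROM "objects"').fetchall()
```

Fields an object does not carry are stored as NULL, so objects with different structures can share one table.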
18 Claims
1. A method of operating a data analysis system, the method comprising:
retrieving objects from a data source, wherein each of the retrieved objects includes (i) data and (ii) metadata describing the data;
dynamically updating a cumulative schema, wherein said dynamically updating comprises, for each object of the retrieved objects:
(i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, wherein, for at least one object of the objects, a structure of an inferred schema is different from another structure of another inferred schema for another object of the objects,
(ii) creating a unified schema based at least on a portion of the inferred schema for the object, wherein the unified schema describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema, and
(iii) storing the unified schema as the cumulative schema;
converting the cumulative schema into a relational schema; and
exporting, according to the relational schema, the data of each of the retrieved objects to a data warehouse.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A non-transitory computer-readable medium storing processor-executable instructions comprising:
retrieving objects from a data source, wherein each of the retrieved objects includes (i) data and (ii) metadata describing the data;
dynamically updating a cumulative schema, wherein said dynamically updating comprises, for each object of the retrieved objects:
(i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, wherein, for at least one object of the objects, a structure of an inferred schema is different from another structure of another inferred schema for another object of the objects,
(ii) creating a unified schema based at least on a portion of the inferred schema for the object, wherein the unified schema describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema, and
(iii) storing the unified schema as the cumulative schema;
converting the cumulative schema into a relational schema; and
exporting, according to the relational schema, the data of each of the retrieved objects to a data warehouse.
Dependent claims: 17, 18
Specification