Scalable analysis platform for semi-structured data
First Claim
1. A method of operating a data analysis system, the method comprising:
retrieving objects from a semi-structured data source, wherein each of the retrieved objects includes (i) data and (ii) metadata describing the data;
dynamically creating a cumulative schema wherein said dynamically creating comprises, for each object of the retrieved objects:
(i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, (ii) performing a union of a set of fields of the inferred schema and another set of fields of the cumulative schema to create a unified schema that describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema, and (iii) storing the unified schema as the cumulative schema;
converting fields in the cumulative schema into respective columns in a relational schema; and
storing the data of each of the retrieved objects in a relational database according to the relational schema.
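The per-object loop in steps (i) through (iii) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes objects are JSON-like Python dicts, treats each object's keys as the metadata that names its fields, flattens nested objects into dotted paths, and represents a schema as a hypothetical mapping from field path to the set of type names observed there. The union in step (ii) keeps every field from both schemas and, when a field appears with different types, keeps all of them.

```python
import json
from typing import Any, Dict, Iterable, Set

# Hypothetical schema representation: dotted field path -> set of type names
# observed at that path.
Schema = Dict[str, Set[str]]


def infer_type(value: Any) -> str:
    """Infer a coarse type name for one element of an object's data."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"  # array elements are not descended into in this sketch
    return "unknown"


def infer_schema(obj: Dict[str, Any], prefix: str = "") -> Schema:
    """Step (i): the object's keys (its metadata) name the fields and the
    values supply inferred types; nested objects become dotted paths."""
    schema: Schema = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            schema.update(infer_schema(value, prefix=path + "."))
        else:
            schema[path] = {infer_type(value)}
    return schema


def union_schemas(cumulative: Schema, inferred: Schema) -> Schema:
    """Step (ii): union the two field sets; a field seen with different
    types keeps every type observed for it."""
    unified: Schema = {path: set(types) for path, types in cumulative.items()}
    for path, types in inferred.items():
        unified.setdefault(path, set()).update(types)
    return unified


def build_cumulative_schema(objects: Iterable[Dict[str, Any]]) -> Schema:
    cumulative: Schema = {}
    for obj in objects:
        inferred = infer_schema(obj)                   # step (i)
        unified = union_schemas(cumulative, inferred)  # step (ii)
        cumulative = unified                           # step (iii)
    return cumulative


# Example: three JSON objects with partially overlapping fields.
records = [
    json.loads('{"id": 1, "name": "a"}'),
    json.loads('{"id": 2, "tags": ["x"], "geo": {"lat": 1.5}}'),
    json.loads('{"id": "3"}'),
]
schema = build_cumulative_schema(records)
for path in sorted(schema):
    print(path, sorted(schema[path]))
# prints:
# geo.lat ['float']
# id ['integer', 'string']
# name ['string']
# tags ['array']
```

Keeping a set of types per field means a field such as `id` that arrives as both an integer and a string is recorded as polymorphic rather than rejected; how the claimed system resolves such type conflicts is not stated in this section.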
Abstract
A method of operating a query system includes retrieving objects from a data source. Each of the retrieved objects includes (i) data and (ii) metadata describing the data. The method further includes dynamically creating a cumulative schema. The dynamically creating includes, for each object of the retrieved objects, (i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, (ii) creating a unified schema, and (iii) storing the unified schema as the cumulative schema. The unified schema describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema. The method further includes storing the data of each of the retrieved objects in a storage service.
23 Claims
23. A non-transitory computer-readable medium storing processor-executable instructions to perform:
retrieving objects from a semi-structured data source, wherein each of the retrieved objects includes (i) data and (ii) metadata describing the data;
dynamically creating a cumulative schema wherein said dynamically creating comprises, for each object of the retrieved objects:
(i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, (ii) performing a union of a set of fields of the inferred schema and another set of fields of the cumulative schema to create a unified schema that describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema, and (iii) storing the unified schema as the cumulative schema;
converting fields in the cumulative schema into respective columns in a relational schema; and
storing the data of each of the retrieved objects in a relational database according to the relational schema.
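The last two steps, converting cumulative-schema fields into relational columns and storing each object's data according to that relational schema, can be sketched as below. The claims do not say how fields map to columns, so this is a hedged illustration under assumptions: the cumulative schema is taken to be a mapping from dotted field path to the set of observed type names, each (field, type) pair becomes its own column under a hypothetical `path<type>` naming convention, arrays are stored as JSON text, and SQLite (via Python's `sqlite3` module) stands in for the relational database. The names `relational_columns`, `flatten`, and `store` are illustrative only.

```python
import json
import sqlite3
from typing import Any, Dict, List, Set, Tuple

# Hypothetical cumulative schema: dotted field path -> set of observed type names.
Schema = Dict[str, Set[str]]

# Assumed mapping from inferred type names to SQLite column types.
SQL_TYPES = {"integer": "INTEGER", "float": "REAL", "string": "TEXT",
             "boolean": "INTEGER", "array": "TEXT"}


def type_name(value: Any) -> str:
    """Coarse type name; None and unrecognized values map to 'null'."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "null"


def relational_columns(schema: Schema) -> List[Tuple[str, str, str]]:
    """Convert each (field, type) pair into one column: (name, path, type).
    Types without an SQL mapping (such as 'null') get no column."""
    columns = []
    for path in sorted(schema):
        for t in sorted(schema[path]):
            if t in SQL_TYPES:
                columns.append((f"{path}<{t}>", path, t))
    return columns


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted paths matching the schema's fields."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def store(conn: sqlite3.Connection, table: str, schema: Schema,
          objects: List[Dict[str, Any]]) -> List[str]:
    """Create a table from the relational schema and insert one row per object."""
    columns = relational_columns(schema)
    col_defs = ", ".join(f'"{name}" {SQL_TYPES[t]}' for name, _, t in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    names = ", ".join(f'"{name}"' for name, _, _ in columns)
    marks = ", ".join("?" for _ in columns)
    for obj in objects:
        flat = flatten(obj)
        row = []
        for _, path, t in columns:
            value = flat.get(path)
            # A value lands only in the column matching its own type.
            if value is not None and type_name(value) == t:
                row.append(json.dumps(value) if t == "array" else value)
            else:
                row.append(None)
        conn.execute(f'INSERT INTO "{table}" ({names}) VALUES ({marks})', row)
    conn.commit()
    return [name for name, _, _ in columns]


schema = {"id": {"integer", "string"}, "name": {"string"},
          "tags": {"array"}, "geo.lat": {"float"}}
records = [{"id": 1, "name": "a"},
           {"id": 2, "tags": ["x"], "geo": {"lat": 1.5}},
           {"id": "3"}]
conn = sqlite3.connect(":memory:")
column_names = store(conn, "records", schema, records)
rows = conn.execute('SELECT * FROM "records" ORDER BY rowid').fetchall()
print(column_names)
for row in rows:
    print(row)
```

Giving each observed type its own column keeps a polymorphic field like `id` queryable without lossy casts, at the cost of sparse rows; a real system might choose a different mapping.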
Specification