SYSTEM AND METHOD FOR OPERATING A BIG-DATA PLATFORM
First Claim
1. A method for operating a big-data platform comprising:
- at a data analysis platform, receiving discrete client data;
storing the client data in a network accessible distributed storage system that includes:
storing the client data in a real-time storage system in a row format;
merging the client data into a columnar-based distributed archive storage system;
receiving a data query through a query interface; and
processing the data query by selectively interfacing with the client data from the real-time storage system and archive storage system, according to a data mapping and reduction process, wherein processing the data query comprises cooperatively querying the real-time storage system and the archive storage system and distributing the data query over the real-time storage system and the archive storage system to retrieve a single cohesive query result, wherein merging the client data into a columnar-based distributed archive storage system comprises storing the client data in the archive storage system in a columnar format, and wherein interfacing with the client data from the archive storage system comprises:
converting, by using a query processing cluster, at least a portion of the data query to the mapping process and the reduction process; and
executing the mapping process and the reduction process by using the query processing cluster.
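The flow recited in claim 1 can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class names `RealTimeStore` and `ColumnarArchive`, the `drain` and `run_query` helpers, and the sample records are all hypothetical, and a single Python process stands in for the distributed storage systems and the query processing cluster.

```python
from collections import defaultdict
from functools import reduce


class RealTimeStore:
    """Hypothetical row-format store: each record is kept whole as one dict."""

    def __init__(self):
        self.rows = []

    def append(self, record):
        self.rows.append(dict(record))

    def scan(self):
        return iter(self.rows)

    def drain(self):
        """Hand over all buffered rows and clear the store."""
        rows, self.rows = self.rows, []
        return rows


class ColumnarArchive:
    """Hypothetical columnar-format store: one list of values per field."""

    def __init__(self):
        self.columns = defaultdict(list)
        self.length = 0

    def merge(self, rows):
        for row in rows:
            for name in set(self.columns) | set(row):
                col = self.columns[name]
                # Pad a newly seen column so it lines up with older records.
                col.extend([None] * (self.length - len(col)))
                col.append(row.get(name))
            self.length += 1

    def scan(self, fields):
        """Read only the requested columns and yield them as small dicts."""
        cols = [self.columns.get(f, [None] * self.length) for f in fields]
        for values in zip(*cols):
            yield dict(zip(fields, values))


def run_query(realtime, archive, fields, mapper, reducer, initial):
    """Map over both stores, reduce each side, then combine into one result.

    Assumes the reducer is associative and `initial` is its identity value.
    """
    partial_rt = reduce(reducer, (mapper(r) for r in realtime.scan()), initial)
    partial_ar = reduce(reducer, (mapper(r) for r in archive.scan(fields)), initial)
    return reducer(partial_rt, partial_ar)


rt, ar = RealTimeStore(), ColumnarArchive()
rt.append({"user": "a", "amount": 5})
rt.append({"user": "b", "amount": 7})
ar.merge(rt.drain())                    # older data moves to the archive
rt.append({"user": "a", "amount": 3})   # newer data stays in row format

total = run_query(rt, ar, ["amount"], lambda r: r["amount"], lambda x, y: x + y, 0)
print(total)  # 15
```

In this sketch the query touches both stores and returns one combined value, which mirrors the "single cohesive query result" language of the claim; the archive scan reads only the `amount` column, which is the usual motivation for a columnar layout.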
Abstract
A system and method for operating a big-data platform that includes, at a data analysis platform, receiving discrete client data; storing the client data in a network accessible distributed storage system that includes: storing the client data in a real-time storage system; and merging the client data into a columnar-based distributed archive storage system; receiving a data query request through a query interface; and selectively interfacing with the client data from the real-time storage system and archive storage system according to the query.
24 Claims
1. A method for operating a big-data platform, as recited in full under First Claim above.
- Dependent claims: 2 through 23.
24. A method comprising:
- at a multi-tenant data analysis platform:
receiving discrete client data, the client data being associated with a user account of the multi-tenant data analysis platform through a unique identifier;
storing the client data in a network accessible distributed storage system that includes a real-time storage system and a columnar-based distributed archive storage system, the storing of the client data comprising:
storing the client data in the real-time storage system in a row format;
merging the client data into the archive storage system in a columnar format, the client data merged into the archive data storage system being isolated according to the user account associated with the client data;
receiving a data query through a query interface; and
processing the data query by selectively interfacing with the client data from the real-time storage system and archive storage system, wherein processing the data query comprises cooperatively querying the real-time storage system and the archive storage system and distributing the data query over the real-time storage system and the archive storage system to retrieve a single cohesive query result, wherein interfacing with the client data from the archive storage system comprises:
converting, by using a query processing cluster, the data query to a MapReduce mapping process and a MapReduce reduction process; and
executing the MapReduce mapping process and the MapReduce reduction process by using the query processing cluster, and wherein the query processing cluster includes a Hadoop enabled cluster that is constructed to execute MapReduce processes.
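Two elements of claim 24, isolating archived data by user account and converting a data query into a mapping process and a reduction process, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the query dictionary format, the `compile_query` and `run_map_reduce` helpers, the `TenantArchive` class, and the account identifiers are hypothetical, and plain Python functions stand in for a Hadoop cluster.

```python
from collections import defaultdict


def compile_query(query):
    """Turn a tiny declarative query into a (map_fn, reduce_fn) pair.

    Hypothetical query format: {"group_by": <field>, "sum": <field>}.
    """
    key_field = query["group_by"]
    value_field = query["sum"]

    def map_fn(record):
        # Mapping process: emit one (key, value) pair per record.
        yield record[key_field], record[value_field]

    def reduce_fn(key, values):
        # Reduction process: collapse all values that share a key.
        return key, sum(values)

    return map_fn, reduce_fn


def run_map_reduce(records, map_fn, reduce_fn):
    """Run map, group by key (the shuffle step), then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())


class TenantArchive:
    """Hypothetical archive that keeps each account's records separate."""

    def __init__(self):
        self._by_account = defaultdict(list)

    def merge(self, account_id, records):
        self._by_account[account_id].extend(records)

    def records_for(self, account_id):
        # A query only ever sees the records of the account it runs for.
        return list(self._by_account.get(account_id, []))


archive = TenantArchive()
archive.merge("acct-1", [
    {"page": "/home", "hits": 2},
    {"page": "/docs", "hits": 1},
    {"page": "/home", "hits": 4},
])
archive.merge("acct-2", [{"page": "/home", "hits": 100}])

map_fn, reduce_fn = compile_query({"group_by": "page", "sum": "hits"})
result = run_map_reduce(archive.records_for("acct-1"), map_fn, reduce_fn)
print(result)  # {'/home': 6, '/docs': 1}
```

The records merged for `acct-2` never reach the query run for `acct-1`, which is the isolation property the claim describes; in a real deployment that boundary would be enforced by the storage layer rather than by a dictionary lookup.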
Specification