HYBRID LOCAL/REMOTE INFRASTRUCTURE FOR DATA PROCESSING WITH LIGHTWEIGHT SETUP, POWERFUL DEBUGGABILITY, CONTROLLABILITY, INTEGRATION, AND PRODUCTIVITY FEATURES
First Claim
1. A method for providing an entirely hosted MapReduce data analytic infrastructure for executing a distributed large-scale compute job, the method comprising:
providing complete portability with respect to a programmer's local machine, operating system, and browser;
continuous metering of consumed compute, storage, and network resources on a hosting system;
continuously reporting cost metering to a user;
for each compute instance of the distributed large-scale compute job, monitoring of input and output data for the compute instance, and logging of the input and output data for the compute instance;
generating input data from a kernel program, wherein the kernel program is smaller than the generated input data; and
data-sorting at least a portion of the output data of the distributed large-scale compute job during a Map phase, wherein the data-sorting comprises, indexed writing of data sorting keys to shared storage by independently-executed Map jobs, wherein the hosted infrastructure does not require an installed client software or software development kit and wherein the hosted infrastructure does not require the compilation of computer code.
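To illustrate the claim element "generating input data from a kernel program, wherein the kernel program is smaller than the generated input data", the following is a minimal sketch and is not taken from the patent. The function names `kernel` and `generated_size`, the seed parameter, and the tab-separated record format are all assumptions made for illustration. The point is only that a few lines of code can deterministically expand into an arbitrarily large input set, so the program can be shipped to the hosting system instead of the data.

```python
import random


def kernel(seed, n_records):
    """Hypothetical kernel program: yield n_records synthetic input lines."""
    rng = random.Random(seed)  # seeded, so every run regenerates the same input
    for i in range(n_records):
        # Assumed record format: "<index>\t<random value>\n"
        yield f"{i}\t{rng.randrange(1_000_000)}\n"


def generated_size(seed, n_records):
    """Total size in bytes of the input that the kernel generates."""
    return sum(len(line.encode("utf-8")) for line in kernel(seed, n_records))
```

Under these assumptions the kernel source is a few hundred bytes, while `generated_size(7, 100_000)` is well over half a megabyte, and the output grows linearly with `n_records`.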
Abstract
The disclosed technology provides a hybrid local/remote hosted MapReduce framework and infrastructure comprising systems and methods for improving setup, configuration, controllability, debuggability, and integration of a compute job and systems and methods for increasing programmer productivity. The system applies an interpreted programming language for the programmer's custom Map and Reduce algorithms, such that those algorithms can execute identically both on the hosted service and locally (e.g., on the programmer's local computing system or device) for development and debugging purposes. Furthermore, the disclosed system delivers this service, a hosted MapReduce infrastructure, as a simple and transparent web service.
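The idea of Map and Reduce algorithms written in an interpreted language and run unchanged on a local machine can be sketched as follows. This is an illustrative assumption, not the patented system: `map_fn`, `reduce_fn`, and `run_local` are hypothetical names, and the word-count job is only a stand-in for a programmer's custom algorithms. A hosted service would distribute the same two functions across many compute instances instead of looping in one process.

```python
from collections import defaultdict


def map_fn(line):
    """User-defined Map step: emit (word, 1) for each word in a line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """User-defined Reduce step: sum the counts for one word."""
    return key, sum(values)


def run_local(lines, mapper, reducer):
    """Hypothetical local runner: map, group by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Reduce each key's values; keys are visited in sorted order.
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


counts = run_local(["the cat", "The dog"], map_fn, reduce_fn)
print(counts)
```

Because the mapper and reducer are plain functions with no dependency on the runner, the same code can be exercised on a small sample locally before being submitted to a larger job.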
10 Claims
1. A method for providing an entirely hosted MapReduce data analytic infrastructure for executing a distributed large-scale compute job, the method comprising:
providing complete portability with respect to a programmer's local machine, operating system, and browser;
continuous metering of consumed compute, storage, and network resources on a hosting system;
continuously reporting cost metering to a user;
for each compute instance of the distributed large-scale compute job, monitoring of input and output data for the compute instance, and logging of the input and output data for the compute instance;
generating input data from a kernel program, wherein the kernel program is smaller than the generated input data; and
data-sorting at least a portion of the output data of the distributed large-scale compute job during a Map phase, wherein the data-sorting comprises, indexed writing of data sorting keys to shared storage by independently-executed Map jobs, wherein the hosted infrastructure does not require an installed client software or software development kit and wherein the hosted infrastructure does not require the compilation of computer code.
Dependent claims: 2, 3, 4, 5
6. A method for implementing a hybrid local/remote MapReduce large scale data analytic infrastructure, the method comprising:
exposing an identical execution environment within both a large-scale hosted service as on a developer's local machine;
allowing for iterative local or in-browser development and testing of a programmer's custom Map and Reduce algorithms; and
allowing the developer to debug any particular Map, Reduce, or related compute instance, by automatically fetching its execution environment and replicating it on the developer's local machine or in the developer's browser.
Dependent claims: 7, 8
9. A computer-readable memory storing instructions that, if executed by a computing system, cause the computing system to perform a method for providing an entirely hosted data analytic infrastructure for executing a distributed large-scale compute job, the method comprising:
providing complete portability with respect to a programmer's local machine, operating system, and browser;
continuous metering of consumed compute, storage, and network resources on a hosting system;
continuously reporting cost metering to a user;
for each compute instance of the distributed large-scale compute job, monitoring of input and output data for the compute instance, and logging of the input and output data for the compute instance;
generating input data from a kernel program, wherein the kernel program is smaller than the generated input data; and
data-sorting at least a portion of the output data of the distributed large-scale compute job during a Map phase, wherein the data-sorting comprises, indexed writing of data sorting keys to shared storage by independently-executed Map jobs, wherein the hosted infrastructure does not require an installed client software or software development kit and wherein the hosted infrastructure does not require the compilation of computer code.
Dependent claims: 10
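The claims recite data-sorting during the Map phase through "indexed writing of data sorting keys to shared storage by independently-executed Map jobs". One possible reading of that element, offered only as a sketch and not as the patent's actual mechanism, is that each Map job sorts its own keys and writes them to a slot indexed by its job identifier, after which the pre-sorted runs are merged. The names `map_job` and `merged_keys` are hypothetical, and a plain dictionary stands in for shared storage.

```python
import heapq


def map_job(job_id, records, shared_store):
    """One independently executed Map job (hypothetical).

    Sorts its own output keys and writes them to its own indexed slot
    in the shared store, so it never coordinates with other Map jobs.
    """
    shared_store[job_id] = sorted(key for key, _value in records)


def merged_keys(shared_store):
    """Merge the per-job sorted runs into one globally sorted list."""
    runs = (shared_store[job_id] for job_id in sorted(shared_store))
    return list(heapq.merge(*runs))


store = {}  # assumption: stands in for remote shared storage
map_job(0, [("pear", 1), ("apple", 1)], store)
map_job(1, [("fig", 1), ("banana", 1)], store)
ordered = merged_keys(store)
print(ordered)
```

Since each job writes only to the slot named by its own index, the jobs can run in any order or in parallel, and the merge step only has to combine runs that are already sorted.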
Specification