Scalable cloud-based time series analysis
First Claim
1. A system, comprising:
one or more data processors; and
a non-transitory computer-readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform operations including:
receiving a script at each of a plurality of grid-computing devices;
compiling the script on each of the plurality of grid-computing devices, wherein compiling a script on a grid-computing device comprises compiling the script for an operating system of the grid-computing device;
reading input data in parallel by the plurality of grid-computing devices, wherein the input data comprises timestamped data partitionable into groups according to time series criteria;
deterministically distributing the timestamped data across the plurality of grid-computing devices based on the groups, wherein, for each of the groups, the timestamped data associated with the group is associated with one of the plurality of grid-computing devices;
generating a time series for each of the groups at respective ones of the plurality of grid-computing devices, wherein generating a time series comprises accumulating the timestamped data associated with a group into the time series;
executing the compiled script, at each of the plurality of grid-computing devices, on the time series associated with the grid-computing device to generate output data; and
writing the output data in parallel by the plurality of grid-computing devices.
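The deterministic distribution step recited above can be illustrated with a minimal sketch. It assumes a stable hash of the group key chooses the device; the claim does not name a hashing scheme, and the names `node_for_group`, `distribute`, and the record fields are hypothetical. A cryptographic digest is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would break determinism across devices.

```python
import hashlib
from collections import defaultdict


def node_for_group(group_key: tuple, num_nodes: int) -> int:
    """Map a group key to a node index using a process-independent hash."""
    digest = hashlib.sha256(repr(group_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(records, key_fields, num_nodes):
    """Assign every record to a node so that each group lands on one node."""
    shards = defaultdict(list)
    for rec in records:
        # The group key is built from the assumed time series criteria fields.
        key = tuple(rec[f] for f in key_fields)
        shards[node_for_group(key, num_nodes)].append(rec)
    return shards
```

Because the node index depends only on the group key and the node count, any device that reads a record can compute its destination without coordinating with the others.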
Abstract
Timestamped data can be read in parallel by multiple grid-computing devices. The timestamped data, which can be partitioned into groups based on time series criteria, can be deterministically distributed across the multiple grid-computing devices based on the time series criteria. Each grid-computing device can sort and accumulate the timestamped data into a time series for each group it receives and then process the resultant time series based on a previously distributed script, which can be compiled at each grid-computing device, to generate output data. The grid-computing devices can write their output data in parallel. As a result, vast amounts of timestamped data can be easily analyzed across an easily expandable number of grid-computing devices with reduced computational expense.
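The per-device work described in the abstract (sort, accumulate into a time series per group, then run a previously distributed script) might look like the sketch below. Everything here is an assumption for illustration: the record fields `ts` and `value`, summation into fixed-width intervals as the accumulation rule, and the function names. Python's built-in `compile()` stands in for compiling the script on each device; it produces bytecode rather than a build specific to the operating system.

```python
from collections import defaultdict

# Hypothetical distributed script: computes the mean of a series.
SCRIPT = "def analyze(series):\n    return sum(v for _, v in series) / len(series)\n"


def compile_script(source):
    """Compile the script text locally and return its analyze() function."""
    code = compile(source, "<script>", "exec")
    namespace = {}
    exec(code, namespace)
    return namespace["analyze"]


def accumulate(records, key_fields, interval):
    """Sort by timestamp, then sum values into fixed-width buckets per group."""
    buckets = defaultdict(lambda: defaultdict(float))
    for rec in sorted(records, key=lambda r: r["ts"]):
        key = tuple(rec[f] for f in key_fields)
        bucket_start = rec["ts"] - rec["ts"] % interval
        buckets[key][bucket_start] += rec["value"]
    return {key: sorted(b.items()) for key, b in buckets.items()}


def run_node(records, key_fields, interval, source):
    """Process one device's share: build each series, then apply the script."""
    analyze = compile_script(source)
    series_by_group = accumulate(records, key_fields, interval)
    return {key: analyze(series) for key, series in series_by_group.items()}
```

Each device would call `run_node` only on the records routed to it, so the output for a group never depends on data held elsewhere.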
30 Claims
1. A system, comprising:
one or more data processors; and
a non-transitory computer-readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform operations including:
receiving a script at each of a plurality of grid-computing devices;
compiling the script on each of the plurality of grid-computing devices, wherein compiling a script on a grid-computing device comprises compiling the script for an operating system of the grid-computing device;
reading input data in parallel by the plurality of grid-computing devices, wherein the input data comprises timestamped data partitionable into groups according to time series criteria;
deterministically distributing the timestamped data across the plurality of grid-computing devices based on the groups, wherein, for each of the groups, the timestamped data associated with the group is associated with one of the plurality of grid-computing devices;
generating a time series for each of the groups at respective ones of the plurality of grid-computing devices, wherein generating a time series comprises accumulating the timestamped data associated with a group into the time series;
executing the compiled script, at each of the plurality of grid-computing devices, on the time series associated with the grid-computing device to generate output data; and
writing the output data in parallel by the plurality of grid-computing devices.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer-implemented method, comprising:
receiving a script at each of a plurality of grid-computing devices;
compiling the script on each of the plurality of grid-computing devices, wherein compiling a script on a grid-computing device comprises compiling the script for an operating system of the grid-computing device;
reading input data in parallel by the plurality of grid-computing devices, wherein the input data comprises timestamped data partitionable into groups according to time series criteria;
deterministically distributing the timestamped data across the plurality of grid-computing devices based on the groups, wherein, for each of the groups, the timestamped data associated with the group is associated with one of the plurality of grid-computing devices;
generating a time series for each of the groups at respective ones of the plurality of grid-computing devices, wherein generating a time series comprises accumulating the timestamped data associated with a group into the time series;
executing the compiled script, at each of the plurality of grid-computing devices, on the time series associated with the grid-computing device to generate output data; and
writing the output data in parallel by the plurality of grid-computing devices.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause a data processing apparatus to perform operations including:
receiving a script at each of a plurality of grid-computing devices;
compiling the script on each of the plurality of grid-computing devices, wherein compiling a script on a grid-computing device comprises compiling the script for an operating system of the grid-computing device;
reading input data in parallel by the plurality of grid-computing devices, wherein the input data comprises timestamped data partitionable into groups according to time series criteria;
deterministically distributing the timestamped data across the plurality of grid-computing devices based on the groups, wherein, for each of the groups, the timestamped data associated with the group is associated with one of the plurality of grid-computing devices;
generating a time series for each of the groups at respective ones of the plurality of grid-computing devices, wherein generating a time series comprises accumulating the timestamped data associated with a group into the time series;
executing the compiled script, at each of the plurality of grid-computing devices, on the time series associated with the grid-computing device to generate output data; and
writing the output data in parallel by the plurality of grid-computing devices.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification