Behaviorally consistent cluster-wide data wrangling based on locally processed sampled data
Abstract
Example embodiments involve a system, computer-readable storage medium storing at least one program, and computer-implemented method for behaviorally consistent data wrangling. A local client device selects a set of raw sample data from a remote datastore. A local execution engine then applies one or more local data wrangling operations to the raw sample data. If the results of the local data wrangling operations are satisfactory, the local data wrangling operations may then be transferred to a remote data wrangling cluster. A remote execution engine being executed by the remote data wrangling cluster then applies the data wrangling operations to the larger set of raw data from which the sample raw data was obtained. As the remote execution engine and the local execution engine are of the same type, the data wrangling behavior exhibited by the local execution engine is reflected in the data wrangling behavior of the remote execution engine.
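The workflow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the operation library, the `ExecutionEngine` class, the `select_sample` and `wrangle` helpers, and the in-memory list standing in for the remote datastore are all hypothetical names chosen for illustration. The point it shows is that only the selection of operations (here, a list of names) needs to travel to the remote side, and that using the same engine type on both sides keeps the behavior consistent.

```python
import random
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]
Operation = Callable[[Record], Record]

# Hypothetical library of wrangling operations, keyed by name so that a
# selection can be sent to the remote side as plain strings.
OPERATION_LIBRARY: Dict[str, Operation] = {
    "strip_whitespace": lambda r: {k: v.strip() for k, v in r.items()},
    "lowercase_keys": lambda r: {k.lower(): v for k, v in r.items()},
}


class ExecutionEngine:
    """Assumed engine type, shared by the local client and the remote cluster."""

    def apply(self, op_names: List[str], records: List[Record]) -> List[Record]:
        out = list(records)
        for name in op_names:
            op = OPERATION_LIBRARY[name]
            out = [op(r) for r in out]
        return out


def select_sample(datastore: List[Record], size: int, seed: int = 0) -> List[Record]:
    """Pick a small raw sample from the larger raw data set."""
    rng = random.Random(seed)
    return rng.sample(datastore, min(size, len(datastore)))


def wrangle(
    datastore: List[Record],
    op_names: List[str],
    sample_size: int,
    approve: Callable[[List[Record]], bool],
) -> Optional[List[Record]]:
    """Run the operations on a sample; on approval, run them on all the data."""
    sample = select_sample(datastore, sample_size)
    local_result = ExecutionEngine().apply(op_names, sample)
    if not approve(local_result):
        return None  # results not satisfactory; nothing is sent
    # Stand-in for sending op_names to the remote cluster, which runs an
    # engine of the same type over the full raw data.
    return ExecutionEngine().apply(op_names, datastore)
```

A caller would pass an `approve` callback that, in a real client, would correspond to the user reviewing the presented sample results and confirming them.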
14 Claims
1. A method comprising:
- selecting, at a local client device, a first plurality of raw data from a second plurality of raw data, the second plurality of raw data being stored remote from the local client device and accessible by a remote device;
- receiving the first plurality of raw data at the local client device;
- selecting, from a library of data wrangling operations at the local client device, a plurality of data wrangling operations to perform on the first plurality of raw data;
- instantiating, at the local client device, a first data wrangling module operating in a first mode selected from a first plurality of modes, the first mode being selected based on computing resources available to the local client device;
- applying, using at least one hardware processor of the local client device, the plurality of data wrangling operations to the first plurality of raw data using the first mode of the first data wrangling module to obtain a first plurality of structured data;
- causing presentation, via a user interface of the local client device, of the first plurality of structured data;
- receiving, via the user interface of the local client device, after the causing of the presentation of the first plurality of structured data, an input indicating approval of the first plurality of structured data; and
- sending, in response to receiving the input, the selection of the plurality of data wrangling operations to the remote device, the remote device being configured to:
  - instantiate a second data wrangling module operating in a second mode selected from a plurality of modes; and
  - apply the selected plurality of data wrangling operations to the second plurality of raw data using the second mode of the second data wrangling module to obtain a second plurality of structured data, the second plurality of structured data having an expected organization based on the first plurality of structured data.
Dependent claims: 2, 3, 4, 5
6. A system comprising:
- one or more processors at a local client device; and
- a memory storing instructions that, when executed by at least one of the one or more processors, cause the local client device to perform operations comprising:
  - selecting a first plurality of raw data from a second plurality of raw data, the second plurality of raw data being stored remote from the local client device and accessible by a remote device;
  - receiving the first plurality of raw data;
  - selecting, from a library of data wrangling operations, a plurality of data wrangling operations to perform on the first plurality of raw data;
  - instantiating a first data wrangling module operating in a first mode selected from a first plurality of modes, the first mode being selected based on computing resources available to the local client device;
  - applying the plurality of data wrangling operations to the first plurality of raw data using the first mode of the first data wrangling module to obtain a first plurality of structured data;
  - causing presentation, via a user interface, of the first plurality of structured data;
  - receiving, via the user interface, after the causing of the presentation of the first plurality of structured data, an input indicating approval of the first plurality of structured data; and
  - sending, in response to receiving the input, the selection of the plurality of data wrangling operations to the remote device, the remote device being configured to:
    - instantiate a second data wrangling module operating in a second mode selected from a plurality of modes; and
    - apply the selected plurality of data wrangling operations to the second plurality of raw data using the second mode of the second data wrangling module to obtain a second plurality of structured data, the second plurality of structured data having an expected organization based on the first plurality of structured data.
Dependent claims: 7, 8, 9, 10
11. A non-transitory, computer-readable medium storing computer-executable instructions that, when executed by one or more processors of a local client device, cause the local client device to perform operations comprising:
- selecting a first plurality of raw data from a second plurality of raw data, the second plurality of raw data being stored remote from the local client device and accessible by a remote device;
- receiving the first plurality of raw data at the local client device;
- selecting, from a library of data wrangling operations, a plurality of data wrangling operations to perform on the first plurality of raw data;
- instantiating a first data wrangling module operating in a first mode selected from a first plurality of modes, the first mode being selected based on computing resources available to the local client device;
- applying the plurality of data wrangling operations to the first plurality of raw data using the first mode of the first data wrangling module to obtain a first plurality of structured data;
- causing presentation, via a user interface, of the first plurality of structured data;
- receiving, via the user interface, after the causing of the presentation of the first plurality of structured data, an input indicating approval of the first plurality of structured data; and
- sending, in response to receiving the input, the selection of the plurality of data wrangling operations to the remote device, the remote device being configured to:
  - instantiate a second data wrangling module operating in a second mode selected from a plurality of modes; and
  - apply the selected plurality of data wrangling operations to the second plurality of raw data using the second mode of the second data wrangling module to obtain a second plurality of structured data, the second plurality of structured data having an expected organization based on the first plurality of structured data.
Dependent claims: 12, 13, 14
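Each independent claim recites that the first mode of the data wrangling module is selected based on computing resources available to the local client device. The claims do not name the modes or the selection rule, so the following is only a sketch under stated assumptions: the three mode names, the `select_mode` function, and the thresholds are hypothetical, and the CPU count is used as one possible measure of available resources.

```python
import os
from enum import Enum
from typing import Optional


class Mode(Enum):
    # Hypothetical mode names; the claims do not enumerate the modes.
    SINGLE_THREADED = "single_threaded"
    MULTI_THREADED = "multi_threaded"
    DISTRIBUTED = "distributed"


def select_mode(cpu_count: Optional[int], worker_nodes: int = 1) -> Mode:
    """Pick an operating mode from the resources a device reports.

    Assumed rule: more than one worker node means a cluster, more than
    one CPU means a multi-threaded mode, anything else falls back to a
    single-threaded mode.
    """
    if worker_nodes > 1:
        return Mode.DISTRIBUTED
    if cpu_count is not None and cpu_count > 1:
        return Mode.MULTI_THREADED
    return Mode.SINGLE_THREADED


# On the local client, the CPU count reported by the OS drives the choice.
local_mode = select_mode(os.cpu_count())
```

Under these assumptions a laptop would typically land in a single-threaded or multi-threaded mode, while a remote cluster reporting several worker nodes would select the distributed mode; the same operations are applied in either case.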
Specification