PROCESSING PRE-EXISTING DATA SETS AT AN ON DEMAND CODE EXECUTION ENVIRONMENT
First Claim
1. A system for processing a plurality of data items within a data source via an on-demand code execution environment, the system comprising:
a non-transitory data store configured to implement:
an in-process data cache indicating data items, from the plurality of data items, that have been identified by the system but not yet processed at the on-demand code execution environment; and
a results data cache indicating data items, from the plurality of data items, that have been processed at the on-demand code execution environment;
one or more processors configured to implement a user interface subsystem that obtains, from a user computing device, information identifying the data source and a task, on the on-demand code execution environment, to utilize in processing the plurality of data items;
one or more processors configured to implement a data retrieval subsystem that:
retrieves a first set of data items, from the plurality of data items, from the data source; and
for data items of the set of data items:
generates an identifier for the data item;
determines, from the identifier, that the data item is not identified within the in-process data cache or the results data cache; and
enqueues the data item in the in-process data cache;
one or more processors configured to implement a call generation subsystem that:
identifies one or more data items from the in-process data cache;
submits a call to the on-demand code execution environment to execute the task to process the one or more data items;
determines that the task successfully processed the one or more data items; and
places the one or more data items in the results data cache;
wherein the user interface subsystem further transmits a notification to the user computing device when the plurality of data items have been processed at the on-demand code execution environment.
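The claim leaves open how the identifier is generated and how the two caches are stored. The following is a minimal Python sketch under stated assumptions: the identifier is a SHA-256 hash of the item's content, both caches are held in memory, and the call to the on-demand code execution environment is represented by a hypothetical `invoke_task` callback that returns True on success. The class and method names are illustrative only, not taken from the patent.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Set


def item_identifier(item: bytes) -> str:
    """Derive an identifier from the item's content (SHA-256 is an assumption)."""
    return hashlib.sha256(item).hexdigest()


class TaskCallGenerator:
    """Hypothetical in-memory stand-in for the claimed caches and subsystems."""

    def __init__(self, invoke_task: Callable[[List[bytes]], bool]) -> None:
        # invoke_task stands in for a call to the on-demand code execution
        # environment; it returns True if the task processed the whole batch.
        self.invoke_task = invoke_task
        self.in_process: Dict[str, bytes] = {}  # identified, not yet processed
        self.results: Set[str] = set()          # identifiers of processed items

    def enqueue_new(self, items: Iterable[bytes]) -> int:
        """Data retrieval subsystem: enqueue items absent from both caches."""
        added = 0
        for item in items:
            ident = item_identifier(item)
            if ident in self.in_process or ident in self.results:
                continue  # already identified or already processed
            self.in_process[ident] = item
            added += 1
        return added

    def submit_batch(self, batch_size: int = 10) -> int:
        """Call generation subsystem: submit one batch from the in-process cache.

        Returns how many items were moved to the results cache. If the call
        fails, 0 is returned and the items stay in the in-process cache.
        """
        idents = list(self.in_process)[:batch_size]  # oldest entries first
        if not idents:
            return 0
        if not self.invoke_task([self.in_process[i] for i in idents]):
            return 0
        for ident in idents:
            del self.in_process[ident]
            self.results.add(ident)
        return len(idents)
```

Because an item leaves the in-process cache only after its call succeeds, a failed batch is simply picked up again by a later `submit_batch`, and an item that reappears in a later retrieval is skipped once its identifier is in either cache.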
Abstract
Systems and methods are described for transforming a data set within a data source into a series of task calls to an on-demand code execution environment or other distributed code execution environment. Such environments utilize pre-initialized virtual machine instances to enable rapid execution of user-specified code, without the delays typically caused by initializing virtual machine instances, and are often used to process data in near-real time, as it is created. However, limitations in computing resources may prevent a user from using an on-demand code execution environment to process a large, existing data set all at once. The present application provides a task generation system that can iteratively retrieve data items from an existing data set and generate corresponding task calls to the on-demand code execution environment, while ensuring that at least one task call is made for each data item within the existing data set.
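To illustrate the iterative approach the abstract describes, here is a minimal Python sketch, assuming a paged data source exposed through a hypothetical `fetch_page(cursor)` callback that returns `(items, next_cursor)` with `next_cursor` set to None at the end, and a hypothetical `call_task(item)` callback standing in for one task call. The per-round cap `max_calls_per_round` is an illustrative way to avoid submitting the whole data set at once; the patent does not prescribe this mechanism.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# A page is (items, next_cursor); next_cursor is None once the source is exhausted.
Page = Tuple[List[str], Optional[int]]


def process_existing_data_set(
    fetch_page: Callable[[Optional[int]], Page],
    call_task: Callable[[str], bool],
    max_calls_per_round: int = 5,
) -> Dict[str, int]:
    """Walk a pre-existing data set page by page, issuing bounded rounds of task calls.

    Returns the number of calls made per item. A failed item stays pending and
    is retried in a later round, so each item receives at least one call.
    Note: this sketch retries indefinitely; it does not cap retries.
    """
    calls: Dict[str, int] = {}
    seen: Set[str] = set()
    pending: List[str] = []
    cursor: Optional[int] = None
    exhausted = False
    while not exhausted or pending:
        # Fetch more only while the backlog is small, so the whole data set
        # is never submitted at once.
        if not exhausted and len(pending) < max_calls_per_round:
            items, cursor = fetch_page(cursor)
            for item in items:
                if item not in seen:  # skip items already picked up
                    seen.add(item)
                    pending.append(item)
            exhausted = cursor is None
        # Issue at most max_calls_per_round calls in this round.
        batch, pending = pending[:max_calls_per_round], pending[max_calls_per_round:]
        for item in batch:
            calls[item] = calls.get(item, 0) + 1
            if not call_task(item):
                pending.append(item)  # keep the item so it is retried
    return calls
```

For example, with two pages `["a", "b", "c"]` and `["c", "d"]` and a task that fails once on `"b"`, the result is `{"a": 1, "b": 2, "c": 1, "d": 1}`: the duplicate `"c"` is called only once, and `"b"` is retried until it succeeds.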
23 Claims
7. A computer-implemented method comprising:
maintaining a data cache indicating data items, from a plurality of data items within a data source, that have been identified for processing at an on-demand code execution environment;
iteratively retrieving data items, from the plurality of data items, from the data source;
enqueuing within the data cache those retrieved data items, from the set of data items, that are not currently identified within the data cache;
iteratively submitting calls to the on-demand code execution environment to process data items from the data cache by execution of a task;
recording results of the calls submitted with respect to the data items from the data cache;
determining that no additional data items, from the plurality of data items, are awaiting processing by execution of the task; and
transmitting, to a user computing device associated with the plurality of data items, a notification indicating that processing of the plurality of data items has been completed.
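The last steps of the method (recording call results and deciding that nothing remains to be processed) can be sketched as follows. This is a minimal Python illustration, assuming a single in-memory data cache that maps a hypothetical item identifier to its recorded result (None while the call is still outstanding), and a hypothetical `send` callback in place of whatever channel delivers the notification to the user computing device.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class DataCache:
    """Hypothetical data cache: item id -> recorded call result.

    A value of None means the item is enqueued but no result has been recorded yet.
    """
    entries: Dict[str, Optional[str]] = field(default_factory=dict)

    def enqueue_if_new(self, item_id: str) -> bool:
        if item_id in self.entries:
            return False  # already identified; do not enqueue twice
        self.entries[item_id] = None
        return True

    def record_result(self, item_id: str, result: str) -> None:
        self.entries[item_id] = result

    def awaiting(self) -> int:
        # Count items whose call result has not been recorded yet.
        return sum(1 for r in self.entries.values() if r is None)


def notify_if_complete(
    cache: DataCache, source_exhausted: bool, send: Callable[[str], None]
) -> bool:
    """Send a completion notice once the source is exhausted and nothing awaits processing."""
    if source_exhausted and cache.awaiting() == 0:
        send(f"Processing complete: {len(cache.entries)} data items processed.")
        return True
    return False
```

Completion is only declared when both conditions hold: the data source has no more items to retrieve and every enqueued item has a recorded result. Checking either condition alone could notify the user too early.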
18. Non-transitory computer readable media including computer-executable instructions, wherein the computer-executable instructions, when executed by a computing system, cause the computing system to:
maintain a data cache indicating data items, from a plurality of data items within a data source, that have been identified for processing at an on-demand code execution environment;
retrieve data items, from the plurality of data items, from the data source;
enqueue within the data cache those retrieved data items, from the set of data items, that are not currently identified within the data cache;
submit calls to the on-demand code execution environment to process data items from the data cache by execution of a task;
record results of the calls submitted with respect to the data items from the data cache;
determine that no additional data items, from the plurality of data items, are awaiting processing by execution of the task; and
transmit, to a user computing device associated with the plurality of data items, a notification indicating that processing of the plurality of data items has been completed.
Specification