Data consistency and rollback for cloud analytics
First Claim
1. A method for retrieving consistent datasets, comprising:
collecting a first batch of data by a server from one or more tenant applications and associated with a first period of time, wherein the first batch of data includes one or more datasets;
updating the first batch of data in a batch log, wherein the updating occurs during and after the collection of the first batch of data;
storing the first batch of data in memory;
marking the first batch of data as the current batch of data in the batch log;
collecting a second batch of data by the server from the one or more tenant applications and associated with a second period of time subsequent to the first period of time, wherein the second batch of data includes one or more datasets, and wherein one or more of the datasets of the second batch of data are distinct from the datasets of the first batch of data;
updating the second batch of data in the batch log, wherein the updating occurs during and after the collection of the second batch of data;
storing the second batch of data in memory;
marking the second batch of data as the current batch of data in the batch log;
detecting a rollback event indicating that a current dataset or a current batch of data should not be used;
marking the first batch of data as the current batch of data after the rollback event;
retrieving the first batch of data from memory using the batch log; and
overwriting the current batch of data corresponding to the second batch of data that should not be used with the retrieved first batch of data from memory.
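The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the names `BatchLog`, `BatchServer`, `collect`, and `rollback`, the status strings, and the use of in-process dictionaries for "memory" are all assumptions made for illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BatchLog:
    """Hypothetical batch log: per-batch status plus a pointer to the current batch."""
    status: Dict[int, str] = field(default_factory=dict)
    current: Optional[int] = None


class BatchServer:
    """Hypothetical server that collects batches and can roll back to an earlier one."""

    def __init__(self) -> None:
        self.log = BatchLog()
        self.memory: Dict[int, Dict[str, List[dict]]] = {}  # every stored batch
        self.current_data: Dict[str, List[dict]] = {}       # what consumers read

    def collect(self, batch_id: int, datasets: Dict[str, List[dict]]) -> None:
        self.log.status[batch_id] = "collecting"         # log updated during collection
        self.memory[batch_id] = copy.deepcopy(datasets)  # store the batch in memory
        self.log.status[batch_id] = "complete"           # log updated after collection
        self.log.current = batch_id                      # mark as the current batch
        self.current_data = copy.deepcopy(datasets)

    def rollback(self, to_batch_id: int) -> None:
        """Handle a rollback event: the current batch should not be used."""
        if self.log.status.get(to_batch_id) != "complete":
            raise ValueError(f"batch {to_batch_id} is not a complete batch")
        self.log.current = to_batch_id                   # re-mark the earlier batch
        retrieved = self.memory[to_batch_id]             # retrieve it using the log
        self.current_data = copy.deepcopy(retrieved)     # overwrite the bad batch


server = BatchServer()
server.collect(1, {"accounts": [{"id": 1, "balance": 100}]})
server.collect(2, {"accounts": [{"id": 1, "balance": -5}], "orders": [{"id": 9}]})
server.rollback(1)
print(server.log.current)   # 1
print(server.current_data)  # {'accounts': [{'id': 1, 'balance': 100}]}
```

In this sketch the second batch stays in `memory` after the rollback; only the current-batch marker and the data served to consumers change.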
Abstract
An extract-transform-load (ETL) platform fetches consistent datasets in a batch for a given period of time and provides the ability to roll back that batch. The batch may be fetched for an interval of time, and the ETL platform may fetch new or changed data from different cloud or on-premise applications. The platform stores this data in the cloud or on-premise to build data history. As the ETL platform fetches new data, the system does not overwrite existing data but creates new versions so that change history is preserved. If, for any reason, a business wants to roll back data, it can roll back to any previous batch.
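The versioning behavior described in the abstract (new versions instead of overwrites, with rollback to any previous batch) can be sketched as an append-only store. This is an illustrative assumption rather than the platform's actual design: `VersionedStore`, `load_batch`, `rollback`, and `snapshot` are hypothetical names, and batch identifiers are assumed to increase over time.

```python
from collections import defaultdict


class VersionedStore:
    """Hypothetical append-only store: each batch adds versions, nothing is overwritten."""

    def __init__(self):
        # key -> list of (batch_id, value), appended in batch order
        self.versions = defaultdict(list)
        self.current_batch = 0

    def load_batch(self, batch_id, changed_rows):
        """Append a new version for every new or changed row in this batch."""
        for key, value in changed_rows.items():
            self.versions[key].append((batch_id, value))
        self.current_batch = batch_id

    def rollback(self, batch_id):
        """Move the current-batch marker back; later versions remain in history."""
        self.current_batch = batch_id

    def snapshot(self):
        """Latest version of each row at or before the current batch."""
        view = {}
        for key, history in self.versions.items():
            visible = [value for b, value in history if b <= self.current_batch]
            if visible:
                view[key] = visible[-1]
        return view


store = VersionedStore()
store.load_batch(1, {"acct-1": 100, "acct-2": 50})
store.load_batch(2, {"acct-1": 120})
store.load_batch(3, {"acct-2": 0, "acct-3": 7})
store.rollback(2)
print(store.snapshot())  # {'acct-1': 120, 'acct-2': 50}
```

Because rows are only appended, rolling back is a change to one marker. The sketch does not address what happens when a new batch is loaded after a rollback; a real system would need a policy for the superseded versions.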
15 Claims
1. A method for retrieving consistent datasets, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A computer readable non-transitory storage medium having embodied thereon a program, the program being executable by a processor to perform a method for retrieving consistent datasets, the method comprising:
collecting a first batch of data by a server from one or more tenant applications and associated with a first period of time, wherein the first batch of data includes one or more datasets;
updating the first batch of data in a batch log, wherein the updating occurs during and after the collection of the first batch of data;
storing the first batch of data in memory;
marking the first batch of data as the current batch of data in the batch log;
collecting a second batch of data by the server from the one or more tenant applications and associated with a second period of time subsequent to the first period of time, wherein the second batch of data includes one or more datasets, and wherein one or more of the datasets of the second batch of data are distinct from the datasets of the first batch of data;
updating the second batch of data in the batch log, wherein the updating occurs during and after the collection of the second batch of data;
storing the second batch of data in memory;
marking the second batch of data as the current batch of data in the batch log;
detecting a rollback event indicating that a current dataset or a current batch of data should not be used;
marking the first batch of data as the current batch of data after the rollback event;
retrieving the first batch of data from memory using the batch log; and
overwriting the current batch of data corresponding to the second batch of data that should not be used with the retrieved first batch of data from memory.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A system for retrieving consistent datasets, comprising:
a memory;
a processor; and
one or more modules stored in memory and executable by the processor to
collect a first batch of data by a server from one or more tenant applications and associated with a first period of time, wherein the first batch of data includes one or more datasets,
update the first batch of data in a batch log, wherein the updating occurs during and after the collection of the first batch of data,
store the first batch of data in memory,
mark the first batch of data as the current batch of data in the batch log,
collect a second batch of data by the server from the one or more tenant applications and associated with a second period of time subsequent to the first period of time, wherein the second batch of data includes one or more datasets, and wherein one or more of the datasets of the second batch of data are distinct from the datasets of the first batch of data,
update the second batch of data in the batch log, wherein the updating occurs during and after the collection of the second batch of data,
store the second batch of data in memory,
mark the second batch of data as the current batch of data in the batch log,
detect a rollback event indicating that a current dataset or a current batch of data should not be used,
mark the first batch of data as the current batch of data after the rollback event,
retrieve the first batch of data from memory using the batch log, and
overwrite the current batch of data corresponding to the second batch of data that should not be used with the retrieved first batch of data from memory.
Specification