Distributed computing backup and recovery system
First Claim
1. A method for distributed computing backup and recovery, comprising:
retrieving a user selectable preference;
identifying a first subset of data from within a data set according to at least one user selectable preference, the first subset of data comprising less than all of the data in the data set, and wherein the first subset is selectable by the user selectable preference;
intercepting the first subset of data at an application programming interface (API);
encrypting, by the API, at least a portion of the first subset of data into encrypted data objects that comprise a second subset of data that is less than all of the data in the first subset of data;
receiving, into a memory via an interface controlled by a processor connected to a network in a computing environment, the second subset of data;
evaluating, using the processor, a hash function stored in the memory to determine network storage locations or network retrieval locations, or both, for the encrypted data objects;
storing, at a granular level that is less than all the data in the data set, multiple replica sets of the encrypted data objects across a plurality of different storage nodes included in the network storage locations according to the hash function, wherein each replica set of the encrypted data objects is stored across a respective cluster group of storage nodes from within the plurality of different storage nodes, and wherein the encrypted data objects stored across the plurality of different storage nodes are identified as replicas of data in the data set;
retrieving, from the multiple replica sets of the encrypted data sets stored across their respective cluster groups of storage nodes, a preferred replica set of the encrypted data objects stored on a preferred cluster group of storage nodes in the network retrieval locations according to the hash function and an additional selection criteria including data freshness of the preferred replica set of the encrypted data objects, wherein the storage nodes comprising the preferred cluster group share the same hash function, and wherein data freshness includes a storage time of a respective encrypted data object;
determining a hash seed used to recreate the hash function for an identified time and storing the hash seed for an identified time, at the plurality of different storage nodes included in the network storage locations, wherein the hash seed for the identified time is used to recreate the hash function for the identified time;
wherein the hash seed is a random function based on an initial seed; and
when a data object request comprises a request to store the encrypted data objects, recording write accesses that occur in the computing environment until the encrypted data objects are stored; and
when the data object request comprises a request to retrieve the encrypted data objects, playing back the recording of write accesses until a restore completes.
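The hash-related steps of claim 1 (evaluating a hash function to find storage locations, placing one replica set per cluster group, and recreating the hash function for an identified time from a stored seed) can be illustrated with a minimal sketch. The claim does not specify a hash algorithm or a seed derivation, so the choices here are assumptions made for illustration: SHA-256 as the underlying hash, an integer `epoch` standing in for the identified time, and the hypothetical helpers `seed_for_time`, `make_hash`, and `placement`.

```python
import hashlib
import random


def seed_for_time(initial_seed: int, epoch: int) -> int:
    """Derive the hash seed for an identified time.

    Assumption: the identified time is an integer epoch index, and the seed
    is the (epoch + 1)-th output of a pseudo-random generator started from
    the initial seed.
    """
    rng = random.Random(initial_seed)
    seed = initial_seed
    for _ in range(epoch + 1):
        seed = rng.getrandbits(64)
    return seed


def make_hash(seed: int):
    """Recreate the hash function that corresponds to a given seed."""
    def h(key: str, n_nodes: int) -> int:
        digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % n_nodes
    return h


def placement(key: str, cluster_groups: list, seed: int) -> list:
    """Pick one storage node per cluster group, i.e. one location per replica set."""
    h = make_hash(seed)
    return [group[h(key, len(group))] for group in cluster_groups]
```

Because the seed is stored for each identified time, the same placement can be recomputed later, for example `placement("obj-1", groups, seed_for_time(42, 0))`, without keeping a per-object location table.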
Abstract
The distributed computing backup and recovery (DCBR) system and method provide backup and recovery for distributed computing models (e.g., NoSQL). The DCBR system extends the protections from server node-level failure and introduces persistence in time so that the evolving data set may be stored and recovered to a past point in time. The DCBR system, instead of performing backup and recovery for an entire dataset, may be configured to apply to a subset of data. Instead of keeping or recovering snapshots of the entire dataset, which requires the entire cluster, the DCBR system identifies the particular nodes and/or archive files where the dataset resides so that backup or recovery may be done with a much smaller number of nodes.
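As a rough illustration of why a subset backup can involve far fewer nodes than a full snapshot, the sketch below maps each key in a chosen subset to the node that holds it and returns only those nodes. It is a simplified model under stated assumptions, not the DCBR implementation: the SHA-256-based `node_for` function, the `seed` parameter, and the node names are all hypothetical.

```python
import hashlib


def node_for(key: str, nodes: list, seed: int) -> str:
    """Map a key to the node that stores it (assumed seeded-hash placement)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def nodes_to_back_up(subset_keys: list, nodes: list, seed: int) -> set:
    """Return only the nodes on which the selected subset of data resides."""
    return {node_for(key, nodes, seed) for key in subset_keys}


# Usage: a three-key subset on a 20-node cluster touches at most three nodes.
cluster = [f"node-{i}" for i in range(20)]
needed = nodes_to_back_up(["user:1", "user:2", "user:3"], cluster, seed=7)
```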
11 Claims
1. A method for distributed computing backup and recovery (recited in full under First Claim above).
Dependent claims: 2, 3, 4, 5, 6
7. A computer program product for distributed computing backup and recovery, comprising:
a non-transitory computer readable memory with processor executable instructions stored thereon, wherein the instructions when executed by the processor cause the processor to;
retrieve a user selectable preference;
identify a first subset of data from within a data set according to at least one user selectable preference, the first subset of data comprising less than all of the data in the data set, and wherein the first subset is selectable by the user selectable preference;
intercept the first subset of data at an application programming interface (API);
encrypt, by the API, at least a portion of the first subset of data into encrypted data objects that comprise a second subset of data that is less than all of the data in the first subset of data;
receive, into a memory via an interface controlled by a processor connected to a network in a computing environment;
evaluate, using the processor, a hash function stored in the memory to determine network storage locations or network retrieval locations, or both, for the encrypted data objects;
store, at a granular level that is less than all the data in the data set, multiple replica sets of the encrypted data objects across a plurality of different storage nodes included in the network storage locations according to the hash function, wherein each replica set of the encrypted data objects is stored across a respective cluster group of storage nodes from within the plurality of different storage nodes, and wherein the encrypted data objects stored across the different storage nodes are identified as replicas of data in the data set;
retrieve, from the multiple replica sets of the encrypted data sets stored across their respective cluster groups of storage nodes, a preferred replica set of the encrypted data objects stored on a preferred cluster group of storage nodes in the network retrieval locations according to the hash function and an additional selection criteria including data freshness of the preferred replica set of the encrypted data objects, wherein the storage nodes comprising the preferred cluster group share the same hash function, and wherein data freshness includes a storage time of a respective encrypted data object;
determine a hash seed used to recreate the hash function for an identified time and store the hash seed for an identified time, at the plurality of different storage nodes included in the network storage locations; wherein the hash seed for the identified time is used to recreate the hash function for the identified time, wherein the hash seed is a random function based on an initial seed;
record write accesses that occur in the computing environment until the encrypted data objects are stored, when a data object request is a request to store the encrypted data objects; and
play back the recording of write accesses until a restore completes, when the data object request is a request to retrieve the encrypted data objects.
Dependent claims: 8
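The retrieval limitation selects a preferred replica set using data freshness, where freshness includes the storage time of the encrypted data objects. A minimal sketch of one such selection rule follows. The `ReplicaSet` record, the `reachable` flag, and the optional `as_of` cutoff (for recovering to a past point in time) are illustrative assumptions and are not recited in the claims.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplicaSet:
    cluster_group: str      # name of the cluster group holding this replica set
    stored_at: float        # storage time, e.g. seconds since the Unix epoch
    reachable: bool = True  # assumed health flag, not part of the claim


def preferred_replica_set(replicas, as_of: Optional[float] = None):
    """Return the freshest reachable replica set stored at or before `as_of`.

    With `as_of=None` the most recently stored reachable replica set wins.
    Returns None when no replica set qualifies.
    """
    candidates = [
        r for r in replicas
        if r.reachable and (as_of is None or r.stored_at <= as_of)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.stored_at)
```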
9. A system for distributed computing backup and recovery (DCBR), comprising:
a processor configured to retrieve a user selectable preference, identify a first subset of data from within a data set according to at least one user selectable preference, the first subset of data comprising less than all of the data in the data set, and wherein the first subset is selectable by the user selectable preference;
an application programming interface (API) configured to intercept the first subset of data, and encrypt the first subset of data;
a cluster of computing nodes in a computing environment;
an interface controlled by the processor connected to a network in the computing environment;
a memory coupled to the processor, wherein the memory comprises;
a data object request received through the interface for encrypted data objects wherein the encrypted data objects are encrypted and comprise a second subset of data that is less than all of the data in the first subset of data;
a hash function evaluated by the processor to determine network storage locations or network retrieval locations, or both, for the encrypted data objects;
instructions executable by the processor that cause the processor to;
store, at a granular level that is less than all the data in the data set, multiple replica sets of the encrypted data objects across a plurality of different storage nodes included in the network storage locations according to the hash function, wherein each replica set of the encrypted data objects is stored across a respective cluster group of storage nodes from within the plurality of different storage nodes, and wherein the encrypted data objects stored across the different storage nodes are identified as replicas of data in the data set; and
retrieve, from the multiple replica sets of the encrypted data sets stored across their respective cluster groups of storage nodes, a preferred replica set of the encrypted data objects stored on a preferred cluster group of storage nodes in the network retrieval locations according to the hash function and an additional selection criteria including data freshness of the preferred replica set of the encrypted data objects, wherein the storage nodes comprising the preferred cluster group share the same hash function, and wherein data freshness includes a storage time of a respective encrypted data object;
determine a hash seed used to recreate the hash function for an identified time and store the hash seed for an identified time, at the plurality of different storage nodes included in the network storage locations; wherein the hash seed for the identified time is used to recreate the hash function for the identified time, wherein the hash seed is a random function based on an initial seed;
record write accesses that occur in the computing environment until the encrypted data objects are stored, when the data object request is a request to store the encrypted data objects; and
play back the recording of write accesses until a restore completes, when the data object request is a request to retrieve the encrypted data objects.
Dependent claims: 10, 11
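The final two limitations (recording write accesses while a store request is in progress, then playing the recording back during a restore) can be sketched as a simple write log over an in-memory key-value store. This is an illustrative model only: the `WriteRecorder` class, the `backup` and `restore` helpers, and the use of a Python `dict` as the store are assumptions, and concurrent writes are simulated sequentially.

```python
class WriteRecorder:
    """Records writes made while a backup is in progress (assumed design)."""

    def __init__(self):
        self.recording = False
        self.log = []

    def start(self):
        self.log = []
        self.recording = True

    def stop(self):
        self.recording = False

    def write(self, store: dict, key, value):
        """Apply a write to the live store and log it if a backup is running."""
        store[key] = value
        if self.recording:
            self.log.append((key, value))

    def play_back(self, store: dict):
        """Re-apply the recorded writes, in order, to a restored store."""
        for key, value in self.log:
            store[key] = value


def backup(store: dict, recorder: WriteRecorder, concurrent_writes=()):
    """Snapshot the store; writes arriving before the backup finishes are recorded."""
    recorder.start()
    snapshot = dict(store)
    for key, value in concurrent_writes:  # simulated in-flight writes
        recorder.write(store, key, value)
    recorder.stop()
    return snapshot


def restore(snapshot: dict, recorder: WriteRecorder) -> dict:
    """Rebuild the store from the snapshot, then play back the recorded writes."""
    restored = dict(snapshot)
    recorder.play_back(restored)
    return restored
```

In this model the restored store matches the live store as it stood when the backup finished, because the writes that the snapshot missed are replayed on top of it.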
Specification