Distributed computing backup and recovery system
First Claim
1. A method for distributed computing backup and recovery, comprising:
- retrieving at least one user selectable preference;
identifying a first subset of data from within a data set according to the at least one user selectable preference, the first subset of data containing less than all of the data in the data set wherein the first subset is selectable by using the user selectable preference;
receiving, into a memory via an interface controlled by a processor connected to a network in a computing environment wherein the identified data objects are within a second subset of data, the second subset of data containing less than all of the data in the first subset of data;
evaluating, using the processor, a hash function stored in the memory to determine network storage locations or network retrieval locations, or both for the data objects;
storing at a granular level, at each of the network storage locations, the data objects according to the data object request, when the data object request comprises a request to store the data objects, where the stored data objects are identified as a replica of the data objects stored at each of the network storage locations;
retrieving a hash seed for an identified time used to recreate the hash function for the identified time, where the data object request comprises the identified time to retrieve the data objects;
where the hash seed is a random function based on an initial seed, where the hash seed was previously stored for the identified time;
recreating, using the hash seed, the hash function for the identified time;
retrieving a backup record identifier from a backup log file corresponding to the data object request;
applying the hash function to the backup record identifier, where the hash function identifies the network retrieval locations in the computing environment the data objects are stored;
identifying the data objects within the second subset of data;
retrieving at a granular level from one of the network retrieval locations from a backup of the computing environment, using the processor connected to the network, the stored data objects identified by the one of the network retrieval locations, when the data object request comprises a request to retrieve the data objects, where the stored data objects are retrieved from the second subset of data;
retrieving one of the replica of the data objects from a node where the hash function determines where the stored data objects are located of the network retrieval locations; and
returning the data objects responsive to the request.
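The point-in-time steps of the claim (store a hash seed per time, recreate the hash function from that seed, apply it to a backup record identifier) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `SeedLog` class, the `recreate_hash_function` helper, the use of SHA-256, the placement of replicas on consecutive nodes, and all node and record names are assumptions made for the example. The claim's "random function based on an initial seed" is read here as a pseudo-random sequence started from the initial seed.

```python
import bisect
import hashlib
import random


class SeedLog:
    """Hypothetical store of one hash seed per backup time."""

    def __init__(self, initial_seed):
        # Assumption: seeds are drawn from a PRNG started at the initial seed.
        self._rng = random.Random(initial_seed)
        self._times = []
        self._seeds = []

    def record(self, time):
        """Draw and store the seed that takes effect at `time`."""
        if self._times and time <= self._times[-1]:
            raise ValueError("times must be strictly increasing")
        seed = self._rng.getrandbits(64)
        self._times.append(time)
        self._seeds.append(seed)
        return seed

    def seed_at(self, time):
        """Return the stored seed that was in effect at `time`."""
        i = bisect.bisect_right(self._times, time) - 1
        if i < 0:
            raise KeyError(f"no seed stored at or before time {time}")
        return self._seeds[i]


def recreate_hash_function(seed, nodes, replicas=2):
    """Rebuild the placement function that corresponds to `seed`."""
    def locate(record_id):
        digest = hashlib.sha256(f"{seed}:{record_id}".encode()).digest()
        start = int.from_bytes(digest[:8], "big") % len(nodes)
        # Assumption: replicas live on consecutive nodes after the start node.
        return [nodes[(start + k) % len(nodes)] for k in range(replicas)]
    return locate


log = SeedLog(initial_seed=42)
log.record(100)
log.record(200)
nodes = ["node-a", "node-b", "node-c", "node-d"]

# Recover as of time 150: the seed stored at time 100 still applies.
locate = recreate_hash_function(log.seed_at(150), nodes)
locations = locate("backup-record-0007")
```

Because the seed is stored rather than the function itself, the same retrieval locations can be recomputed later for any identified time that has a stored seed.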
Abstract
The distributed computing backup and recovery (DCBR) system and method provide backup and recovery for distributed computing models (e.g., NoSQL). The DCBR system extends the protections from server node-level failure and introduces persistence in time so that the evolving data set may be stored and recovered to a past point in time. The DCBR system, instead of performing backup and recovery for an entire dataset, may be configured to apply to a subset of data. Instead of keeping or recovering snapshots of the entire dataset, which requires the entire cluster, the DCBR system identifies the particular nodes and/or archive files where the dataset resides so that backup or recovery may be done with a much smaller number of nodes.
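The claim that a subset can be backed up or recovered with far fewer nodes than the whole cluster can be illustrated with a short sketch. The hashing scheme, the `node_for` and `recovery_plan` helpers, the cluster size, and the key names are all hypothetical; the abstract does not specify them.

```python
import hashlib


def node_for(key, nodes, seed=0):
    """Map a key to one node with a seeded hash (illustrative scheme only)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def recovery_plan(subset_keys, nodes, seed=0):
    """Group the keys of a subset by the node that holds them."""
    plan = {}
    for key in subset_keys:
        plan.setdefault(node_for(key, nodes, seed), []).append(key)
    return plan


cluster = [f"node-{i}" for i in range(16)]
subset = ["user:17:profile", "user:17:orders", "user:17:cart"]
plan = recovery_plan(subset, cluster)
print(f"{len(plan)} of {len(cluster)} nodes needed")
```

Only the nodes that appear as keys of `plan` need to take part in the operation, so a three-object subset touches at most three of the sixteen nodes in this example.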
17 Claims
1. A method for distributed computing backup and recovery, comprising the steps recited under First Claim above.
Dependent claims: 2, 3, 4, 5
6. A product for distributed computing backup and recovery, comprising:
- a computer readable memory with processor executable instructions stored thereon, wherein the instructions when executed by the processor cause the processor to:
retrieve at least one user selectable preference;
identify a first subset of data from within a data set according to the at least one user selectable preference, the first subset of data containing less than all of the data in the data set wherein the first subset is selectable by using the user selectable preference;
receive, into a memory via an interface controlled by a processor connected to a network in a computing environment, a data object request that identifies data objects to store or retrieve, wherein the identified data objects are within a second subset of data, the second subset of data containing less than all of the data in the first subset of data;
evaluate, using the processor, a hash function stored in the memory to determine network storage locations or network retrieval locations, or both for the data objects;
identify the data objects within the second subset of data;
store at a granular level, at each of the network storage locations, the data objects according to the data object request, when the data object request comprises a request to store the data objects, where the stored data objects are identified as a replica of the data objects at each of the network storage locations;
retrieve a hash seed for an identified time used to recreate the hash function for the identified time, where the hash seed is a random function based on an initial seed, where the hash seed was previously stored for the identified time, where the data object request comprises the identified time to retrieve the data objects;
recreate, using the hash seed, the hash function for the identified time;
retrieve a backup record identifier from a backup log file corresponding to the data object request;
apply the hash function to the backup record identifier, where the hash function identifies the network retrieval locations in the computing environment the data objects are stored;
retrieve at a granular level, from one of the network retrieval locations from a backup of the computing environment, using the processor connected to the network, the data objects identified by the one of the network retrieval locations, when the data object request comprises a request to recover or retrieve the data objects, where the stored data objects are retrieved from the second subset of data;
retrieve the replicas of the data objects from nodes where the hash function determines where the stored data objects are located of the network retrieval locations; and
return the data objects responsive to the request.
Dependent claims: 7, 8, 9, 10
11. A system for distributed computing backup and recovery (DCBR), comprising:
- a processor to retrieve at least one user selectable preference, identify a first subset of data from within a data set according to the at least one user selectable preference, the first subset of data containing less than all of the data in the data set wherein the first subset is selectable by using the user selectable preference;
a cluster of computing nodes in a computing environment;
an interface controlled by the processor connected to a network in the computing environment;
a memory coupled to the processor, wherein the memory comprises:
a data object request received through the interface for data objects wherein the data objects are within a second subset of data, the second subset of data containing less than all of the data in the first subset of data;
a hash function that is evaluated by the processor to determine network storage locations or network retrieval locations, or both for the data object;
instructions executable by the processor that cause the processor to:
retrieve a hash seed for an identified time used to recreate the hash function for the identified time, where the hash seed is a random function based on an initial seed, where the hash seed was previously stored for the identified time, where the data object request comprises the identified time to retrieve the data objects;
recreate, using the hash seed, the hash function for the identified time;
retrieve a backup record identifier from a backup log file corresponding to the data object request;
apply the hash function to the backup record identifier, where the hash function identifies the network retrieval locations in the computing environment the data objects are stored;
identify the data objects within the second subset of data;
retrieve at a granular level from one of the network retrieval locations the data objects from a backup of the computing environment, when the request is a request to retrieve the data objects, where the stored data objects retrieved are identified by the one of the network retrieval locations, where the stored data objects are retrieved from the second subset of data;
store at a granular level the data objects, when the request is a request to store the data object;
where a copy of the data objects are located on one or more of the nodes, where the stored data objects are identified as a replica of the data objects at each of the network storage locations;
retrieve the replicas of the data objects from nodes where the hash function determines where the stored data objects are located of the network retrieval locations; and
return the data objects responsive to the request.
Dependent claims: 12, 13, 14, 15, 16, 17
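The replica handling recited in the system claim (a copy of each data object on one or more nodes, with retrieval served from a node holding a replica) can be sketched as below. The `Node` class, the `store` and `retrieve` helpers, and the failure flag are hypothetical stand-ins for a real cluster; in the claimed system the list of replica nodes would come from the hash function rather than being passed in by hand.

```python
class Node:
    """Hypothetical in-memory stand-in for a storage node."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.objects = {}


def store(replica_nodes, key, value):
    """Write the object to every node chosen for it; each copy is a replica."""
    for node in replica_nodes:
        node.objects[key] = value


def retrieve(replica_nodes, key):
    """Return the object from the first reachable node that holds a replica."""
    for node in replica_nodes:
        if node.up and key in node.objects:
            return node.objects[key]
    raise LookupError(f"no reachable replica for {key!r}")


a, b = Node("node-a"), Node("node-b")
store([a, b], "order-1001", {"total": 25})
a.up = False  # simulate a node-level failure
recovered = retrieve([a, b], "order-1001")
```

With one node marked as failed, the request is still answered from the remaining replica; only when every replica node is unreachable does the lookup fail.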
Specification