Distributed computing backup and recovery system

US 10,102,264 B2
Filed: 11/25/2014
Issued: 10/16/2018
Est. Priority Date: 09/30/2011
Status: Active Grant

Abstract

The distributed computing backup and recovery (DCBR) system and method provide backup and recovery for distributed computing models (e.g., NoSQL). The DCBR system extends the protections from server node-level failure and introduces persistence in time so that the evolving data set may be stored and recovered to a past point in time. The DCBR system, instead of performing backup and recovery for an entire dataset, may be configured to apply to a subset of data. Instead of keeping or recovering snapshots of the entire dataset which requires the entire cluster, the DCBR system identifies the particular nodes and/or archive files where the dataset resides so that backup or recovery may be done with a much smaller number of nodes.
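As an illustration of the abstract's point that a subset backup only needs the nodes where that subset resides, the following minimal sketch maps keys to nodes with a stable hash and collects the nodes a subset touches. It is not taken from the patent specification: the placement rule, the function names `node_for_key` and `nodes_holding`, the cluster size, and the key names are all assumptions made for the example.

```python
import hashlib


def node_for_key(key: str, nodes: list[str]) -> str:
    """Map a key to one node with a stable hash (hypothetical placement rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def nodes_holding(keys: list[str], nodes: list[str]) -> set[str]:
    """Return only the nodes that hold at least one key of the subset."""
    return {node_for_key(key, nodes) for key in keys}


cluster = [f"node-{i}" for i in range(8)]            # hypothetical 8-node cluster
subset_keys = ["user:42:profile", "user:42:orders"]  # hypothetical subset to back up
targets = nodes_holding(subset_keys, cluster)        # nodes the backup must contact
```

Under these assumptions, a backup of the two keys contacts at most two of the eight nodes rather than the whole cluster.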

11 Claims

1. A method for distributed computing backup and recovery, comprising:
- retrieving a user selectable preference;
  
  identifying a first subset of data from within a data set according to at least one user selectable preference, the first subset of data comprising less than all of the data in the data set, and wherein the first subset is selectable by the user selectable preference;
  
  intercepting the first subset of data at an application programming interface (API);
  
  encrypting, by the API, at least a portion of the first subset of data into encrypted data objects that comprise a second subset of data that is less than all of the data in the first subset of data;
  
  receiving, into a memory via an interface controlled by a processor connected to a network in a computing environment, the second subset of data;
  
  evaluating, using the processor, a hash function stored in the memory to determine network storage locations or network retrieval locations, or both, for the encrypted data objects;
  
  storing, at a granular level that is less than all the data in the data set, multiple replica sets of the encrypted data objects across a plurality of different storage nodes included in the network storage locations according to the hash function, wherein each replica set of the encrypted data objects is stored across a respective cluster group of storage nodes from within the plurality of different storage nodes, and wherein the encrypted data objects stored across the plurality of different storage nodes are identified as replicas of data in the data set;
  
  retrieving, from the multiple replica sets of the encrypted data sets stored across their respective cluster groups of storage nodes, a preferred replica set of the encrypted data objects stored on a preferred cluster group of storage nodes in the network retrieval locations according to the hash function and an additional selection criteria including data freshness of the preferred replica set of the encrypted data objects, wherein the storage nodes comprising the preferred cluster group share the same hash function, and wherein data freshness includes a storage time of a respective encrypted data object;
  
  determining a hash seed used to recreate the hash function for an identified time and storing the hash seed for an identified time, at the plurality of different storage nodes included in the network storage locations, wherein the hash seed for the identified time is used to recreate the hash function for the identified time;
  
  wherein the hash seed is a random function based on an initial seed; and
  
  when a data object request comprises a request to store the encrypted data objects, recording write accesses that occur in the computing environment until the encrypted data objects are stored; and
  
  when the data object request comprises a request to retrieve the encrypted data objects, playing back the recording of write accesses until a restore completes.
2. The method of claim 1, wherein retrieving further comprises:
- determining a plurality of configurable restore options, the restore options including:
  
  a sequence of a plurality of nodes to restore including one or more nodes from which to granularly retrieve a copy of the encrypted data objects;
  
  or restore nodes to use to restore the encrypted data objects concurrently;
  
  or a combination thereof.

3. The method of claim 1, wherein the hash function uses a hash ring to map a first namespace into an evenly distributed second namespace using a hashing function wherein the evenly distributed second namespace is smaller than the first namespace.

4. The method of claim 3, wherein the second namespace is used to distribute the first namespace across nodes in the computing environment.

5. The method of claim 1, wherein, after intercepting the first subset of data at the API, the method further comprises:
- controlling distribution of the first subset of data through the API by interfacing with a database.

6. The method of claim 1, wherein encrypting the first subset of data by using the API further comprises:
- backing up a portion of the encrypted first subset of data.
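
The hash ring of claims 3 and 4 and the time-indexed hash seed of claim 1 can be pictured with a generic consistent-hashing sketch. This is not the patented implementation: the class `HashRing`, the helper `seeds_for_times`, the virtual-node count, the ring size of 2**32, and the node names are assumptions chosen for illustration. The sketch maps arbitrary keys (the first namespace) onto a fixed-size ring (the smaller second namespace), and shows that keeping the seed for a given time is enough to rebuild the same placement later.

```python
import bisect
import hashlib
import random


def seeds_for_times(initial_seed: int, times: list[int]) -> dict[int, int]:
    """Derive one hash seed per identified time from a single initial seed."""
    rng = random.Random(initial_seed)
    return {t: rng.getrandbits(32) for t in times}


def ring_position(value: str, seed: int) -> int:
    """Hash a value into the ring's namespace of 2**32 positions."""
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes: list[str], seed: int, vnodes: int = 64):
        self.seed = seed
        # Each node owns several points so keys spread evenly around the ring.
        self._points = sorted(
            (ring_position(f"{node}#{i}", seed), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [position for position, _ in self._points]

    def replicas(self, key: str, count: int) -> list[str]:
        """Walk clockwise from the key's position and collect distinct nodes."""
        start = bisect.bisect_right(self._positions, ring_position(key, self.seed))
        found: list[str] = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in found:
                found.append(node)
                if len(found) == count:
                    break
        return found


nodes = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical cluster
seed_history = seeds_for_times(2011, [1000, 2000])  # identified time -> seed
ring_then = HashRing(nodes, seed_history[1000])
placement = ring_then.replicas("user:42:profile", count=3)

# Rebuilding the ring from the stored seed reproduces the same placement.
ring_again = HashRing(nodes, seed_history[1000])
```

Because placement depends only on the node list and the seed, a restore to an identified time needs the stored seed for that time rather than a record of every object's location.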

7. A computer program product for distributed computing backup and recovery, comprising:
- a non-transitory computer readable memory with processor executable instructions stored thereon, wherein the instructions when executed by the processor cause the processor to:
  
  retrieve a user selectable preference;
  
  identify a first subset of data from within a data set according to at least one user selectable preference, the first subset of data comprising less than all of the data in the data set, and wherein the first subset is selectable by the user selectable preference;
  
  intercept the first subset of data at an application programming interface (API);
  
  encrypt, by the API, at least a portion of the first subset of data into encrypted data objects that comprise a second subset of data that is less than all of the data in the first subset of data;
  
  receive, into a memory via an interface controlled by a processor connected to a network in a computing environment;
  
  evaluate, using the processor, a hash function stored in the memory to determine network storage locations or network retrieval locations, or both, for the encrypted data objects;
  
  store, at a granular level that is less than all the data in the data set, multiple replica sets of the encrypted data objects across a plurality of different storage nodes included in the network storage locations according to the hash function, wherein each replica set of the encrypted data objects is stored across a respective cluster group of storage nodes from within the plurality of different storage nodes, and wherein the encrypted data objects stored across the different storage nodes are identified as replicas of data in the data set;
  
  retrieve, from the multiple replica sets of the encrypted data sets stored across their respective cluster groups of storage nodes, a preferred replica set of the encrypted data objects stored on a preferred cluster group of storage nodes in the network retrieval locations according to the hash function and an additional selection criteria including data freshness of the preferred replica set of the encrypted data objects, wherein the storage nodes comprising the preferred cluster group share the same hash function, and wherein data freshness includes a storage time of a respective encrypted data object;
  
  determine a hash seed used to recreate the hash function for an identified time and store the hash seed for an identified time, at the plurality of different storage nodes included in the network storage locations;
  
  wherein the hash seed for the identified time is used to recreate the hash function for the identified time, wherein the hash seed is a random function based on an initial seed;
  
  record write accesses that occur in the computing environment until the encrypted data objects are stored, when a data object request is a request to store the encrypted data objects; and
  
  play back the recording of write accesses until a restore completes, when the data object request is a request to retrieve the encrypted data objects.
8. The computer program product of claim 7, the instructions when executed by the processor further cause the processor to determine a plurality of configurable restore options, the restore options including:
- a sequence of a plurality of nodes to restore including one or more nodes from which to granularly retrieve a copy of the encrypted data objects;
  
  or restore nodes to use to restore the encrypted data objects concurrently;
  
  or a combination thereof.
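
The record and play-back steps recited in claims 1 and 7 can be sketched as a simple write log. The sketch below is an assumption-laden illustration, not the patent's mechanism: the class `WriteLog`, its method names, and the use of plain dictionaries as the data store are hypothetical. It records writes that arrive while a backup snapshot is being taken, then replays them on top of the restored snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class WriteLog:
    """Records writes during a backup and replays them during a restore."""
    entries: list = field(default_factory=list)
    recording: bool = False

    def start(self) -> None:
        self.entries.clear()
        self.recording = True

    def stop(self) -> None:
        self.recording = False

    def record(self, key: str, value: object) -> None:
        if self.recording:
            self.entries.append((key, value))

    def play_back(self, store: dict) -> None:
        # Apply recorded writes in their original order.
        for key, value in self.entries:
            store[key] = value


live = {"a": 1}               # hypothetical live data store
log = WriteLog()

log.start()                   # backup begins: start recording writes
snapshot = dict(live)         # the stored backup copy
live["b"] = 2                 # a write that arrives during the backup
log.record("b", 2)
log.stop()                    # backup stored: stop recording

restored = dict(snapshot)     # restore from the backup copy
log.play_back(restored)       # replay the writes the snapshot missed
```

After play-back, `restored` matches `live`, which is the point of pairing the snapshot with the recorded writes.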

9. A system for distributed computing backup and recovery (DCBR), comprising:
- a processor configured to retrieve a user selectable preference, identify a first subset of data from within a data set according to at least one user selectable preference, the first subset of data comprising less than all of the data in the data set, and wherein the first subset is selectable by the user selectable preference;
  
  an application programming interface (API) configured to intercept the first subset of data, and encrypt the first subset of data;
  
  a cluster of computing nodes in a computing environment;
  
  an interface controlled by the processor connected to a network in the computing environment;
  
  a memory coupled to the processor, wherein the memory comprises:
  
  a data object request received through the interface for encrypted data objects wherein the encrypted data objects are encrypted and comprise a second subset of data that is less than all of the data in the first subset of data;
  
  a hash function evaluated by the processor to determine network storage locations or network retrieval locations, or both, for the encrypted data objects;
  
  instructions executable by the processor that cause the processor to:
  
  store, at a granular level that is less than all the data in the data set, multiple replica sets of the encrypted data objects across a plurality of different storage nodes included in the network storage locations according to the hash function, wherein each replica set of the encrypted data objects is stored across a respective cluster group of storage nodes from within the plurality of different storage nodes, and wherein the encrypted data objects stored across the different storage nodes are identified as replicas of data in the data set; and
  
  retrieve, from the multiple replica sets of the encrypted data sets stored across their respective cluster groups of storage nodes, a preferred replica set of the encrypted data objects stored on a preferred cluster group of storage nodes in the network retrieval locations according to the hash function and an additional selection criteria including data freshness of the preferred replica set of the encrypted data objects, wherein the storage nodes comprising the preferred cluster group share the same hash function, and wherein data freshness includes a storage time of a respective encrypted data object;
  
  determine a hash seed used to recreate the hash function for an identified time and store the hash seed for an identified time, at the plurality of different storage nodes included in the network storage locations;
  
  wherein the hash seed for the identified time is used to recreate the hash function for the identified time, wherein the hash seed is a random function based on an initial seed;
  
  record write accesses that occur in the computing environment until the encrypted data objects are stored, when the data object request is a request to store the encrypted data objects; and
  
  play back the recording of write accesses until a restore completes, when the data object request is a request to retrieve the encrypted data objects.
10. The system of claim 9, wherein the memory further comprises:
- a backup log file that includes a backup record identifier corresponding to the preferred replica set of the encrypted data objects.

11. The system of claim 9, wherein the instructions further cause the processor to execute restore options, the restore options including:
- a sequence of a plurality of nodes to restore including the one or more nodes from which to granularly retrieve a copy of the encrypted data objects;
  
  or restore nodes to use to restore the encrypted data objects concurrently;
  
  or a combination thereof.
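
The independent claims select a preferred replica set using data freshness, where freshness includes the storage time of an encrypted data object. One possible reading is sketched below, assuming that "preferred" means the most recently stored replica set that is not newer than the requested restore point. The `ReplicaSet` type, its field names, and the `preferred_replica` function are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplicaSet:
    cluster_group: str   # cluster group of storage nodes holding this replica set
    stored_at: float     # storage time, in seconds since the epoch
    object_ids: tuple    # identifiers of the encrypted data objects


def preferred_replica(
    replica_sets: list[ReplicaSet], as_of: Optional[float] = None
) -> Optional[ReplicaSet]:
    """Pick the freshest replica set stored at or before `as_of` (if given)."""
    candidates = [
        r for r in replica_sets if as_of is None or r.stored_at <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.stored_at)


replica_sets = [
    ReplicaSet("group-1", 1000.0, ("obj-1", "obj-2")),
    ReplicaSet("group-2", 2000.0, ("obj-1", "obj-2")),
    ReplicaSet("group-3", 3000.0, ("obj-1", "obj-2")),
]
latest = preferred_replica(replica_sets)
point_in_time = preferred_replica(replica_sets, as_of=2500.0)
```

With these sample values, `latest` comes from `group-3` and `point_in_time` from `group-2`; a request earlier than every storage time returns `None`.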

Current Assignee: Accenture Global Services Limited (Accenture PLC)
Original Assignee: Accenture Global Services Limited (Accenture PLC)
Inventors: Tung, Teresa; Farooqui, Sameer; Richter, Owen
Primary Examiner(s): Le, Miranda
Application Number: US14/553,266
Publication Number: US 20150127982A1
Time in Patent Office: 1,421 Days
Field of Search: 707654
CPC Class Codes:

G06F 11/1451   by selection of backup cont...

G06F 11/1464   for networked environments

G06F 11/1469   Backup restoration techniques

G06F 16/27   Replication, distribution o...
