System and method for providing high availability data

US 9,223,841 B2
Filed: 04/26/2010
Issued: 12/29/2015
Est. Priority Date: 03/31/2006
Status: Active Grant

Abstract

A computer-implemented data processing system and method writes a first plurality of copies of a data set at a first plurality of hosts and reads a second plurality of copies of the data set at a second plurality of hosts. The first and second pluralities of copies may overlap, and the first and second pluralities of hosts may overlap. A hashing function may be used to select the first and second pluralities of hosts. Version histories for each of the first copies of the data set may also be written at the first plurality of hosts and read at the second plurality of hosts. The version histories for the second copies of the data set may be compared, and causal relationships between the second copies of the data set may be evaluated based on those version histories.
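
The abstract's statement that a hashing function may be used to select hosts can be illustrated with a small sketch. The Python below is one possible reading, not the patented implementation: it assumes a consistent-hashing ring built with MD5, and the names `ring_position` and `preference_list`, the host labels, and the key `cart:42` are all hypothetical.

```python
import hashlib
from bisect import bisect_right


def ring_position(value: str) -> int:
    """Map a string to a point in the hash range [0, 2**128)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def preference_list(key, hosts, n, unavailable=frozenset()):
    """Return up to n available hosts, walking the ring clockwise from the key.

    Assumption: each host owns a single point on the ring. Unavailable hosts
    are skipped, so the next host in ring order takes their place.
    """
    ring = sorted((ring_position(h), h) for h in hosts)
    positions = [p for p, _ in ring]
    start = bisect_right(positions, ring_position(key))
    chosen = []
    for i in range(len(ring)):
        host = ring[(start + i) % len(ring)][1]
        if host not in unavailable:
            chosen.append(host)
        if len(chosen) == n:
            break
    return chosen


if __name__ == "__main__":
    hosts = ["host-a", "host-b", "host-c", "host-d", "host-e"]
    normal = preference_list("cart:42", hosts, 3)
    # With the first-choice host down, the selection shifts one step along the ring.
    degraded = preference_list("cart:42", hosts, 3, unavailable={normal[0]})
    print(normal, degraded)
```

Because the walk skips unavailable hosts, a write issued during an outage and a write issued afterwards can land on different host subsets, which is the situation the claims address.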

41 Claims

1. A computer-implemented data storage system comprising:
- host mapping logic configured to map responsibility for storing a plurality of data sets to individual ones of a plurality of hosts which cooperate to implement a data storage system;
  
  a hardware processor;
  
  data set replication logic executed by the hardware processor configured to execute instructions stored in memory, the data set replication logic configured to:
  
  obtain a first version of a data set to be written;
  
  select a first subset of the plurality of hosts to write the first version of the data set;
  
  write a first plurality of copies of the first version of a data set at the first subset of the plurality of hosts, wherein the first plurality of copies respectively include a version history of the first version of the data set;
  
  obtain a second version of the data set to be written, wherein the second version of the data set comprises one or more updates to the data set inconsistent with at least a portion of the first version of the data set;
  
  select a second subset of the plurality of hosts to write the second version of the data set, wherein the first and second subsets of the plurality of hosts include at least one different host;
  
  write a second plurality of copies of the second version of the data set at the second subset of the plurality of hosts, wherein the second plurality of copies respectively include another version history of the second version of the data set;
  
  data set retrieval logic executed by a hardware processor configured to execute instructions stored in memory, the data set retrieval logic configured to be responsive to a request to provide a single copy of the data set by reading a third plurality of copies of the data set at a third subset of the plurality of hosts, wherein the third plurality of copies include at least one copy of the first version of the data set and at least one copy of the second version of the data set, wherein the third subset of the plurality of hosts has at least one host in common with the first subset of the plurality of hosts and at least one host in common with the second subset of the plurality of hosts, and wherein the at least one host in common with the first subset of the plurality of hosts is not a member of the second subset of the plurality of hosts; and
  
  an evaluation component configured to provide a single copy of the data set by:
  
  reading the third plurality of copies of the data set; and
  
  evaluating the version history and the other version history to reconcile the first version of the data set and the second version of the data set read from the third plurality of copies into the single copy of the data set;
  
  wherein the evaluation component is configured to be invoked after the third plurality of copies of the data set is read.
- Dependent claims (2, 3, 4):
  - 2. The system of claim 1, wherein the host mapping logic is further configured to generate a hash value based on a hash function.
  - 3. The system of claim 1, wherein the version history and the other version history each comprise a respective vector clock.
  - 4. The system of claim 3, wherein the version history and the other version history each comprise a respective hash history.
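
Claim 3 names vector clocks as one form of version history. As a hedged illustration of how two such histories might be evaluated, the sketch below represents a clock as a mapping from host identifier to counter and classifies a pair of clocks. The function names `descends` and `compare` and the host labels are hypothetical; the claims do not prescribe this representation.

```python
from typing import Dict

# Assumed representation: host identifier -> number of writes coordinated by that host.
VectorClock = Dict[str, int]


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if every counter in b is less than or equal to the matching counter in a.

    A host missing from a clock is treated as having counter 0.
    """
    return all(a.get(host, 0) >= count for host, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify the causal relationship between two version histories."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "after"       # a is a causal successor of b
    if b_covers_a:
        return "before"      # a is a causal ancestor of b
    return "concurrent"      # neither covers the other: the versions conflict


if __name__ == "__main__":
    print(compare({"A": 2, "B": 1}, {"A": 1}))   # after
    print(compare({"A": 1, "B": 1}, {"A": 2}))   # concurrent
```

Only the "concurrent" outcome calls for reconciliation; in the other cases the later version can simply supersede the earlier one.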

5. A computer-implemented data processing method comprising:
- writing a first plurality of copies of a first version of a data set at a first plurality of hosts, including writing a version history for each of the first plurality of copies of the first version of the data set;
  
  writing a second plurality of copies of a second version of the data set at a second plurality of hosts, including writing another version history for each of the second plurality of copies of the second version of the data set, wherein the second version of the data set comprises one or more updates to the data set that is inconsistent with the first version of the data set and wherein the first and second plurality of hosts include at least one different host;
  
  responding to a request to provide a single copy of the data set by reading a third plurality of copies of the data set at a third plurality of hosts, wherein the third plurality of copies include at least one copy of the first version of the data set and at least one copy of the second version of the data set, wherein the third plurality of hosts has at least one host in common with the first plurality of hosts and at least one host in common with the second plurality of hosts, and wherein the at least one host in common with the first plurality of hosts is not a member of the second plurality of hosts;
  
  reconciling the first version of the data set and the second version of the data set into the single copy of the data set according to an evaluation of the version history and the other version history read from the third plurality of copies of the data set; and
  
  providing the single copy of the data set.
- Dependent claims (6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32):
  - 6. The method of claim 5, wherein the version history and the other version history each comprise a respective hash history.
  - 7. The method of claim 5, wherein the version history and the other version history each comprise a respective vector clock.
  - 8. The method of claim 7, wherein the vector clocks each comprise a counter that encodes causality information for a data set including a summary of preceding changes.
  - 9. The method of claim 8, wherein the vector clocks each comprise a host identifier that identifies a host that coordinated a write operation.
  - 10. The method of claim 5, wherein the version history and the other version history each comprise a vector clock, and wherein the method further comprises generating the vector clock written for each of the second plurality of copies of the data set, including copying a prior version of the vector clock associated with a prior version of the data set and incrementing a counter of the vector clock.
  - 11. The method of claim 10, wherein the vector clocks each comprise a plurality of counters, each of the plurality of counters being associated with different hosts that have written prior versions of the data set.
  - 12. The method of claim 11, wherein reconciling the first version of the data set and the second version of the data set into the single copy of the data set according to an evaluation of the version history and the other version history comprises comparing the vector clocks and determining that two of the plurality of copies of the data set are causally related based on one vector clock having less-than-or-equal counters for all of the nodes in the other vector clock.
  - 13. The method of claim 11 further comprising truncating the vector clocks.
  - 14. The method of claim 13, wherein truncating the vector clocks includes truncating counters associated with hosts that have not performed a write operation for greater than a predetermined period of time.
  - 15. The method of claim 5, wherein at least one of the writing at a first plurality of hosts and the writing at the second plurality of hosts is performed in accordance with a preference list.
  - 16. The method of claim 15, wherein the preference list is generated based on a hash function that maps the data set to a plurality of hosts based on a data element associated with the data set.
  - 17. The method of claim 16, further comprising generating a hash value based on a hash key and the hash function, the hash key being associated with the data set and being applied as input to the hash function.
  - 18. The method of claim 17, wherein the hash function has a hash range comprising a range of output values for the hash function, the hash value being within the hash range, the data set being one of a plurality of data sets.
  - 19. The method of claim 18, wherein the hash function maps responsibility for storing the plurality of data sets to individual ones of a fourth plurality of hosts which cooperate to implement a data storage system, the first, second, and third pluralities of hosts being subsets of the fourth plurality of hosts.
  - 20. The method of claim 19, wherein at least one of the first plurality of hosts and the second subset of hosts are selected to write the data set based on the hash value and based on whether other hosts are unavailable.
  - 21. The method of claim 20, wherein the hash value is a first hash value and the hash key is a first hash key, wherein the method further comprises generating a second hash value based on a second hash key and the hash function, and wherein the third plurality of hosts is selected to read the data set based on the second hash value and based on whether other hosts are unavailable.
  - 22. The method of claim 5, wherein at least one of the writing at a first plurality of hosts and the writing at the second plurality of hosts is performed in accordance with a preference list, the preference list providing a ranking of hosts at which copies of the data set are to be stored.
  - 23. The method of claim 22, further comprising migrating one of the copies of the data set from a first host to a second host after the second host becomes available, the second host being higher on the preference list than the first host, the second host on the preference list being the host not in common with the first plurality of hosts.
  - 24. The method of claim 23, wherein the preference list ranks hosts in a fourth plurality of hosts which cooperate to implement a data storage system, the first, second, and third pluralities of hosts being subsets of the fourth plurality of hosts.
  - 25. The method of claim 24, further comprising dynamically migrating more recent copies of the data set to hosts that rank higher on the preference list, causing eventual consistency of the data set at a set of hosts at the top of the preference list.
  - 26. The method of claim 5, wherein at least one of the second plurality of copies of the data set and one of the third plurality of copies of the data set are the same copy.
  - 27. The method of claim 5, wherein the method is implemented in a fourth plurality of hosts which cooperate to implement a data storage system, the first, second, and third pluralities of hosts being subsets of the fourth plurality of hosts, and wherein the fourth plurality of hosts cooperate with other hosts to implement a network services system accessible to users by way of a network.
  - 28. The method of claim 27, wherein the network services system provides a website accessible to the users.
  - 29. The method of claim 28, wherein the website is a merchant website.
  - 30. The method of claim 29, wherein the data set comprises shopping cart data for a shopping cart for one of the users.
  - 31. The method of claim 5, wherein reconciling the first version of the data set and the second version of the data set into the single copy of the data set according to the evaluation of the version history and the other version history read comprises determining that the third plurality of copies of the data set comprises conflicting copies.
  - 32. The method of claim 31 wherein a client process performs the reconciling the first version of the data set and the second version of the data set.
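
Claims 10, 13, and 14 describe generating a vector clock by copying the prior clock and incrementing a counter, and truncating counters for hosts that have not written for longer than a predetermined period. The following is a minimal sketch of those two steps under stated assumptions: each entry stores a counter together with the time of that host's last write, and the names `next_clock` and `truncate`, the timestamps, and the `max_idle` parameter are hypothetical.

```python
from typing import Dict, Tuple

# Assumed representation: host identifier -> (counter, time of that host's last write).
Clock = Dict[str, Tuple[int, float]]


def next_clock(prior: Clock, coordinator: str, now: float) -> Clock:
    """Copy the prior clock and increment the coordinating host's counter."""
    clock = dict(prior)  # shallow copy; the prior version's clock is left untouched
    counter, _ = clock.get(coordinator, (0, now))
    clock[coordinator] = (counter + 1, now)
    return clock


def truncate(clock: Clock, now: float, max_idle: float) -> Clock:
    """Drop counters for hosts whose last write is older than max_idle."""
    return {
        host: (counter, last_write)
        for host, (counter, last_write) in clock.items()
        if now - last_write <= max_idle
    }


if __name__ == "__main__":
    v1 = next_clock({}, "A", now=100.0)    # {"A": (1, 100.0)}
    v2 = next_clock(v1, "B", now=200.0)    # {"A": (1, 100.0), "B": (1, 200.0)}
    print(truncate(v2, now=1000.0, max_idle=850.0))  # {"B": (1, 200.0)}
```

Truncation bounds the size of the clock at the cost of discarding some causality information, so two versions that were in fact ordered may later appear concurrent.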

33. A computer-implemented data processing method comprising:
- generating a hash value based on a hash key and a hash function, the hash key being associated with a data set and being applied as input to the hash function;
  
  writing a first plurality of copies of a first version of the data set at a first subset of a plurality of hosts, including writing a version history for each of the first plurality of copies of the first version of the data set, wherein the first subset of the plurality of hosts being selected to write the data set based on the hash value;
  
  writing a second plurality of copies of a second version of the data set at a second subset of the plurality of hosts, including writing another version history for each of the second plurality of copies of the second version of the data set, wherein the second version of the data set comprises one or more updates to the data set that is inconsistent with the first version of the data set and wherein the first and second subsets of the plurality of hosts include at least one different host;
  
  responsive to a request to recall a copy of the data set, reading a third plurality of copies of the data set at a third subset of the plurality of hosts, wherein the third plurality of copies include at least one copy of the first version of the data set and at least one copy of the second version of the data set, wherein the third subset of the plurality of hosts has at least one host in common with the first subset of the plurality of hosts and at least one host in common with the second subset of the plurality of hosts, wherein the at least one host in common with the first subset of the plurality of hosts is not a member of the second subset of the plurality of hosts, and wherein the first subset of the plurality of hosts for writing the data set and the third subset of the plurality of hosts for reading the data set are independently determined; and
  
  after reading, reconciling the first version of the data set and the second version of the data set into the single copy of the data set according to an evaluation of the version history and the other version history read from the third plurality of copies of the data set.
- Dependent claims (34, 35, 36, 37, 38, 39, 40, 41):
  - 34. The method of claim 33, wherein at least one of the writing at the first subset of the plurality of hosts and the writing at the second subset of the plurality of hosts is performed in accordance with a preference list, the preference list providing a ranking of hosts at which copies of the data set are to be stored, and the preference list being generated based on the hash function, and wherein the hash function maps the data set to a plurality of hosts based on a data element associated with the data set.
  - 35. The method of claim 34, further comprising migrating one of the copies of the data set from a first host to a second host after the second host becomes available, the second host being higher on the preference list than the first host, the second host on the preference list being one of the at least one hosts in common with the first subset of the plurality of hosts which is not a member of the second subset of the plurality of hosts.
  - 36. The method of claim 35, wherein the preference list ranks hosts in the plurality of hosts which cooperate to implement a data storage system.
  - 37. The method of claim 36, further comprising dynamically migrating more recent copies of the data set to hosts that rank higher on the preference list, causing eventual consistency of the data set at a set of hosts at the top of the preference list.
  - 38. The method of claim 33, wherein the version history and the other version history each comprise a respective vector clock.
  - 39. The method of claim 38, wherein the vector clocks each comprise a counter that encodes causality information for a data set including a summary of preceding changes and a host identifier that identifies a host where at least one copy of the data set is stored.
  - 40. The method of claim 33, wherein the hash function has a hash range comprising a range of output values for the hash function, the hash value being within the hash range, the data set being one of a plurality of data sets.
  - 41. The method of claim 40, wherein the hash function maps responsibility for storing the plurality of data sets to individual ones of a plurality of hosts which cooperate to implement a data storage system.
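
The write, read, and reconcile sequence of claim 33 can be traced with a toy example. This is a sketch under assumptions, not the claimed method itself: four hypothetical hosts `A` to `D`, vector clocks as the version histories, and a set-union merge chosen purely for illustration (the claims leave the reconciliation policy open). A union merge is simple but can bring back items that one of the versions had removed.

```python
def descends(a, b):
    """True if clock a has a counter >= every counter in clock b."""
    return all(a.get(host, 0) >= count for host, count in b.items())


def newest_versions(copies):
    """Discard copies whose clock is strictly dominated by another copy's clock."""
    survivors = []
    for c in copies:
        dominated = any(
            descends(o["clock"], c["clock"]) and o["clock"] != c["clock"]
            for o in copies
        )
        duplicate = any(s["clock"] == c["clock"] for s in survivors)
        if not dominated and not duplicate:
            survivors.append(c)
    return survivors


def reconcile(copies):
    """Merge the surviving versions into a single copy (assumed union policy)."""
    siblings = newest_versions(copies)
    items = set().union(*(s["items"] for s in siblings))
    clock = {}
    for s in siblings:
        for host, count in s["clock"].items():
            clock[host] = max(clock.get(host, 0), count)
    return {"items": items, "clock": clock}


# Two inconsistent versions written to different host subsets.
v1 = {"items": {"book"}, "clock": {"A": 1}}
v2 = {"items": {"lamp"}, "clock": {"C": 1}}
stores = {"A": v1, "B": v1, "C": v2, "D": v2}

# The read subset shares host A with the first write and host C with the second.
read_copies = [stores[h] for h in ("A", "C")]
merged = reconcile(read_copies)
print(sorted(merged["items"]), merged["clock"])  # ['book', 'lamp'] {'A': 1, 'C': 1}
```

Reconciliation runs only after the copies have been read, matching the ordering stated in the claim.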

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Vosshall, Peter S.; deCandia, Giuseppe; Hastorun, Deniz; Lakshman, Avinash; Pilchin, Alex; Rosero, Ivan D.
Primary Examiner(s): Pulliam, Christyann
Assistant Examiner(s): Ohba, Mellissa M
Application Number: US12/767,759
Publication Number: US 20100332451A1
Time in Patent Office: 2,073 Days

Field of Search
US Class Current: 1/1
CPC Class Codes:
G06F 11/2097   maintaining the standby con...
G06F 16/2365   Ensuring data consistency a...
G06F 16/27   Replication, distribution o...
G06F 3/0604   Improving or facilitating a...
G06F 3/065   Replication mechanisms
G06F 3/0673   Single storage device