Method and apparatus for correct and complete transactions in a fault tolerant distributed database system
First Claim
1. In a system comprised of at least one master processor and a plurality of slave processors wherein each processor accesses and controls a database associated with each processor, a method for compensating for a failure condition involving either said at least one master processor or any one of said plurality of slave processors, said method comprising the steps of:
- a) recording the last completed database update transaction at said each processor;
b) creating a journal operatively associated with said at least one master database processor for recording steps of a predetermined number of database update transactions generated by said at least one master database processor;
c) recording said steps of database update transactions in said journal;
d) creating a plurality of first timers, each of said plurality of slave processors having associated with it, at least one of said plurality of first timers;
e) creating a plurality of second timers, each of said plurality of slave processors having associated with it, at least one of said plurality of second timers;
f) starting said at least one of said plurality of first timers at the beginning of an update transaction;
g) sending a predetermined message from a first one of said plurality of slave processors to said at least one master processor at the end of an update transaction;
h) starting said second timer associated with said first one of said plurality of slave processors when said predetermined message is sent to said at least one master processor;
i) in the event of failure of any one of said plurality of slave processors;
A) copying to said failed processor from said journal all database update transactions subsequent to the last database update of the previously failed processor;
j) in the event of failure to said at least one master processor;
A) detecting the expiration of said at least one of said plurality of first timers prior to the expiration of said at least one of a plurality of second timers and thereafter;
1) aborting said current update transaction;
2) sending an abort message to said plurality of slave database processors by said slave processor associated with the expired at least one of said plurality of first timers;
B) detecting the expiration of said at least one of said plurality of second timers and thereafter committing said current update transaction.
7 Assignments
0 Petitions
Accused Products
Abstract
In a distributed network of processors, a method for completing update transactions using update transaction timers after failure of one processor. Failed slave processors are updated with other slave processors using a record of the last completed database update transaction at each processor prior to failure and using a journal in the master processor that records steps of database update transactions generated by the master database processor.
156 Citations
32 Claims
-
1. In a system comprised of at least one master processor and a plurality of slave processors wherein each processor accesses and controls a database associated with each processor, a method for compensating for a failure condition involving either said at least one master processor or any one of said plurality of slave processors, said method comprising the steps of:
-
a) recording the last completed database update transaction at said each processor;
b) creating a journal operatively associated with said at least one master database processor for recording steps of a predetermined number of database update transactions generated by said at least one master database processor;
c) recording said steps of database update transactions in said journal;
d) creating a plurality of first timers, each of said plurality of slave processors having associated with it, at least one of said plurality of first timers;
e) creating a plurality of second timers, each of said plurality of slave processors having associated with it, at least one of said plurality of second timers;
f) starting said at least one of said plurality of first timers at the beginning of an update transaction;
g) sending a predetermined message from a first one of said plurality of slave processors to said at least one master processor at the end of an update transaction;
h) starting said second timer associated with said first one of said plurality of slave processors when said predetermined message is sent to said at least one master processor;
i) in the event of failure of any one of said plurality of slave processors;
A) copying to said failed processor from said journal all database update transactions subsequent to the last database update of the previously failed processor;
j) in the event of failure to said at least one master processor;
A) detecting the expiration of said at least one of said plurality of first timers prior to the expiration of said at least one of a plurality of second timers and thereafter;
1) aborting said current update transaction;
2) sending an abort message to said plurality of slave database processors by said slave processor associated with the expired at least one of said plurality of first timers;
B) detecting the expiration of said at least one of said plurality of second timers and thereafter committing said current update transaction. - View Dependent Claims (2, 3, 4, 5)
-
-
6. In a distributed network of processors, comprised of at least one master processor and a plurality of slave processors wherein each processor accesses and controls a database associated with each processor, a method for updating a database of a failed slave processor to match other databases after said failed slave processor recovers comprising the steps of:
-
a) recording a record of the last completed database update transaction at each processor located in the distributed network during a first timer window established by a first timer from a plurality of first timers, wherein each first timer from the plurality of first timers is associated with each processor;
b) creating a journal operatively associated with said master database processor located in the distributed network that records the steps of database update transactions generated by the master database processor during a second timer window established by a second timer associated with each processor, wherein the second timer is from a plurality of second timers;
c) recording said steps of database update transactions in said journal;
d) upon the recovery of the failed processor located in the distributed network, copying to said failed processor from said journal all database update transactions subsequent to the last database update of the previously failed processor; and
e) updating the database associated with said failed processor with all database update transactions subsequent to the last database update. - View Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14)
e) designating a processor as a back-up master processor;
f) creating a journal operatively associated with said back-up master processor;
g) recording into the journal operatively associated with said back-up master processor, the information recorded into the journal operatively associated with said master database processor.
-
-
15. In a distributed network of processors, comprised of at least one master processor and a plurality of slave processors wherein each processor accesses and controls a database associated with each processor, a method for completing update transactions after failure of said at least one master processor comprising the steps of:
-
creating a plurality of timers, operatively associated with said plurality of slave processors such that each of said plurality of slave processors is associated with at least one of said plurality of timers;
receiving a current update transaction at a first one of said plurality of slave processors;
sending a vote to commit message from said first one of said plurality of slave processors to said at least one master processor upon receiving said current update transaction;
starting said at least one of a plurality of timers associated with said any one of said plurality of slave processors sending said message; and
responding to the expiration of said at least one of a plurality of timers. - View Dependent Claims (16, 17, 18)
receiving an abort message from said any one of said plurality of slave processors before the expiration of said at least one of a plurality of timers; and
aborting said current update transaction by said any one of said plurality of slave processors receiving said abort message.
-
-
18. The method of claim 15, further comprising the step of sending an abort message to said plurality of slave database processors by said any one of said plurality of slave processors receiving said abort message.
-
19. In a distributed network of processors, comprised of at least one master processor and a plurality of slave processors wherein each processor accesses and controls a database associated with each processor, a method for completing update transactions after failure of said at least one master processor comprising the steps of:
-
a) creating a plurality of first timers operatively associated with said plurality of slave processors such that each of said plurality of slave processors is associated with at least one of said plurality of first timers;
b) creating a plurality of second timers operatively associated with said plurality of slave processors such that each of said plurality of slave processors is associated with at least one of said plurality of second timers;
c) starting said at least one of said plurality of first timers in response to the associated slave processor receiving a begin step of a current update transaction;
d) sending a message from any one of said plurality of slave processors to said at least one master processor in response to the associated slave processor receiving the end step of a current update transaction;
e) starting said at least one of a plurality of second timers associated with said any one of said plurality of slave processors sending said message in response to sending said message; and
f) detecting the expiration of said at least one of said plurality of first timers or said at least one of a plurality of second timers. - View Dependent Claims (20, 21, 22, 23)
g) detecting the expiration of said at least one of said plurality of first timers prior to the expiration of said at least one of a plurality of second timers; and
h) aborting said current update transaction.
-
-
21. The method of claim 20, further comprising the step of:
i) sending an abort message to said plurality of slave database processors by said slave processor associated with the expired at least one of said plurality of first timers.
-
22. The method of claim 21, further comprising the steps of:
-
j) receiving the abort message by one of said plurality of slave processors;
k) aborting the current update transaction by said one of said plurality of slave processors; and
i) sending an abort message to said plurality of slave processors by said one of a plurality of slave processors.
-
-
23. The method of claim 22, further comprising the steps of:
-
f) detecting the expiration of said at least one of said plurality of second timers;
g) committing said current update transaction.
-
-
24. In a distributed network of processors, comprised of at least one master processor and a plurality of slave processors wherein each processor accesses and controls a database associated with each processor, a method for completing update transactions after failure of said at least one master processor comprising the steps of:
-
a) creating a plurality of first timers operatively associated with said plurality of slave processors such that each of said plurality of slave processors is associated with at least one of said plurality of first timers;
b) creating a plurality of second timers operatively associated with said plurality of slave processors such that each of said plurality of slave processors is associated with at least one of said plurality of second timers;
c) starting said at least one of said plurality of first timers in response to the associated slave processor receiving a begin step of a current update transaction;
d) resetting said at least one of said plurality of timers in response to receiving a portion of said update transaction by said associated slave processor;
e) disabling said at least one of said plurality of timers in response to receiving a message from said at least one master processor;
f) sending a message from any one of said plurality of slave processors to said at least one master processor in response to the associated slave processor receiving the end step of a current update transaction;
g) starting said at least one of a plurality of second timers associated with said any one of said plurality of slave processors sending said message in response to sending said message;
h) detecting the expiration of said at least one of said plurality of first timers prior to the expiration of said at least one of a plurality of second timers and thereafter;
i) aborting said current update transaction;
ii) sending an abort message to said plurality of slave database processors by said slave processor associated with the expired at least one of said plurality of first timers;
i) detecting the expiration of said at least one of said plurality of second timers and thereafter committing said current update transaction.
-
-
25. A distributed network of processors each processor having associated therewith a database comprising:
-
a master processor for updating databases associated with said each processor;
a plurality of slave processors recording database updates sent by said master processor;
a plurality of first timers, each of said plurality of slave processors having associated with it, at least one said plurality of first timers, each first timer having associated therewith a first timer window;
a plurality of second timers, each of said plurality of slave processors having associated with it, at least one of said plurality of second timers, each second timer having associated therewith a second timer window;
a journal, associated with said master processor, for recording steps of database update transactions successfully completed by the master database processor wherein said journal is used for updating or re-synchronizing a database of said failed slave processor after the failed processor recovers or returns to service; and
a high speed data interconnect for interconnecting said master processor and said plurality of slave processors. - View Dependent Claims (26, 27, 28, 29, 30)
random access memory. -
29. The apparatus of claim 25, wherein at least one of said plurality of slave processors further comprise at least one electronic storage media device storing an indicator of the last completed update transaction.
-
30. The apparatus of claim 25, wherein said high speed data interconnect further comprises Ethernet.
-
-
31. A distributed network of processors each processor having associated therewith a database comprising:
-
a master processor for updating databases associated with said each processor via a two-phase commit;
a plurality of slave processors recording database updates sent by said master processor;
a plurality of first timers, each of said plurality of slave processors having associated with it, at least one said plurality of first timers, each first timer having associated therewith a first timer window;
a plurality of second timers, each of said plurality of slave processors having associated with it, at least one of said plurality of second timers, each second timer having associated therewith a second timer window;
a journal operatively associated with said at least one master database processor for recording steps of database update transactions generated by said at least one master database processor; and
a memory means for storing an indicia of the last completed database update transaction performed by said slave processor. - View Dependent Claims (32)
-
Specification