Continuously available database server having multiple groups of nodes with minimum intersecting sets of database fragment replicas
First Claim
1. A multiprocessor computer system, comprising:
- N data processors, wherein N is a positive integer greater than three, each data processor having its own, separate, central processing unit, memory for storing database tables and other data structures, and communication channels for communication with other ones of said N data processors;
each of said N data processors independently executing a distinct instruction data stream;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto;
said N data processors being divided into at least two groups, each having at least two data processors;
each data processor including;
fragmenting means for fragmenting each of said database tables into N fragments, and for storing replicas of each fragment in different ones of said N data processors, wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors;
a data dictionary that stores information indicating where each said replica replica of each fragment of said database tables is stored among said N data processors;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available, and for regenerating said replicas on the failed data processor in non-failed ones, if any, of the data processors in the same group of data processors as the failed data processor; and
said fragmenting means further adapted for dividing said database tables into F fragments FSx, for storing said F fragments in the data processors in each said group, where for a particular fragment FSx, S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1;
said fragmenting means adapted to assign each fragment FSx to a data processor y in group S in accordance with the following fragment to node assignment equation;
space="preserve" listing-type="equation">y=(x+(xdiv N.sub.S)·
Q(S) ) modulo availwhere y identifies which data processor fragment FSx is assigned to, NS is the number of data processors in group S used for storing database fragments, Q(S) is an integer between 0 and NS -1 where Q(S) is a distinct value for each said group, and avail is the number of said NS data processors that have not failed.
6 Assignments
0 Petitions
Accused Products
Abstract
A database server with a "shared nothing" system architecture has multiple nodes, each having its own central processing unit, primary and secondary memory for storing database tables and other data structures, and communication channels for communication with other ones of the nodes. The nodes are divided into at least two groups that share no resources. Each database table in the system is divided into fragments distributed for storage purposes over all the nodes in the system. To ensure continued data availability after a node failure, a "primary replica" and a "standby replica" of each fragment are each stored on nodes in different ones of the groups. Database transactions are performed using the primary fragment replicas, and the standby replicas are updated using transaction log records. Every node of the system includes a data dictionary that stores information indicating where each primary and standby fragment replica is stored. The fragments are allocated as evenly as possible among the system nodes using a fragment to node assignment equation. A transaction manager on each node responds to database queries by determining which fragment of a database is being accessed by the query and then forwarding the database query to the node processor on which the primary replica of that fragment is stored. Upon failure of any one of the data processors in the system, each node updates the information in its data dictionary accordingly. In addition, the fragment replicas made unavailable by the node failure are regenerated and stored on the remaining available nodes in the same node group as the failed node.
-
Citations
7 Claims
-
1. A multiprocessor computer system, comprising:
-
N data processors, wherein N is a positive integer greater than three, each data processor having its own, separate, central processing unit, memory for storing database tables and other data structures, and communication channels for communication with other ones of said N data processors;
each of said N data processors independently executing a distinct instruction data stream;at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto; said N data processors being divided into at least two groups, each having at least two data processors; each data processor including; fragmenting means for fragmenting each of said database tables into N fragments, and for storing replicas of each fragment in different ones of said N data processors, wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors; a data dictionary that stores information indicating where each said replica replica of each fragment of said database tables is stored among said N data processors; said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available, and for regenerating said replicas on the failed data processor in non-failed ones, if any, of the data processors in the same group of data processors as the failed data processor; and said fragmenting means further adapted for dividing said database tables into F fragments FSx, for storing said F fragments in the data processors in each said group, where for a particular fragment FSx, S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1;
said fragmenting means adapted to assign each fragment FSx to a data processor y in group S in accordance with the following fragment to node assignment equation;
space="preserve" listing-type="equation">y=(x+(xdiv N.sub.S)·
Q(S) ) modulo availwhere y identifies which data processor fragment FSx is assigned to, NS is the number of data processors in group S used for storing database fragments, Q(S) is an integer between 0 and NS -1 where Q(S) is a distinct value for each said group, and avail is the number of said NS data processors that have not failed. - View Dependent Claims (2, 3, 4)
-
-
5. A method of distributing data storage and transactional workloads in multiprocessor computer system having:
-
N data processors, wherein N is a positive integer greater than three, each data processor having its own, separate, central processing unit, memory for storing database tables and other data structures, and communication channels for communication with other ones of said N data processors; at least a plurality of said N data processors including a communications processor that receives transaction requests and transmits responses thereto; said N data processors being divided into at least two groups, each having at least two data processors; the steps of the method comprising; independently executing a distinct instruction data stream on each of said N data processors; fragmenting each of said database tables into N fragments, and storing replicas of each fragment in different ones of said N data processors, wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors; said fragmenting step including allocating each record in any one of said database tables to a particular one of its N fragments in accordance with predefined criteria; storing in a data dictionary in each of said N data processors information indicating where each said replica of each fragment of said database tables is stored among said N data processors; and upon failure of any one of said N data processors, changing the information stored in said data dictionary to indicate that the replicas stored on the failed data processor are not available; said fragmenting step including dividing said database tables into F fragments FSx, storing said F fragments in the data processors in each said group, where for a particular fragment FSx, S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1;
said fragmenting step assigning each fragment FSx to a data processor y in group S in accordance with the following fragment to node assignment equation;
space="preserve" listing-type="equation">y=(x+(x div N.sub.S)·
Q(S) ) modulo availwhere y identifies which data processor fragment FSx is assigned to, NS is the number of data processors in group S used for storing database fragments, Q(S) is an integer between 0 and NS -1 where Q(S) is a distinct value for each said group, and avail is the number of said NS data processors that have not failed. - View Dependent Claims (6, 7)
-
Specification