Continuously available database server having multiple groups of nodes, each group maintaining a database copy with fragments stored on multiple nodes
Abstract
A database server with a "shared nothing" system architecture has multiple nodes, each having its own central processing unit, primary and secondary memory for storing database tables and other data structures, and communication channels for communication with other ones of the nodes. The nodes are divided into first and second groups that share no resources. Each database table in the system is divided into fragments distributed for storage purposes over all the nodes in the system. To ensure continued data availability after a node failure, a "primary replica" and a "standby replica" of each fragment are each stored on nodes in different ones of the first and second groups. Database transactions are performed using the primary fragment replicas, and the standby replicas are updated using transaction log records. Every node of the system includes a data dictionary that stores information indicating where each primary and standby fragment replica is stored. The records of each database table are allocated as evenly as possible among the table fragments. A transaction manager on each node responds to database queries by determining which fragment of a database is being accessed by the query and then forwarding the database query to the node processor on which the primary replica of that fragment is stored. Upon failure of any one of the data processors in the system, each node updates the information in its data dictionary accordingly. In addition, the fragment replicas made unavailable by the node failure are regenerated and stored on the remaining available nodes in the same node group as the failed node.
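The placement rule in the abstract can be illustrated with a minimal sketch. It assumes N nodes split into two groups, one fragment per node, and a CRC32 hash as the record-allocation criterion; the abstract only says records are spread as evenly as possible, so the hash, the node names, and the function names below are hypothetical.

```python
from zlib import crc32


def fragment_of(key, n_fragments):
    """Map a record key to a fragment number (assumed hash criterion)."""
    return crc32(key.encode("utf-8")) % n_fragments


def build_placement(group_a, group_b):
    """Return {fragment: (primary_node, standby_node)} with the two replicas
    of every fragment placed in different groups."""
    nodes = list(group_a) + list(group_b)
    placement = {}
    for frag, primary in enumerate(nodes):  # N nodes -> N fragments
        other = group_b if primary in group_a else group_a
        standby = other[frag % len(other)]  # standby lives in the other group
        placement[frag] = (primary, standby)
    return placement


def fragments_in_group(placement, group):
    """Fragments that have at least one replica on a node of `group`."""
    return {frag for frag, replicas in placement.items()
            if any(node in group for node in replicas)}


# Hypothetical four-node system: two nodes per group.
group_a, group_b = ["a0", "a1"], ["b0", "b1"]
placement = build_placement(group_a, group_b)
```

Because each fragment has exactly one replica in each group, `fragments_in_group` returns every fragment for either group, which is the property that lets one whole group fail without losing a table.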
15 Claims
1. A multiprocessor computer system, comprising:
N data processors, wherein N is a positive integer greater than three, each data processor having its own, separate, central processing unit, memory for storing database tables and other data structures, and communication channels for communication with other ones of said N data processors;
each of said N data processors independently executing a distinct instruction data stream;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto;
said N data processors being divided into first and second groups, each having at least two data processors;
each data processor including:
fragmenting means for fragmenting each of said database tables into N fragments, and for storing a primary replica and a standby replica of each fragment, respectively, in different ones of said N data processors, wherein said different ones of said N data processors are in different ones of said first and second groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors;
said fragmenting means adapted for allocating each record in any one of said database tables to a particular one of its N fragments in accordance with predefined criteria;
a data dictionary that stores information indicating where each said primary replica and standby replica of each fragment of said database tables is stored among said N data processors;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the primary and standby replicas stored on the failed data processor are not available, for regenerating said primary and standby replicas on the failed data processor, and for storing portions of said regenerated replicas over non-failed ones, if any, of the data processors in the same group of data processors as the failed data processor; and
a transaction manager that responds to database queries by determining which fragment of a database table is being accessed by each database query and then forwarding said each database query to the data processor on which the primary replica of that database table fragment is stored.
Dependent claims: 2, 3, 4, 5, 6, 7.
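The routing behavior recited for the transaction manager can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the class and method names are hypothetical, and the fragment is chosen by a CRC32 hash of the record key because the claim only requires "predefined criteria".

```python
from zlib import crc32


class DataDictionary:
    """Per-node copy of the replica location map: (table, fragment) -> node."""

    def __init__(self):
        self._primary = {}
        self._standby = {}

    def register(self, table, fragment, primary_node, standby_node):
        self._primary[(table, fragment)] = primary_node
        self._standby[(table, fragment)] = standby_node

    def primary_node(self, table, fragment):
        return self._primary[(table, fragment)]


class TransactionManager:
    """Determines the fragment a query touches and where to forward it."""

    def __init__(self, dictionary, n_fragments):
        self.dictionary = dictionary
        self.n_fragments = n_fragments

    def fragment_for(self, key):
        # Assumed criterion: CRC32 of the record key modulo N.
        return crc32(key.encode("utf-8")) % self.n_fragments

    def route(self, table, key):
        """Return (fragment, node holding that fragment's primary replica)."""
        fragment = self.fragment_for(key)
        return fragment, self.dictionary.primary_node(table, fragment)


# Hypothetical four-node layout: group A = a0, a1; group B = b0, b1.
nodes = ["a0", "a1", "b0", "b1"]
dictionary = DataDictionary()
for frag in range(4):
    # The standby is two positions away, so it falls in the other group.
    dictionary.register("accounts", frag, nodes[frag], nodes[(frag + 2) % 4])

tm = TransactionManager(dictionary, n_fragments=4)
fragment, target = tm.route("accounts", "cust-42")
```

Every node would hold its own copy of the dictionary, so any node that receives a request can compute `target` locally and forward the query there.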
8. A multiprocessor computer system, comprising:
N data processors, wherein N is a positive integer greater than three, each data processor having its own, separate, central processing unit, memory for storing database tables and other data structures, and communication channels for communication with other ones of said N data processors;
each of said N data processors independently executing a distinct instruction data stream;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto;
said N data processors being divided into first and second groups, each having at least two data processors;
each data processor including:
fragmenting means for fragmenting each of said database tables into N fragments, and for storing a primary replica and a standby replica of each fragment, respectively, in different ones of said N data processors, wherein said different ones of said N data processors are in different ones of said first and second groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors;
said fragmenting means adapted for allocating each record in any one of said database tables to a particular one of its N fragments in accordance with predefined criteria;
a data dictionary that stores information indicating where each said primary replica and standby replica of each fragment of said database tables is stored among said N data processors;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the primary and standby replicas stored on the failed data processor are not available, for generating new replicas of the primary and standby replicas made unavailable by the failure of said one data processor, for subdividing said new replicas into subfragments, and for distributing the storage of said subfragments in non-failed ones, if any, of the data processors in the same group of data processors as said one failed data processor; and
a transaction manager that responds to database queries by determining which fragment of a database table is being accessed by each database query and then forwarding said each database query to the data processor on which the primary replica of that database table fragment is stored.
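The subfragment step in claim 8 can be pictured with a short sketch. It assumes the new replica has already been rebuilt as a list of records (for example, copied from the counterpart replica in the other group, which the claim does not specify) and splits it round-robin; the function names and the round-robin split are hypothetical choices.

```python
def subdivide(records, n_parts):
    """Split a list of records into n_parts subfragments of near-equal size."""
    return [records[i::n_parts] for i in range(n_parts)]


def redistribute(new_replica, group, failed_node):
    """Spread a regenerated replica over the surviving nodes of the failed
    node's own group. Returns {node: subfragment}; empty if none survive."""
    survivors = [node for node in group if node != failed_node]
    if not survivors:  # the claim's "if any": nothing to distribute to
        return {}
    parts = subdivide(new_replica, len(survivors))
    return dict(zip(survivors, parts))


# Hypothetical example: node a0 of a three-node group fails.
layout = redistribute(list(range(10)), ["a0", "a1", "a2"], "a0")
```

Keeping the subfragments inside the failed node's group preserves the invariant that each group still holds one full copy of every table.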
9. A method of distributing data storage and transactional workloads in a multiprocessor computer system having:
N data processors, wherein N is a positive integer greater than three, each data processor having its own, separate, central processing unit, memory for storing database tables and other data structures, and communication channels for communication with other ones of said N data processors;
at least a plurality of said N data processors including a communications processor that receives transaction requests and transmits responses thereto;
said N data processors being divided into first and second groups, each having at least two data processors;
the steps of the method comprising:
independently executing a distinct instruction data stream on each of said N data processors;
fragmenting each of said database tables into N fragments, and storing a primary replica and a standby replica of each fragment, respectively, in different ones of said N data processors, wherein said different ones of said N data processors are in different ones of said first and second groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors;
said fragmenting step including allocating each record in any one of said database tables to a particular one of its N fragments in accordance with predefined criteria;
storing in a data dictionary in each of said N data processors information indicating where each said primary replica and standby replica of each fragment of said database tables is stored among said N data processors;
upon failure of any one of said N data processors, changing the information stored in said data dictionary to indicate that the primary and standby replicas stored on the failed data processor are not available; and
responding to database queries by determining which fragment of a database table is being accessed by each database query, accessing the information in said data dictionary to determine which one of said N data processors contains the primary replica of that database table fragment, and then forwarding each said database query to said one of said N data processors.
Dependent claims: 10, 11, 12, 13, 14.
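The dictionary-update step of claim 9 can be sketched as below. The `Replica` record, the dictionary layout, and the function names are assumptions made for illustration; the claim only requires that the replicas on the failed processor be marked as not available.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    node: str
    available: bool = True


def mark_node_failed(dictionary, failed_node):
    """Flag every replica stored on failed_node as unavailable and return
    the (fragment, role) pairs that were affected."""
    affected = []
    for fragment, roles in dictionary.items():
        for role, replica in roles.items():
            if replica.node == failed_node and replica.available:
                replica.available = False
                affected.append((fragment, role))
    return affected


def primary_for(dictionary, fragment):
    """Node holding the primary replica, or None if it is marked unavailable."""
    replica = dictionary[fragment]["primary"]
    return replica.node if replica.available else None


# Hypothetical dictionary for four fragments over groups {a0, a1} and {b0, b1}.
dictionary = {
    0: {"primary": Replica("a0"), "standby": Replica("b0")},
    1: {"primary": Replica("a1"), "standby": Replica("b1")},
    2: {"primary": Replica("b0"), "standby": Replica("a0")},
    3: {"primary": Replica("b1"), "standby": Replica("a1")},
}
affected = mark_node_failed(dictionary, "a0")
```

How a query is served while its fragment's primary replica is unavailable is not stated in this claim, so the sketch simply reports `None` for that fragment.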
15. A method of distributing data storage and transactional workloads in a multiprocessor computer system having:
N data processors, wherein N is a positive integer greater than three, each data processor having its own, separate, central processing unit, memory for storing database tables and other data structures, and communication channels for communication with other ones of said N data processors;
at least a plurality of said N data processors including a communications processor that receives transaction requests and transmits responses thereto;
said N data processors being divided into first and second groups, each having at least two data processors;
the steps of the method comprising:
independently executing a distinct instruction data stream on each of said N data processors;
fragmenting each of said database tables into N fragments, and storing a primary replica and a standby replica of each fragment, respectively, in different ones of said N data processors, wherein said different ones of said N data processors are in different ones of said first and second groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors;
said fragmenting step including allocating each record in any one of said database tables to a particular one of its N fragments in accordance with predefined criteria;
storing in a data dictionary in each of said N data processors information indicating where each said primary replica and standby replica of each fragment of said database tables is stored among said N data processors;
upon failure of any one of said N data processors, changing the information stored in said data dictionary to indicate that the primary and standby replicas stored on the failed data processor are not available, generating new replicas of the primary and standby replicas made unavailable by failure of said one data processor, subdividing said new replicas into subfragments and distributing storage of said subfragments in non-failed ones, if any, of said data processors in the same group of data processors as said one failed data processor; and
responding to database queries by determining which fragment of a database table is being accessed by each database query, accessing the information in said data dictionary to determine which one of said N data processors contains the primary replica of that database table fragment, and then forwarding said each database query to said one of said N data processors.
Specification