Method and apparatus for performing change-over control to processor groups by using rate of failed processors in a parallel computer
Abstract
In a change-over control method for a parallel processor system including a current processor group having a plurality of processors and a network connecting the processors to each other and a standby processor group configured in the same way as for the current processor group, a processor control section is disposed in the parallel processor system, and a monitor processor is arranged for each of the current and standby processor groups. A faulty processor ratio determined according to the amount of job processing is set to the processor control section. On receiving a report notifying occurrence of a failure in a processor from the monitor processor disposed in the current processor group, the processor control section determines a ratio of failed processors in the current processor group. When the ratio is equal to or more than the faulty processor ratio, the processor control section effects a change-over operation of transferring job processing from the current processor group to the standby processor group.
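The decision rule described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the names `ProcessorGroup`, `on_failure_report`, and `should_change_over` are hypothetical and do not appear in the patent, and each failure report is assumed to announce exactly one newly failed processor.

```python
from dataclasses import dataclass


@dataclass
class ProcessorGroup:
    """Hypothetical model of the current processor group (illustrative only)."""
    total: int        # number of processors in the group
    failed: int = 0   # processors reported as failed so far

    def failed_ratio(self) -> float:
        # Ratio of failed processors at the pertinent point of time.
        return self.failed / self.total


def should_change_over(group: ProcessorGroup, faulty_ratio: float) -> bool:
    # Change over when the calculated ratio is equal to or more than
    # the faulty processor ratio that was set beforehand.
    return group.failed_ratio() >= faulty_ratio


def on_failure_report(group: ProcessorGroup, faulty_ratio: float) -> bool:
    # Assumption: one report corresponds to one newly failed processor.
    group.failed += 1
    return should_change_over(group, faulty_ratio)
```

With eight processors and a faulty processor ratio of 0.25, the first failure report leaves the job on the current group, and the second one triggers the change-over to the standby group.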
15 Claims
1. A parallel processor system, including a current processor group having a plurality of processors and a network connecting the processors to each other and a standby processor group including a plurality of processors and a network connecting the processors to each other, said parallel processor system comprising:
a monitor processor disposed in said current processor group and a monitor processor disposed in said standby processor group, said monitor processors controlling processing information of all processors in said current and standby processor groups, respectively;
a processor control section for executing a change-over operation from said current processor group to said standby processor group;
an address control table for storing therein physical and logical addresses of each processor of each of said current and standby processor groups with correspondence established therebetween;
a scheduling table for setting therein a faulty processor ratio determined according to an amount of job processing in said parallel processor system;
a storage section for storing therein processing information of each processor reported at a predetermined point of time from a monitor table disposed in said current processor group;
a change-over control section for calculating a ratio of failed processors in said current processor group at a pertinent point of time and comparing the calculated ratio with the faulty processor ratio set in the scheduling table when a report notifying occurrence of a failure in a processor is received from said monitor processor disposed in said current processor group, and executing a change-over operation of transferring the job processing from said current processor group to said standby processor group when the calculated ratio is equal to or more than the faulty processor ratio; and
an operator's console connected to said processor control section for arbitrarily setting therefrom the faulty processor ratio to the scheduling table.
Dependent claims: 2, 3, 4, 5
6. A change-over control method for use with a parallel processor system including a current processor group having a plurality of processors and a network connecting the processors to each other, and a standby processor group having a plurality of processors and a network connecting the processors to each other, and said current processor group and said standby processor group both have a monitor processor disposed therein, and said parallel processor system further includes a processor control section, said method comprising the steps of:
determining a faulty processor ratio according to an amount of job processing in said parallel processor system;
setting the faulty processor ratio in said processor control section;
calculating, in said processor control section, a ratio of failed processors in said current processor group at a pertinent point of time when a report notifying an occurrence of a failure in a processor is received from the monitor processor disposed in said current processor group;
comparing the ratio of failed processors calculated by said processor control section with said faulty processor ratio; and
executing a change-over operation of transferring job processing from said current processor group to said standby processor group when the calculated ratio of failed processors is equal to or more than said faulty processor ratio.
Dependent claims: 7, 8, 9, 14
10. A change-over control method for use with a parallel processor system including a current processor group having a plurality of processors and a network connecting the processors to each other, and a standby processor group having a plurality of processors and a network connecting the processors to each other, said current processor group and said standby processor group both include a monitor processor therein, and said parallel processor system includes a processor control section having a timer therein, said method comprising the steps of:
determining a faulty processor ratio at a predetermined interval of time according to an amount of job processing in said parallel processor system;
setting the determined faulty processor ratio in said processor control section;
calculating, by said processor control section, a ratio of failed processors in said current processor group at a pertinent point of time when a report notifying an occurrence of a failure in a processor is received from said monitor processor disposed in said current processor group;
comparing said calculated ratio of failed processors with said determined faulty processor ratio corresponding to a point of time indicated by said timer; and
executing a change-over operation of transferring job processing from said current processor group to said standby processor group when said calculated ratio of failed processors is equal to or more than said determined faulty processor ratio.
Dependent claims: 11, 12, 13, 15
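The timer-based comparison in claim 10 might look roughly like the following sketch. Everything concrete here is an assumption made for illustration: the table layout, the hour-of-day granularity, the sample ratios, and the names `SCHEDULE`, `threshold_at`, and `change_over_needed` are hypothetical and are not taken from the patent.

```python
import bisect

# Hypothetical scheduling table: (starting hour of interval, faulty processor ratio).
# The values are made up; the claim only says the ratio is determined at a
# predetermined interval of time according to the amount of job processing.
SCHEDULE = [(0, 0.50), (8, 0.10), (18, 0.25)]


def threshold_at(hour: int, schedule=SCHEDULE) -> float:
    """Return the faulty processor ratio for the interval containing `hour`."""
    starts = [start for start, _ in schedule]
    # Index of the last interval whose start is <= hour.
    idx = bisect.bisect_right(starts, hour) - 1
    return schedule[idx][1]


def change_over_needed(failed: int, total: int, hour: int, schedule=SCHEDULE) -> bool:
    """Compare the calculated failed ratio with the ratio for the timer's hour."""
    return failed / total >= threshold_at(hour, schedule)
```

In this sketch the same failure count can lead to different outcomes depending on the time indicated by the timer: 2 failed processors out of 16 stay below the ratio assumed for hour 3 but reach the ratio assumed for hour 9.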
Specification