Fault-tolerant monitoring apparatus, method and system
First Claim
1. A fault-tolerant monitoring apparatus arranged to monitor physical performance properties of a plurality of networked computing elements, each computing element comprising a processing unit and individual memory to store local information from the computing element itself and remote information from other computing elements, the monitoring apparatus comprising: a plurality of measurer apparatuses, each arranged to measure the physical performance properties of a single computing element among the plurality of computing elements, the physical performance properties being stored as local information in the individual memory of the single computing element in which the measurement is made; and
at least one collector apparatus, each measurer apparatus, among the plurality of measurer apparatuses, being directly linked to a single collector apparatus among the at least one collector apparatus without involvement of a network interface controller to signal the collector apparatus with local information, each collector apparatus arranged to control collection of remote information representing the physical performance properties from the individual memory in a plurality of the computing elements, and storage of the remote information as replicate information in the individual memory of another computing element;
wherein the remote information is collected by a computing element other than the computing element from which the remote information is collected using third party access, and the remote information is stored as replicate information in the individual memory of the computing element which collected the remote information, or another computing element other than a computing element from which the remote information is collected using the third party access;
wherein the physical performance properties are in form of electrical characteristics including any of voltage, current, power or energy use of the computing element or part thereof; and
wherein the third party access is by one of remote direct memory access (RDMA) put and RDMA get.
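The two forms of third party access named in this claim can be illustrated with a small simulation. This is a sketch under stated assumptions, not the patented implementation: each element's individual memory is modelled as a Python bytearray, rdma_get and rdma_put are hypothetical stand-ins for one-sided RDMA operations (they are not a real RDMA API), and the reading layout (voltage, current, power as little-endian doubles) is an assumed format. A GET leaves the replica on the collecting element; a following PUT places it on a further element, matching the two storage options the claim allows.

```python
import struct

# Assumed (hypothetical) layout of one reading: voltage, current, power as little-endian doubles.
READING = struct.Struct("<3d")
SLOT = READING.size  # 24 bytes per reading slot

# Simulated individual memory of three elements; not a real RDMA registration API.
memory = {eid: bytearray(SLOT * 4) for eid in range(3)}


def rdma_get(initiator: int, target: int, remote_off: int, local_off: int, n: int) -> None:
    """Hypothetical one-sided GET: pull n bytes from target's memory into initiator's memory.

    Only the initiator runs code here; the target element takes no part in the transfer.
    """
    memory[initiator][local_off:local_off + n] = memory[target][remote_off:remote_off + n]


def rdma_put(initiator: int, target: int, local_off: int, remote_off: int, n: int) -> None:
    """Hypothetical one-sided PUT: push n bytes from initiator's memory into target's memory."""
    memory[target][remote_off:remote_off + n] = memory[initiator][local_off:local_off + n]


# The measurer on element 0 stores its reading as local information in slot 0.
volts, amps = 12.0, 2.5
READING.pack_into(memory[0], 0, volts, amps, volts * amps)

# Option 1: element 1 collects with a GET; the replica stays in its own memory (slot 1).
rdma_get(initiator=1, target=0, remote_off=0, local_off=SLOT, n=SLOT)

# Option 2: element 1 forwards the replica with a PUT into element 2's memory (slot 1).
rdma_put(initiator=1, target=2, local_off=SLOT, remote_off=SLOT, n=SLOT)

print(READING.unpack_from(memory[2], SLOT))  # (12.0, 2.5, 30.0)
```

In a real system the copies would be issued through an RDMA-capable interconnect so that the source element's processor is not interrupted; the slice assignments above only mimic that one-sided behaviour.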
Abstract
A fault-tolerant monitoring apparatus is arranged to monitor physical performance properties of a plurality of networked computing elements, each element including a processing unit and individual memory. The monitoring apparatus comprises a plurality of measurer apparatuses, each arranged to measure the physical performance properties of a single computing element, the physical performance properties being stored as local information in the individual memory of the computing element in which the measurement is made; and one or more collector apparatuses arranged to control collection of remote information representing physical performance properties from the individual memory in a plurality of the computing elements, and storage of that remote physical performance information as replicate information in the individual memory of another computing element. The remote physical performance information is collected using third party access.
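The arrangement summarised in the abstract can be modelled in a few lines. The following is a minimal sketch, assuming hypothetical class names (Element, Measurer, Collector), a plain function call for the direct measurer-to-collector link, and an ordinary in-process copy in place of a one-sided remote memory transfer. It shows why replication gives fault tolerance: once an element fails, its last collected readings can still be served from the replica held by another element.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Element:
    """Hypothetical model of a computing element and its individual memory."""
    name: str
    local: List[float] = field(default_factory=list)               # own readings (local information)
    replicas: Dict[str, List[float]] = field(default_factory=dict)  # copies of other elements' readings
    alive: bool = True


class Collector:
    """Hypothetical collector: replicates each source's local information into another element."""

    def __init__(self, placement: Dict[str, Element]):
        self.placement = placement      # source name -> element that stores its replica
        self.dirty: List[Element] = []  # elements whose measurer signalled new data

    def notify(self, src: Element) -> None:
        self.dirty.append(src)

    def collect(self) -> None:
        for src in self.dirty:
            dst = self.placement[src.name]
            if dst is src:
                raise ValueError("replica must be held by a different element")
            # Plain copy standing in for a one-sided (RDMA-style) transfer:
            # no code runs on the source element during the copy.
            dst.replicas[src.name] = list(src.local)
        self.dirty.clear()


class Measurer:
    """Hypothetical measurer: stores a reading locally, then signals its collector directly."""

    def __init__(self, element: Element, collector: Collector):
        self.element, self.collector = element, collector

    def record(self, watts: float) -> None:
        self.element.local.append(watts)
        self.collector.notify(self.element)  # direct function call; no network path is modelled


def last_known(elements: Dict[str, Element], name: str) -> List[float]:
    """Read an element's history locally, or from a surviving replica if it has failed."""
    if elements[name].alive:
        return elements[name].local
    for holder in elements.values():
        if holder.alive and name in holder.replicas:
            return holder.replicas[name]
    raise LookupError(f"no surviving copy of {name}")


elements = {n: Element(n) for n in ("a", "b", "c")}
collector = Collector({"a": elements["b"], "b": elements["c"], "c": elements["a"]})
meter_a = Measurer(elements["a"], collector)
meter_a.record(100.0)
meter_a.record(120.0)
collector.collect()
elements["a"].alive = False  # simulate failure of element "a"
print(last_known(elements, "a"))  # [100.0, 120.0], served from element "b"
```

The ring placement (a to b, b to c, c to a) is only one assumed choice; any mapping works as long as no element holds its own replica.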
15 Claims
1. A fault-tolerant monitoring apparatus arranged to monitor physical performance properties of a plurality of networked computing elements, each computing element comprising a processing unit and individual memory to store local information from the computing element itself and remote information from other computing elements, the monitoring apparatus comprising
a plurality of measurer apparatuses, each arranged to measure the physical performance properties of a single computing element among the plurality of computing elements, the physical performance properties being stored as local information in the individual memory of the single computing element in which the measurement is made; and
at least one collector apparatus, each measurer apparatus, among the plurality of measurer apparatuses, being directly linked to a single collector apparatus among the at least one collector apparatus without involvement of a network interface controller to signal the collector apparatus with local information, each collector apparatus arranged to control collection of remote information representing the physical performance properties from the individual memory in a plurality of the computing elements, and storage of the remote information as replicate information in the individual memory of another computing element; wherein the remote information is collected by a computing element other than the computing element from which the remote information is collected using third party access, and the remote information is stored as replicate information in the individual memory of the computing element which collected the remote information, or another computing element other than a computing element from which the remote information is collected using the third party access; wherein the physical performance properties are in form of electrical characteristics including any of voltage, current, power or energy use of the computing element or part thereof; and wherein the third party access is by one of remote direct memory access (RDMA) put and RDMA get.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
12. A fault-tolerant monitoring method for monitoring physical performance properties of a plurality of networked computing elements, each computing element including a processing unit and an individual memory to store local information from the computing element itself and remote information from other computing elements, the monitoring method comprising
measuring the physical performance properties of corresponding computing elements using measurer apparatuses and storing local information representing the physical performance properties in the individual memory of the corresponding computing elements; and
by at least one collector apparatus, collecting remote information representing the physical performance properties from the individual memory in a particular computing element among the computing elements, and storing the remote information as replicate information in the individual memory of another computing element; wherein each measurer apparatus, among the measurer apparatuses, is directly linked to a single collector apparatus among the at least one collector apparatus without involvement of a network interface controller to signal the collector apparatus with local information; wherein the remote information is collected by a computing element other than the computing element from which the remote information is collected using third party access, and the remote information is stored as replicate information in the individual memory of the computing element which collected the remote information, or another computing element other than the computing element from which the remote information is collected, using the third party access; wherein the physical performance properties are in form of electrical characteristics including any of voltage, current, power or energy use of the computing element or part thereof; and wherein the third party access is by one of remote direct memory access (RDMA) put and RDMA get.
(Dependent claim: 13)
14. A non-transitory computer-readable medium tangibly embodying a computer program, which when loaded onto a distributed memory computer system with a plurality of networked computing elements, each computing element including a processing unit and an individual memory to store local information from the computing element itself and remote information from other computing elements, configures the distributed memory computer system to:
measure physical performance properties of corresponding computing elements using measurer apparatuses and store local information representing the physical performance properties in individual memory of the corresponding computing elements; and using at least one collector apparatus, collect remote information representing the physical performance properties from the individual memory in a particular computing element among the computing elements, and store the remote information as replicate information in the individual memory of another computing element; wherein each measurer apparatus, among the measurer apparatuses, is directly linked to a single collector apparatus among the at least one collector apparatus without involvement of a network interface controller to signal the collector apparatus with local information; wherein the remote information is collected by a computing element other than the computing element from which the remote information is collected using third party access, and the remote information is stored as replicate information in the individual memory of the computing element which collected the remote information, or another computing element other than the computing element from which the remote information is collected, using the third party access; wherein the physical performance properties are in form of electrical characteristics including any of voltage, current, power or energy use of the computing element or part thereof; and wherein the third party access is by one of remote direct memory access (RDMA) put and RDMA get.
15. A computer system comprising a plurality of networked computing elements, each computing element including a processing unit and individual memory to store local information from the computing element itself and remote information from other computing elements, the computer system including a fault-tolerant monitoring apparatus arranged to monitor physical performance properties of the networked computing elements, the monitoring apparatus comprising:
a plurality of measurer apparatuses each arranged to measure the physical performance properties of a single computing element among the plurality of computing elements, for storage as local information in the individual memory of the particular single computing element; and at least one collector apparatus, each measurer apparatus, among the plurality of measurer apparatuses, being directly linked to a single collector apparatus among the at least one collector apparatus without involvement of a network interface controller to signal the collector apparatus with local information; each collector apparatus arranged to control collection of remote information representing the physical performance properties from individual memory in a plurality of the computing elements, and storage of the remote information as replicate information in the individual memory of another computing element; wherein the remote information is collected by a computing element other than the computing element from which the remote information is collected using third party access, and the remote information is stored as replicate information in the individual memory of the computing element which collected the remote information, or another computing element other than the computing element from which the remote information is collected, using the third party access; wherein the physical performance properties are in form of electrical characteristics including any of voltage, current, power or energy use of the computing element or part thereof; and wherein the third party access is by one of remote direct memory access (RDMA) put and RDMA get.
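The claims list voltage, current, power or energy use as the measured electrical characteristics. As a hedged illustration of how a measurer might derive power and energy from voltage and current, the sketch below assumes timestamped (time, voltage, current) samples, computes instantaneous power as P = V * I, and approximates energy with the trapezoidal rule. The sample format, the function name power_and_energy and the choice of integration rule are assumptions, not details taken from the claims.

```python
from typing import List, Tuple


def power_and_energy(samples: List[Tuple[float, float, float]]) -> Tuple[List[float], float]:
    """Hypothetical helper. samples: (time_s, volts, amps), sorted by time.

    Returns the instantaneous power (W) of each sample and the energy (J)
    over the whole sampling window, integrated with the trapezoidal rule.
    """
    powers = [v * i for _, v, i in samples]  # P = V * I per sample
    energy = 0.0
    for k in range(1, len(samples)):
        dt = samples[k][0] - samples[k - 1][0]
        energy += 0.5 * (powers[k] + powers[k - 1]) * dt  # area of one trapezoid
    return powers, energy


samples = [(0.0, 12.0, 2.0), (1.0, 12.0, 2.5), (2.0, 12.0, 3.0)]
powers, joules = power_and_energy(samples)
print(powers, joules)  # [24.0, 30.0, 36.0] 60.0
```

The resulting values would then be stored as local information and replicated like any other reading; finer sampling makes the trapezoidal estimate of energy use more accurate.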
Specification