Method and infrastructure for detecting and/or servicing a failing/failed operating system instance
Abstract
A method and infrastructure for a diagnosis and/or repair mechanism in a computer system, the mechanism including an auxiliary service system running on the computer system.
25 Claims
1. A diagnosis mechanism for a computer system, comprising:
an auxiliary service system capable of gaining access to resources of a failed or failing operating system running on a computer system capable of supporting a plurality of concurrently-running operating system instances, each said operating system instance owning or sharing one or more processing elements, a certain amount of memory, and one or more input/output (I/O) devices, and automatically diagnosing the failed or failing operating system without affecting a functioning of other instances of operating system concurrently operating on said computer system, wherein said computer system comprises a Symmetric Multi-Processor (SMP), wherein said concurrently-running operating system instances and said auxiliary service system each occupies a respective Logical PARtition (LPAR), said auxiliary service system comprises an ambulance service LPAR, and said LPARs are controlled by a hypervisor, wherein each said operating system instance, during an initial booting procedure, prepares a memory region for storing metadata of its kernel and non-kernel data structures and other information that can potentially assist said auxiliary service system during a failure of said operating system instance, said operating system instance providing a location of said memory region to said hypervisor before beginning a normal course of operation after said initial booting procedure, machine-readable instructions permitting said auxiliary service system to identify locations of at least one of kernel structures and other data structures in a distressed operating system, wherein said auxiliary system further comprises at least one data structure validator that validates values of said at least one of kernel structures and data structures.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
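The boot-time step recited in claim 1 can be illustrated with a small Python model. This is a minimal sketch under stated assumptions, not the patented implementation: the classes `Hypervisor`, `OSInstance`, `AmbulanceLPAR` and `StructureDescriptor`, the method `register_metadata_region`, the structure names and all addresses are hypothetical, and each LPAR's memory is simulated as a Python dictionary. It shows an operating system instance writing metadata about its kernel and non-kernel structures into a memory region, registering that region's location with the hypervisor before normal operation begins, and the ambulance LPAR later using the registered location to find those structures.

```python
from dataclasses import dataclass


@dataclass
class StructureDescriptor:
    """Hypothetical metadata entry describing one data structure."""
    name: str     # e.g. "process_table"
    address: int  # start address in the instance's (simulated) memory
    length: int   # illustrative size of the structure
    kind: str     # "kernel" or "non-kernel"


class Hypervisor:
    """Hypothetical hypervisor that records where each LPAR keeps its metadata region."""

    def __init__(self):
        self._metadata_regions = {}  # lpar_id -> address of metadata region

    def register_metadata_region(self, lpar_id, address):
        self._metadata_regions[lpar_id] = address

    def metadata_region_of(self, lpar_id):
        return self._metadata_regions[lpar_id]


class OSInstance:
    def __init__(self, lpar_id, hypervisor):
        self.lpar_id = lpar_id
        self.hypervisor = hypervisor
        self.memory = {}  # simulated memory: address -> stored value
        self.running = False

    def boot(self):
        # 1. Lay out kernel and non-kernel structures (addresses are made up).
        descriptors = [
            StructureDescriptor("process_table", 0x1000, 64, "kernel"),
            StructureDescriptor("run_queue", 0x2000, 16, "kernel"),
            StructureDescriptor("app_log_buffer", 0x8000, 256, "non-kernel"),
        ]
        # 2. Prepare the metadata region describing those structures.
        metadata_addr = 0x0100
        self.memory[metadata_addr] = descriptors
        # 3. Report the region's location to the hypervisor BEFORE normal operation.
        self.hypervisor.register_metadata_region(self.lpar_id, metadata_addr)
        self.running = True


class AmbulanceLPAR:
    def __init__(self, hypervisor, lpars):
        self.hypervisor = hypervisor
        # Stands in for hypervisor-granted access to other LPARs' memory.
        self.lpars = lpars

    def locate_structures(self, lpar_id):
        """Find a distressed instance's structures via its registered metadata region."""
        addr = self.hypervisor.metadata_region_of(lpar_id)
        descriptors = self.lpars[lpar_id].memory[addr]
        return {d.name: d for d in descriptors}


hv = Hypervisor()
os1 = OSInstance("lpar1", hv)
os1.boot()
amb = AmbulanceLPAR(hv, {"lpar1": os1})
found = amb.locate_structures("lpar1")
print(hex(found["run_queue"].address))  # 0x2000
```

Because the location is registered before the instance starts normal work, the ambulance LPAR does not depend on the failed instance being able to answer queries about itself.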
14. An automated method of diagnosis in a computer system capable of supporting a plurality of operating system instances concurrently operating, each said operating system instance owning or sharing one or more processing elements, a certain amount of memory, and one or more input/output (I/O) devices, said method comprising:
running an auxiliary service system on said computer system, said auxiliary service system capable of automatically diagnosing an operating system instance running on said computer system without affecting an operation of remaining operating system instances, wherein said computer system comprises a Symmetric Multi-Processor (SMP), wherein said plurality of operating system instances and said auxiliary service system each occupies a respective Logical PARtition (LPAR), said auxiliary service system comprises an ambulance service LPAR, and said LPARs are controlled by a hypervisor, wherein each said operating system instance, during an initial booting procedure, prepares a memory region for storing metadata of its kernel and non-kernel data structures and other information that can potentially assist said auxiliary service system during a failure of said operating system instance, said operating system instance providing a location of said memory region to said hypervisor before beginning a normal course of operation after said initial booting procedure; and executing machine-readable instructions, upon detecting that an operating system instance has failed or potentially will fail, permitting said auxiliary service system to identify locations of at least one of kernel structures and other data structures in said failed or potentially failing operating system instance, wherein said auxiliary service system further comprises at least one data structure validator, said method further comprising using said data structures to validate values of said at least one of kernel structures and data structures.
(Dependent claims: 15, 16)
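The data structure validator of claim 14 can be sketched as a routine that walks a located kernel structure and checks its values for consistency. The Python sketch below assumes, purely for illustration, that a run queue is a singly linked list stored as `{address: (pid, next_address)}` entries and that the metadata region has given the ambulance LPAR the queue's address range; `Region`, `validate_run_queue`, `NULL` and `max_nodes` are hypothetical names. The checks shown (pointer bounds, cycle detection, a length cap and PID membership) are common sanity checks, not ones specified by the claims.

```python
from dataclasses import dataclass

NULL = 0  # hypothetical end-of-list marker


@dataclass
class Region:
    """Address range [start, end) that the metadata says the structure occupies."""
    start: int
    end: int

    def contains(self, addr):
        return self.start <= addr < self.end


def validate_run_queue(memory, head, queue_region, valid_pids, max_nodes=1024):
    """Hypothetical validator for a run queue stored as {addr: (pid, next_addr)} nodes.

    Returns a list of problem descriptions; an empty list means no inconsistency was found.
    """
    problems = []
    seen = set()
    addr = head
    while addr != NULL:
        if not queue_region.contains(addr):
            problems.append(f"pointer {addr:#x} outside run-queue region")
            break
        if addr in seen:
            problems.append(f"cycle detected at {addr:#x}")
            break
        if len(seen) >= max_nodes:
            problems.append("run queue exceeds max_nodes")
            break
        seen.add(addr)
        node = memory.get(addr)
        if node is None:
            problems.append(f"no node stored at {addr:#x}")
            break
        pid, next_addr = node
        if pid not in valid_pids:
            problems.append(f"node {addr:#x} references unknown pid {pid}")
        addr = next_addr
    return problems


region = Region(0x2000, 0x2100)
# Corrupted queue: the second node points back to the head and names an unknown PID.
corrupted = {0x2000: (1, 0x2010), 0x2010: (7, 0x2000)}
print(validate_run_queue(corrupted, 0x2000, region, valid_pids={1, 2, 3}))
# ['node 0x2010 references unknown pid 7', 'cycle detected at 0x2000']
```

Returning a list of problems instead of stopping at the first one lets the ambulance LPAR collect every inconsistency it can detect in one pass over the distressed instance's memory.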
17. A tangible signal-bearing storage medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform an automated method of diagnosis in a computer system, said method comprising:
running an auxiliary service system on said computer system capable of concurrently operating a plurality of operating system instances, said auxiliary service system capable of diagnosing a failed or failing operating system instance running on said computer system without affecting an operation of remaining instances of said plurality of operating system instances, wherein said computer system comprises a Symmetric Multi-Processor (SMP), wherein said plurality of operating system instances and said auxiliary service system each occupies a respective Logical PARtition (LPAR), said auxiliary service system comprises an ambulance service LPAR, and said LPARs are controlled by a hypervisor, wherein each said operating system instance, during an initial booting procedure, prepares a memory region for storing metadata of its kernel and non-kernel data structures and other information that can potentially assist said auxiliary service system during a failure of said operating system instance, said operating system instance providing a location of said memory region to said hypervisor before beginning a normal course of operation after said initial booting procedure; and permitting said auxiliary service system to identify locations of at least one of kernel structures and other data structures in a distressed operating system, wherein said auxiliary system further comprises at least one data structure validator that validates values of said at least one of kernel structures and data structures.
(Dependent claims: 18, 19, 20)
21. A computer system, comprising:
means for setting up and maintaining a plurality of concurrently-running operating system instances, each said operating system instance owning or sharing one or more processing elements, a certain amount of memory, and one or more input/output (I/O) devices; and an automatic diagnosis mechanism capable of at least one of diagnosing an operating system instance running on said computer system, repairing an operating system instance running on said computer system, annunciating that an operating system instance running on said computer system is requesting servicing, and providing a report of diagnosing said operating system instance, without affecting an operation of any other instances of said concurrently-running operating system instances, wherein said computer system comprises a Symmetric Multi-Processor (SMP), wherein each of said plurality of concurrently-running operating system instances and said automatic diagnosis mechanism occupies a respective Logical PARtition (LPAR), said auxiliary service system comprises an ambulance service LPAR, and said LPARs are controlled by a hypervisor, wherein each said operating system instance, during an initial booting procedure, prepares a memory region for storing metadata of its kernel and non-kernel data structures and other information that can potentially assist said auxiliary service system during a failure of said operating system instance, said operating system instance providing a location of said memory region to said hypervisor before beginning a normal course of operation after said initial booting procedure, said automatic diagnosis mechanism comprising machine-readable instructions permitting said auxiliary service system to identify locations of at least one of kernel structures and other data structures in a distressed operating system, wherein said auxiliary system further comprises at least one data structure validator that validates values of said at least one of kernel structures and data structures.
(Dependent claims: 22)
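Claim 21 lets the automatic diagnosis mechanism diagnose an instance, repair it, annunciate that it is requesting servicing, and/or report on the diagnosis. One way to combine those outcomes is sketched below in Python. The policy (try a registered repair for each detected problem, annunciate if any problem remains unrepaired, always produce a report) and the names `DiagnosisOutcome`, `service_instance` and `repair_fns` are hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class DiagnosisOutcome:
    """Hypothetical record of what the diagnosis mechanism did for one LPAR."""
    lpar_id: str
    problems: list
    repaired: list = field(default_factory=list)
    needs_service: bool = False  # True -> annunciate that the instance requests servicing

    def report(self):
        lines = [f"LPAR {self.lpar_id}: {len(self.problems)} problem(s) found"]
        lines += [f"  repaired: {p}" for p in self.repaired]
        if self.needs_service:
            lines.append("  instance is requesting servicing")
        return "\n".join(lines)


def service_instance(lpar_id, problems, repair_fns):
    """Hypothetical policy: repair what we can, flag the rest for servicing.

    repair_fns maps a problem kind to a callable that returns True if the repair succeeded.
    """
    outcome = DiagnosisOutcome(lpar_id, list(problems))
    for kind in problems:
        fix = repair_fns.get(kind)
        if fix is not None and fix():
            outcome.repaired.append(kind)
    outcome.needs_service = len(outcome.repaired) < len(problems)
    return outcome


out = service_instance(
    "lpar1",
    ["run_queue_cycle", "bad_pid"],
    {"run_queue_cycle": lambda: True},  # pretend only the cycle can be repaired
)
print(out.needs_service)  # True
print(out.report())
```

Keeping the report separate from the repair decision means a report is available even when no repair is attempted, which matches the claim's "at least one of" framing.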
23. A method of at least one of operating a data center and using a data center, said method comprising:
at least one of transmitting data to a computer system in said data center and receiving data from said computer system, wherein said computer system in said data center has provisions to set up and maintain a plurality of concurrently-running operating system instances, each said operating system instance owning or sharing one or more processing elements, a certain amount of memory, and one or more input/output (I/O) devices, said computer system further comprising an automatic diagnosis mechanism capable of at least one of diagnosing an operating system instance running on said computer system, repairing an operating system instance running on said computer system, annunciating that an operating system instance running on said computer system is requesting servicing, and providing a report of diagnosing said operating system instance, without affecting an operation of remaining instances of said plurality of concurrently-running operating system instances, said automatic diagnosis mechanism comprising machine-readable instructions permitting said auxiliary service system to identify locations of at least one of kernel structures and other data structures in a distressed operating system, wherein said auxiliary system further comprises at least one data structure validator that validates values of said at least one of kernel structures and data structures; wherein said computer system comprises a Symmetric Multi-Processor (SMP), wherein said plurality of concurrently-running operating system instances and said auxiliary service system each occupies a respective Logical PARtition (LPAR), said auxiliary service system comprises an ambulance service LPAR, and said LPARs are controlled by a hypervisor, wherein each said operating system instance, during an initial booting procedure, prepares a memory region for storing metadata of its kernel and non-kernel data structures and other information that can potentially assist said auxiliary service system during a failure of said operating system instance, said operating system instance providing a location of said memory region to said hypervisor before beginning a normal course of operation after said initial booting procedure.
(Dependent claims: 24, 25)
Specification