In-band problem log data collection between a host system and a storage system
First Claim
1. A computer program product comprising a computer useable storage medium having a computer readable program stored thereon, wherein the computer readable program, when executed on a computing device, causes the computing device to:
receive a failure notification of a failure condition of a component of a data processing system in a network comprising a plurality of data processing systems;
receive state save data from the plurality of data processing systems in the network, wherein the state save data is generated in the plurality of data processing systems in response to an in-band state save command; and
output the state save data for use in resolving the failure condition of the component, wherein the in-band state save command is a command issued across a data channel between a first data processing system and a plurality of second data processing systems in the plurality of data processing systems, wherein the first data processing system is a storage system and the plurality of second data processing systems comprise a plurality of host systems, wherein the in-band state save command is sent from a storage controller of the storage system to the plurality of host systems in response to the failure condition being a failure of a storage device in the storage system;
wherein each of the plurality of host systems runs one or more host applications, a host bus adapter driver, and a failover driver;
wherein the in-band state save command is issued to the plurality of host systems via a multi-host system interface of the storage system which establishes a separate data channel with each of the host systems in the plurality of host systems;
wherein responsive to each of the plurality of host systems receiving the in-band state save command at the respective host bus adapter driver and providing the in-band state save command to the respective failover driver, each of the plurality of host systems collects state save data and provides the collected state save data to the storage controller via the data channels established by the multi-host system interface; and
wherein the state save data collected by each of the plurality of host systems is packaged together with state save data from the storage system into a single data package associated with the detected failure condition.
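The flow recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, method names, and dictionary fields below are all hypothetical, and the separate data channel per host is modelled simply as one callable per host. The sketch only shows the ordering of steps: a storage device failure causes the storage controller to send an in-band state save command to every host, each host's HBA driver passes it to its failover driver, and the returned host data is packaged with the storage system's own data into a single package keyed by the failure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class StateSaveCommand:
    """Hypothetical in-band command; the claim does not define its fields."""
    failure_id: str
    failed_device: str


class HostSystem:
    """Hypothetical host running an HBA driver and a failover driver."""

    def __init__(self, name: str) -> None:
        self.name = name

    def hba_driver_receive(self, cmd: StateSaveCommand) -> Dict[str, str]:
        # The HBA driver receives the command from the data channel
        # and provides it to the failover driver.
        return self.failover_driver_collect(cmd)

    def failover_driver_collect(self, cmd: StateSaveCommand) -> Dict[str, str]:
        # Collect host-side state save data (placeholder content only).
        return {
            "host": self.name,
            "failure_id": cmd.failure_id,
            "log": f"{self.name} state for failure of {cmd.failed_device}",
        }


class StorageController:
    """Hypothetical controller with one channel per host (assumption)."""

    def __init__(self, hosts: List[HostSystem]) -> None:
        # Multi-host interface: a separate channel (here, a callable) per host.
        self.channels: Dict[str, Callable[[StateSaveCommand], Dict[str, str]]] = {
            h.name: h.hba_driver_receive for h in hosts
        }

    def handle_storage_device_failure(self, failure_id: str, device: str) -> dict:
        cmd = StateSaveCommand(failure_id, device)
        # Send the in-band command to every host and gather the replies.
        host_data = [send(cmd) for send in self.channels.values()]
        # Collect the storage system's own state save data.
        storage_data = {"device": device, "failure_id": failure_id}
        # Package everything into a single package tied to the failure.
        return {"failure_id": failure_id, "storage": storage_data, "hosts": host_data}
```

For example, `StorageController([HostSystem("host-a"), HostSystem("host-b")]).handle_storage_device_failure("F-1", "disk-7")` returns one dictionary that holds the storage data and one entry per host, all carrying the same failure identifier.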
Abstract
A mechanism for in-band problem log data collection is provided. Facilities are provided for a host system, host application, or server system to instigate a state save operation in a storage system utilizing direct commands in response to an error or failure. The host system may include an application program interface (API) to force the storage device to collect a set of state save data for debug purposes at a specific time interlocked with a host system log. The API of the illustrative embodiments may be provided in a failover driver and/or host bus adapter (HBA) driver in the prime code path such that first time data capture following an error is maximized. Since the host system is instigating the state save operation with direct commands, a larger amount of transient data may be collected to provide more comprehensive state information for debugging purposes.
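The host-initiated path described in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the API of any real failover or HBA driver: `FailoverDriver`, `StorageSystem`, `on_io_error`, `state_save`, and the correlation identifier scheme are all hypothetical names introduced here. The sketch shows the idea of interlocking the storage system's state save with the host log by tagging both with the same identifier, and of capturing transient storage state at the moment of the error, before it changes.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageSystem:
    """Hypothetical storage system holding transient state."""
    transient_state: Dict[str, int] = field(default_factory=dict)
    state_saves: List[dict] = field(default_factory=list)

    def state_save(self, correlation_id: str) -> dict:
        # Copy the transient state now, so later changes do not alter the snapshot.
        snapshot = {"correlation_id": correlation_id, "state": dict(self.transient_state)}
        self.state_saves.append(snapshot)
        return snapshot


class FailoverDriver:
    """Hypothetical failover driver exposing a state save call in its error path."""

    def __init__(self, storage: StorageSystem) -> None:
        self.storage = storage
        self.host_log: List[dict] = []
        self._ids = itertools.count(1)

    def on_io_error(self, description: str) -> str:
        # Tag the host log entry and the storage state save with the same
        # identifier so the two records can be matched during debugging.
        correlation_id = f"ss-{next(self._ids)}"
        self.host_log.append({"correlation_id": correlation_id, "error": description})
        # Direct command to the storage system, issued from the error path.
        self.storage.state_save(correlation_id)
        return correlation_id
```

Because the call sits in the error-handling path, the snapshot is taken at first detection of the error; a later change to `transient_state` does not affect the saved copy.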
72 Citations
22 Claims
1. A computer program product comprising a computer useable storage medium having a computer readable program stored thereon, wherein the computer readable program, when executed on a computing device, causes the computing device to:
receive a failure notification of a failure condition of a component of a data processing system in a network comprising a plurality of data processing systems;
receive state save data from the plurality of data processing systems in the network, wherein the state save data is generated in the plurality of data processing systems in response to an in-band state save command; and
output the state save data for use in resolving the failure condition of the component, wherein the in-band state save command is a command issued across a data channel between a first data processing system and a plurality of second data processing systems in the plurality of data processing systems, wherein the first data processing system is a storage system and the plurality of second data processing systems comprise a plurality of host systems, wherein the in-band state save command is sent from a storage controller of the storage system to the plurality of host systems in response to the failure condition being a failure of a storage device in the storage system;
wherein each of the plurality of host systems runs one or more host applications, a host bus adapter driver, and a failover driver;
wherein the in-band state save command is issued to the plurality of host systems via a multi-host system interface of the storage system which establishes a separate data channel with each of the host systems in the plurality of host systems;
wherein responsive to each of the plurality of host systems receiving the in-band state save command at the respective host bus adapter driver and providing the in-band state save command to the respective failover driver, each of the plurality of host systems collects state save data and provides the collected state save data to the storage controller via the data channels established by the multi-host system interface; and
wherein the state save data collected by each of the plurality of host systems is packaged together with state save data from the storage system into a single data package associated with the detected failure condition.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method, in a computing device, for collecting data corresponding to a failure in a data processing system, comprising:
receiving a failure notification of a failure condition of a component of a data processing system in a network comprising a plurality of data processing systems;
receiving state save data from the plurality of data processing systems in the network, wherein the state save data is generated in the plurality of data processing systems in response to an in-band state save command; and
outputting the state save data for use in resolving the failure condition of the component, wherein the in-band state save command is a command issued across a data channel between a first data processing system and a plurality of second data processing systems in the plurality of data processing systems, wherein the first data processing system is a storage system and the plurality of second data processing systems comprise a plurality of host systems, wherein the in-band state save command is sent from a storage controller of the storage system to the plurality of host systems in response to the failure condition being a failure of a storage device in the storage system;
wherein each of the plurality of host systems runs one or more host applications, a host bus adapter driver, and a failover driver;
wherein the in-band state save command is issued to the plurality of host systems via a multi-host system interface of the storage system which establishes a separate data channel with each of the host systems in the plurality of host systems;
wherein responsive to each of the plurality of host systems receiving the in-band state save command at the respective host bus adapter driver and providing the in-band state save command to the respective failover driver, each of the plurality of host systems collects state save data and provides the collected state save data to the storage controller via the data channels established by the multi-host system interface; and
wherein the state save data collected by each of the plurality of host systems is packaged together with state save data from the storage system into a single data package associated with the detected failure condition.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A data processing system, comprising:
a plurality of host systems; and
a storage system coupled to the plurality of host systems, wherein:
one of the plurality of host systems or the storage system receives a failure notification of a failure condition of a component of the data processing system;
the storage system receives first state save data from the plurality of host systems, and collects second state save data from one or more storage devices of the storage system, wherein at least one of the first state save data or the second state save data is generated in response to an in-band state save command;
the storage system outputs the first state save data and second state save data for use in resolving the failure condition of the component, wherein the in-band state save command is a command issued across a data channel between the storage system and the plurality of host systems, wherein the in-band state save command is sent from a storage controller of the storage system to the plurality of host systems in response to the failure condition being a failure of a storage device in the storage system;
each of the plurality of host systems runs one or more host applications, a host bus adapter driver, and a failover driver;
the in-band state save command is issued to the plurality of host systems via a multi-host system interface of the storage system which establishes a separate data channel with each of the host systems in the plurality of host systems;
responsive to each of the plurality of host systems receiving the in-band state save command at the respective host bus adapter and providing the in-band state save command to the respective failover driver, each of the plurality of host systems collects state save data and provides the collected state save data to the storage controller via the data channels established by the multi-host system interface; and
the state save data collected by each of the plurality of host systems is packaged together with state save data from the storage system into a single data package associated with the detected failure condition.
Dependent claims: 20, 21, 22
Specification