Identification of root cause for a transaction response time problem in a distributed environment
Abstract
Method and apparatus for identifying a cause for a response time problem for a transaction in a distributed computing system that includes a central server and a plurality of subsystems. Data is stored at each subsystem relating to sub-transactions of transactions performed by the subsystems. When a problem is discovered in connection with the completion of a particular transaction, each subsystem of the plurality of subsystems that was involved in the particular transaction is identified. Both the instance data relating to all of the sub-transactions of the particular transaction stored at each identified subsystem and the current hourly aggregate data stored at each identified subsystem are forwarded to the central server. Root-Cause Analysis is then performed using the forwarded instance data and aggregate data to identify the particular subsystem that caused the transaction problem.
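The per-subsystem storage described in the abstract can be sketched in Python. This is a minimal illustration under assumptions the source does not state. The record fields (`transaction_id`, `policy_id`, `response_ms`), the `SubsystemAgent` and `HourlyAggregate` classes, and the queue size are all hypothetical, and resetting the aggregate at each hour boundary is omitted.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class InstanceRecord:
    # Hypothetical per-sub-transaction record; field names are illustrative.
    transaction_id: str  # uniquely identifies the parent transaction
    policy_id: str       # identifies the policy associated with the transaction
    response_ms: float   # time this subsystem spent on its sub-transaction


@dataclass
class HourlyAggregate:
    # Running totals for the current hour (hour-boundary reset omitted).
    count: int = 0
    total_ms: float = 0.0

    def add(self, response_ms: float) -> None:
        self.count += 1
        self.total_ms += response_ms

    @property
    def average_ms(self) -> float:
        return self.total_ms / self.count if self.count else 0.0


class SubsystemAgent:
    """Keeps recent instance data in a bounded queue plus a current-hour aggregate."""

    def __init__(self, name: str, max_instances: int = 1000) -> None:
        self.name = name
        # Once full, appending silently discards the oldest instance record.
        self.queue = deque(maxlen=max_instances)
        self.aggregate = HourlyAggregate()

    def record(self, rec: InstanceRecord) -> None:
        self.queue.append(rec)
        self.aggregate.add(rec.response_ms)
```

The bounded queue keeps memory use fixed at each subsystem while still holding instance data for many different transactions. The aggregate continues to summarize every sub-transaction, including those whose instance records have already been evicted.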
15 Claims
1. A method for identifying a cause for a response time problem for a transaction in a distributed computing system that includes a central server and a plurality of subsystems, where a single transaction is divided into a plurality of sub-transactions and each sub-transaction is sent to a given subsystem of the plurality of subsystems, the method comprising:
storing data, by an agent at each subsystem, relating to the sub-transactions performed by the subsystems, wherein the data is stored at the each subsystem in a queue that contains transaction instance data for multiple different transactions that have been processed by the each subsystem, wherein the transaction instance data is specific data generated for a given sub-transaction processed by the subsystem;
discovering a problem in connection with completion of a particular transaction;
identifying the each subsystem of the plurality of subsystems involved in the particular transaction using at least one identifier stored at the each subsystem, wherein the at least one identifier includes a first identifier uniquely identifying the particular transaction, and a second identifier identifying a policy associated with the particular transaction;
forwarding the data stored at each identified subsystem to the central server, the forwarded data including instance data relating to the sub-transaction of the particular transaction performed by the identified subsystems and aggregate data relating to sub-transactions of transactions performed by the identified subsystems; and
performing a Root-Cause Analysis using the forwarded aggregate data and the forwarded instance data to identify the subsystem that caused the response time problem with the particular transaction.
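The discovery and identification steps of claim 1 could look like the following sketch. The source does not say how a problem is detected. This sketch assumes, hypothetically, that each policy carries a response-time threshold (the `POLICY_THRESHOLDS_MS` table) and that each subsystem's stored records can be scanned for the pair of identifiers: the transaction identifier and the policy identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceRecord:
    transaction_id: str  # first identifier: uniquely identifies the transaction
    policy_id: str       # second identifier: the associated policy
    response_ms: float


# Hypothetical policy table: policy_id -> response-time threshold in milliseconds.
POLICY_THRESHOLDS_MS = {"checkout": 2000.0}


def is_problem(policy_id: str, end_to_end_ms: float) -> bool:
    """Flag a transaction whose total response time exceeds its policy threshold."""
    return end_to_end_ms > POLICY_THRESHOLDS_MS[policy_id]


def involved_subsystems(
    stores: dict[str, list[InstanceRecord]], transaction_id: str, policy_id: str
) -> list[str]:
    """Return names of subsystems whose stored data matches both identifiers."""
    return [
        name
        for name, records in stores.items()
        if any(
            r.transaction_id == transaction_id and r.policy_id == policy_id
            for r in records
        )
    ]
```

Because each subsystem records the identifiers locally, the set of involved subsystems can be recovered after the fact, without the central server having tracked the transaction's path in advance.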
7. A computer program product, comprising computer executable instructions embodied in a computer usable, recordable-type medium, for identifying a cause for a response time problem for a transaction in a distributed computing system that includes a central server and a plurality of subsystems, where a single transaction is divided into a plurality of sub-transactions and each sub-transaction is sent to a given subsystem of the plurality of subsystems, the computer program product comprising:
first instructions for storing data, by an agent at each subsystem, relating to the sub-transactions performed by the subsystems, wherein the data is stored at the each subsystem in a queue that contains transaction instance data for multiple different transactions that have been processed by the each subsystem, wherein the transaction instance data is specific data generated for a given sub-transaction processed by the subsystem;
second instructions for discovering a problem in connection with completion of a particular transaction;
third instructions for identifying the each subsystem of the plurality of subsystems involved in the particular transaction using at least one identifier stored at the each subsystem, wherein the at least one identifier includes a first identifier uniquely identifying the particular transaction, and a second identifier identifying a policy associated with the particular transaction;
fourth instructions for forwarding the data stored at each identified subsystem to the central server, the forwarded data including instance data relating to the sub-transaction of the particular transaction performed by the identified subsystems and aggregate data relating to sub-transactions of transactions performed by the identified subsystems; and
fifth instructions for performing a Root-Cause Analysis using the forwarded aggregate data and the forwarded instance data to identify the subsystem that caused the response time problem with the particular transaction.
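The fourth instructions of claim 7 forward two kinds of data: instance data for the problem transaction and aggregate data for the subsystem as a whole. The following minimal sketch shows what one identified subsystem might send. It assumes a JSON payload with hypothetical field names (`instance_data`, `aggregate_data`); the source does not specify the actual wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class InstanceRecord:
    transaction_id: str
    policy_id: str
    response_ms: float


def build_forward_payload(
    subsystem: str,
    queue: list[InstanceRecord],
    hourly_count: int,
    hourly_total_ms: float,
    transaction_id: str,
) -> str:
    """Package the data one identified subsystem sends to the central server."""
    # Instance data: only the records belonging to the problem transaction.
    instances = [asdict(r) for r in queue if r.transaction_id == transaction_id]
    # Aggregate data: summary of all sub-transactions seen this hour.
    average = hourly_total_ms / hourly_count if hourly_count else 0.0
    return json.dumps(
        {
            "subsystem": subsystem,
            "instance_data": instances,
            "aggregate_data": {
                "count": hourly_count,
                "total_ms": hourly_total_ms,
                "average_ms": average,
            },
        }
    )
```

Only the filtered instance records travel over the network, not the whole queue. This keeps the payload small even when the queue holds data for many other transactions.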
11. An apparatus for identifying a cause for a response time problem for a transaction in a distributed computing system that includes a central server and a plurality of subsystems, where a single transaction is divided into a plurality of sub-transactions and each sub-transaction is sent to a given subsystem of the plurality of subsystems, the apparatus comprising:
an agent at each subsystem for storing data relating to the sub-transactions performed by the subsystems, wherein the data is stored at the each subsystem in a queue that contains transaction instance data for multiple different transactions that have been processed by the subsystem, wherein the transaction instance data is specific data generated for a given sub-transaction processed by the subsystem;
a mechanism for discovering a problem in connection with completion of a particular transaction;
an identifying mechanism for identifying the each subsystem of the plurality of subsystems involved in the particular transaction using at least one identifier stored at the each subsystem, wherein the at least one identifier includes a first identifier uniquely identifying the particular transaction, and a second identifier identifying a policy associated with the particular transaction;
a forwarding mechanism for forwarding the data stored at each identified subsystem to the central server, the forwarded data including instance data relating to the sub-transaction of the particular transaction performed by the identified subsystems and aggregate data relating to sub-transactions of transactions performed by the identified subsystems; and
an analyzer at the central server for performing a Root-Cause Analysis using the forwarded aggregate data and the forwarded instance data to identify the subsystem that caused the response time problem with the particular transaction.
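The source does not define the algorithm the analyzer uses for Root-Cause Analysis. As one hypothetical heuristic, the analyzer could compare each subsystem's instance time for the failing transaction with that subsystem's own hourly average, then blame the subsystem with the largest ratio. The function name `find_root_cause` and its input shape are assumptions made for illustration.

```python
def find_root_cause(forwarded: dict[str, tuple[float, float]]) -> str:
    """Pick the subsystem whose instance time most exceeds its own hourly average.

    `forwarded` maps subsystem name -> (instance_ms, hourly_average_ms).
    """

    def slowdown(item: tuple[str, tuple[float, float]]) -> float:
        _, (instance_ms, average_ms) = item
        # No baseline yet for this hour: treat the subsystem as running normally.
        return instance_ms / average_ms if average_ms > 0 else 1.0

    name, _ = max(forwarded.items(), key=slowdown)
    return name
```

Consider the input `{"web": (500.0, 100.0), "app": (1500.0, 1400.0)}`. Here the sketch returns `"web"`. `app` took longer in absolute terms, but its time is close to its usual level, while `web` ran five times slower than normal. This illustrates why the aggregate data is forwarded alongside the instance data.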
Specification