Breakpoint logging and constraint mechanisms for parallel computing systems
Abstract
A system that facilitates debugging of a computing cluster and/or distributed applications environment. A debugger component receives a debugging expression, and a constraint component, which includes both a static constraint engine (SCE) and a dynamic constraint engine (DCE), processes the debugging expression to automatically perform a debugging process on at least two processes of a plurality of processes. When the user creates a tracepoint or constraint breakpoint, the expression is sent directly to the SCE, which parses the constraint and tracepoint expressions, reduces the expression by evaluating parts of the expression based on static values (such as process ID or filename), and passes the remainder on to each of the applicable DCEs. Each DCE registers a breakpoint at the applicable location in the process and, upon receiving a breakpoint event, evaluates the remainder of the constraint expression on the dynamic data and sends log and/or break event data back to the user for viewing.
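The static reduction described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation; the tuple-based expression format and the names `reduce_static` and `evaluate_dynamic` are assumptions made only for illustration. The SCE side substitutes values that are known before the process runs (for example a process ID) and returns either a final boolean or a smaller residual expression, which the DCE side later evaluates against runtime data.

```python
# Hypothetical expression format (an assumption, not from the patent):
#   ("eq", name, value) | ("not", e) | ("and", e1, e2) | ("or", e1, e2) | bool

def reduce_static(expr, known):
    """Partially evaluate expr with the values in `known`.

    Returns True/False when the result is fully determined, otherwise a
    residual expression containing only the terms that are still unknown.
    """
    if isinstance(expr, bool):
        return expr
    op = expr[0]
    if op == "eq":
        _, name, value = expr
        if name in known:
            return known[name] == value
        return expr  # depends on data that is not yet known
    if op == "not":
        inner = reduce_static(expr[1], known)
        return (not inner) if isinstance(inner, bool) else ("not", inner)
    left = reduce_static(expr[1], known)
    right = reduce_static(expr[2], known)
    if op == "and":
        if left is False or right is False:
            return False
        if left is True:
            return right
        if right is True:
            return left
        return ("and", left, right)
    if op == "or":
        if left is True or right is True:
            return True
        if left is False:
            return right
        if right is False:
            return left
        return ("or", left, right)
    raise ValueError(f"unknown operator: {op!r}")


def evaluate_dynamic(residual, runtime):
    """DCE side: finish the evaluation once runtime values are available."""
    result = reduce_static(residual, runtime)
    if not isinstance(result, bool):
        raise ValueError("expression still has unresolved terms")
    return result


# Usage: break only in process 3, and only when x == 10.
expr = ("and", ("eq", "pid", 3), ("eq", "x", 10))
residual = reduce_static(expr, {"pid": 3, "file": "solver.c"})
```

In this sketch a process whose ID does not match reduces to `False` on the static side, so nothing would need to be sent to its DCE; a matching process leaves only the `x == 10` term for runtime evaluation.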
22 Claims
1. A system that facilitates debugging an application with parallel processes running on several machines in a computing cluster or distributed applications environment, the system comprising:
a memory;
a plurality of machines, which are executing the application in parallel;
a plurality of application processes, running in parallel, including at least one application process running on each of the plurality of machines;
a debugger user interface engine;
a debugging engine that receives a debugging expression from the debugger user interface engine and that transmits log data and break event data back to the debugger user interface engine, wherein the received debugging expression comprises at least one of a tracepoint expression and a constraint expression, wherein the debugging engine processes the debugging expression to automatically perform a debugging process on at least two application processes of the plurality of application processes, and wherein the debugging engine comprises
a static constraint engine; and
a plurality of dynamic constraint engines that interface to the static constraint engine, wherein for each machine of the plurality of machines there is at least one corresponding dynamic constraint engine, and wherein each of the plurality of dynamic constraint engines is associated with one or more of the at least one application processes running on the machine;
wherein when a user creates the debugging expression via the debugger user interface, the debugging expression is sent directly to the static constraint engine;
wherein the static constraint engine receives and parses the debugging expression, reducing the expression by evaluating parts of the expression based on static values, extracting static information, and treating as ambiguous any term that relies on a variable whose value is not yet known,
wherein the static constraint engine automatically generates, for the received debugging expression, breakpoint information corresponding to a breakpoint or a tracepoint, wherein the breakpoint information includes may-break/must-break/must-not-break information, and the static constraint engine forwards the breakpoint information to at least one selected dynamic constraint engine of the plurality of dynamic constraint engines when it is determined that the breakpoint or tracepoint is reachable by the one or more application processes associated with the selected dynamic constraint engine, such that the static constraint engine does not forward the breakpoint information to the selected dynamic constraint engine when the breakpoint or tracepoint is unreachable by the one or more associated application processes, and further wherein the static constraint engine forwards the breakpoint information to the selected dynamic constraint engine when interruption to the running application, as a whole, is below a predetermined interruption value, and
wherein the selected dynamic constraint engine, upon receiving the breakpoint information from the static constraint engine, registers the corresponding breakpoint or tracepoint at one or more applicable locations in at least one of the one or more associated application processes, and
wherein, the selected dynamic constraint engine, upon receiving a breakpoint event from one of the one or more associated application processes, evaluates dynamically the breakpoint event in light of the breakpoint information, wherein the selected dynamic constraint engine dynamically performs one of (i) applying the tracepoint to minimize a stop time associated with stopping the application process, sending a log back to the user interface engine for at least presentation to the user and storage, or (ii) applying the breakpoint and returning break event information to the user interface engine for presentation to the user.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
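The may-break/must-break/must-not-break information recited in claim 1 can be read as a three-valued verdict, and the following is a rough sketch under that reading. It is not taken from the specification: the `Verdict` enum, the list-of-pairs constraint format (a conjunction of `variable == expected` terms), and the `classify` and `handle_event` functions are all hypothetical names introduced for illustration.

```python
from enum import Enum


class Verdict(Enum):
    MUST_BREAK = "must-break"
    MAY_BREAK = "may-break"
    MUST_NOT_BREAK = "must-not-break"


def classify(terms, static):
    """Static side: classify a conjunction of (variable, expected) terms.

    Terms whose variable is not in `static` are treated as ambiguous and
    returned as the residual for runtime evaluation.
    """
    residual = []
    for name, expected in terms:
        if name not in static:
            residual.append((name, expected))  # value not yet known
        elif static[name] != expected:
            return Verdict.MUST_NOT_BREAK, []  # one failed term decides it
    if residual:
        return Verdict.MAY_BREAK, residual
    return Verdict.MUST_BREAK, []


def handle_event(verdict, residual, runtime, is_tracepoint):
    """Dynamic side: decide what to do when a breakpoint event arrives.

    Returns ("resume", None), ("log", snapshot) for a tracepoint, or
    ("break", snapshot) for a breakpoint.
    """
    if verdict is Verdict.MUST_NOT_BREAK:
        return ("resume", None)
    if verdict is Verdict.MAY_BREAK:
        hit = all(n in runtime and runtime[n] == v for n, v in residual)
        if not hit:
            return ("resume", None)
    snapshot = dict(runtime)
    # A tracepoint logs and lets the process continue, keeping stop time short.
    return ("log", snapshot) if is_tracepoint else ("break", snapshot)


verdict, residual = classify([("pid", 3), ("x", 10)], {"pid": 3})
action = handle_event(verdict, residual, {"x": 10}, is_tracepoint=True)
```

Here `verdict` is `Verdict.MAY_BREAK` because `x` is unknown until runtime, and the event is only logged once the runtime value of `x` satisfies the residual term.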
9. In a computing cluster or distributed applications environment that includes a plurality of processing units with corresponding memories, a method of debugging a parallel application that is running with a plurality of parallel application processes across a plurality of machines, including at least one application process on each machine, the method comprising:
generating, at a debugger user interface engine, a debugging expression that includes at least one of a tracepoint expression corresponding to a tracepoint and a breakpoint expression corresponding to a breakpoint;
receiving the debugging expression at a debugging engine, wherein the debugging engine comprises a static constraint engine and a plurality of dynamic constraint engines that interface to the static constraint engine, wherein for each machine of the plurality of machines there is at least one corresponding dynamic constraint engine of the plurality of dynamic constraint engines, and wherein each dynamic constraint engine is associated with one or more of the at least one application processes running on the machine;
processing the debugging expression at the static constraint engine, wherein said processing includes extracting static information and non-static information from the debugging expression, wherein static information defines select ones of the plurality of machines on which a debugging operation is to be performed;
treating as ambiguous any term in the debugging expression that relies on a variable whose value is not yet known; and
generating breakpoint information for the debugging expression corresponding to at least one of the breakpoint and the tracepoint, including may-break/must-break/must-not-break information;
forwarding, for each machine of the select ones of the machines, the breakpoint information from the static constraint engine to at least one selected dynamic constraint engine corresponding to the machine, when it is determined (i) that the breakpoint or the tracepoint is reachable by the one or more application processes associated with the selected dynamic constraint engine, such that breakpoint information is not forwarded to the selected dynamic constraint engine when the breakpoint or tracepoint is unreachable by the one or more associated application processes, and (ii) that interruption to the running application, as a whole, is below a predetermined interruption value; and
performing, in parallel, the debugging operation on a plurality of application processes corresponding to the select ones of the machines.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
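The forwarding step of claim 9 gates delivery on two conditions: reachability and an interruption limit. The sketch below shows one way such a gate could look. The claim does not say how reachability or interruption is computed, so `reachable_locations`, `estimated_interruption`, and `max_interruption` are placeholder inputs assumed for illustration, as are the `DynamicEngine` class and the `forward` function.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicEngine:
    """Hypothetical stand-in for a per-machine dynamic constraint engine."""
    machine: str
    reachable_locations: set = field(default_factory=set)
    received: list = field(default_factory=list)


def forward(info, location, engines, selected_machines,
            estimated_interruption, max_interruption):
    """Forward breakpoint info only where both claim conditions hold.

    (i)  the location is reachable by the engine's processes, and
    (ii) the estimated interruption to the application as a whole is
         below the predetermined limit.
    Returns the names of the machines that received the information.
    """
    if estimated_interruption >= max_interruption:
        return []  # condition (ii) fails for every engine
    targets = []
    for engine in engines:
        if engine.machine not in selected_machines:
            continue  # static information excluded this machine
        if location not in engine.reachable_locations:
            continue  # condition (i) fails: unreachable here
        engine.received.append(info)
        targets.append(engine.machine)
    return targets


engines = [
    DynamicEngine("node-a", {"solver.c:42"}),
    DynamicEngine("node-b", {"io.c:7"}),
    DynamicEngine("node-c", {"solver.c:42"}),
]
sent = forward({"kind": "tracepoint"}, "solver.c:42", engines,
               selected_machines={"node-a", "node-b"},
               estimated_interruption=0.1, max_interruption=0.5)
```

In this example only `node-a` receives the information: `node-b` cannot reach the location, and `node-c` was not among the statically selected machines.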
22. A computer program product, comprising one or more computer-readable computer storage media, that when executed by one or more processors of one or more computing systems, causes the one or more computing systems to perform at least the following:
generate, at a debugger user interface engine, a debugging expression that includes at least one of a tracepoint expression corresponding to a tracepoint and a breakpoint expression corresponding to a breakpoint;
receive the debugging expression at a debugging engine, wherein the debugging engine comprises a static constraint engine and a plurality of dynamic constraint engines that interface to the static constraint engine, wherein for each machine of the plurality of machines there is at least one corresponding dynamic constraint engine of the plurality of dynamic constraint engines, and wherein each dynamic constraint engine is associated with one or more of the at least one application processes running on the machine;
process the debugging expression at the static constraint engine, wherein said processing includes extracting static information and non-static information from the debugging expression, wherein static information defines select ones of the plurality of machines on which a debugging operation is to be performed;
treating as ambiguous any term in the debugging expression that relies on a variable whose value is not yet known; and
generating breakpoint information for the debugging expression corresponding to at least one of the breakpoint and the tracepoint, including may-break/must-break/must-not-break information;
forward, for each machine of the select ones of the machines, the breakpoint information from the static constraint engine to at least one selected dynamic constraint engine corresponding to the machine when it is determined (i) that the breakpoint or the tracepoint is reachable by the one or more application processes associated with the selected dynamic constraint engine, such that breakpoint information is not forwarded to the selected dynamic constraint engine when the breakpoint or tracepoint is unreachable by the one or more associated application processes, and (ii) that interruption to the running application, as a whole, is below a predetermined interruption value; and
perform, in parallel, the debugging operation on a plurality of application processes corresponding to the select ones of the machines.
Specification