Model checker for finding distributed concurrency bugs
First Claim
1. A method for distributed concurrency (DC) bug detection, the method comprising:
- identifying, by a computing device, a plurality of nodes in a distributed computing cluster;
identifying, by the computing device, a plurality of messages to be transmitted during execution of an application by the distributed computing cluster;
determining, by the computing device, a set of orderings of the plurality of messages for DC bug detection, the set of orderings determined based upon the plurality of nodes and the plurality of messages;
removing, by the computing device, a subset of the orderings, where each ordering comprises a unique sequence of message arrival at one or more of the nodes, from the set of orderings based upon one or more of a state symmetry algorithm, a disjoint-update independence algorithm, and a zero-crash-impact reordering algorithm, where the zero-crash-impact reordering algorithm is a crash-after-discard reduction or a consecutive-crash reduction, and where the consecutive-crash reduction comprises determining a first message of a first ordering causes a crash of a node, determining a second message of the first ordering causes another crash of the node, and adding a second ordering comprising the first message and the second message to the subset of the orderings; and
performing, by the computing device, DC bug detection testing using the set of orderings after the subset of the orderings is removed from the set of orderings.
0 Assignments
0 Petitions
Accused Products
Abstract
Described herein are systems and methods for distributed concurrency (DC) bug detection. The method includes identifying a plurality of nodes in a distributed computing cluster; identifying a plurality of messages to be transmitted during execution of an application by the distributed computing cluster; determining a set of orderings of the plurality of messages for DC bug detection, the set of orderings determined based upon the plurality of nodes and the plurality of messages; removing a subset of the orderings from the set of orderings based upon one or more of a state symmetry algorithm, a disjoint-update independence algorithm, or a zero-crash-impact reordering algorithm; and performing DC bug detection testing using the set of orderings after the subset of the orderings is removed from the set of orderings.
47 Citations
21 Claims
-
1. A method for distributed concurrency (DC) bug detection, the method comprising:
-
identifying, by a computing device, a plurality of nodes in a distributed computing cluster; identifying, by the computing device, a plurality of messages to be transmitted during execution of an application by the distributed computing cluster; determining, by the computing device, a set of orderings of the plurality of messages for DC bug detection, the set of orderings determined based upon the plurality of nodes and the plurality of messages; removing, by the computing device, a subset of the orderings, where each ordering comprises a unique sequence of message arrival at one or more of the nodes, from the set of orderings based upon one or more of a state symmetry algorithm, a disjoint-update independence algorithm, and a zero-crash-impact reordering algorithm, where the zero-crash-impact reordering algorithm is a crash-after-discard reduction or a consecutive-crash reduction, and where the consecutive-crash reduction comprises determining a first message of a first ordering causes a crash of a node, determining a second message of the first ordering causes another crash of the node, and adding a second ordering comprising the first message and the second message to the subset of the orderings; and performing, by the computing device, DC bug detection testing using the set of orderings after the subset of the orderings is removed from the set of orderings. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A device comprising:
-
a memory storage comprising instructions; and a processor in communication with the memory, wherein the processor executes the instructions to; identify a plurality of nodes in a distributed computing cluster; identify a plurality of messages to be transmitted during execution of an application by the distributed computing cluster; determine a set of orderings of the plurality of messages for distributed concurrency (DC) bug detection, the set of orderings determined based upon the plurality of nodes and the plurality of messages; remove a subset of the orderings, where each ordering comprises a unique sequence of message arrival at one or more of the nodes, from the set of orderings based upon one or more of a state symmetry algorithm, a disjoint-update independence algorithm, and a zero-crash-impact reordering algorithm, where the zero-crash-impact reordering algorithm is a crash-after-discard reduction or a consecutive-crash reduction, and where the consecutive-crash reduction comprises determining a first message of a first ordering causes a crash of a node, determining a second message of the first ordering causes another crash of the node, and adding a second ordering comprising the first message and the second message to the subset of the orderings; and perform DC bug detection testing using the set of orderings after the subset of the orderings is removed from the set of orderings. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A non-transitory computer readable medium storing computer instructions, that when executed by a processor, causes the processor to perform:
-
identify a plurality of nodes in a distributed computing cluster; identify a plurality of messages to be transmitted during execution of an application by the distributed computing cluster; determine a set of orderings of the plurality of messages for distributed concurrency (DC) bug detection; remove a subset of the orderings, where each ordering comprises a unique sequence of message arrival at one or more of the nodes, from the set of orderings based upon one or more of a state symmetry algorithm, a disjoint-update independence algorithm, and a zero-crash-impact reordering algorithm, where the zero-crash-impact reordering algorithm is a crash-after-discard reduction or a consecutive-crash reduction, and where the consecutive-crash reduction comprises determining a first message of a first ordering causes a crash of a node, determining a second message of the first ordering causes another crash of the node, and adding a second ordering comprising the first message and the second message to the subset of the orderings; and perform DC bug detection testing using the set of orderings after the subset of the orderings is removed from the set of orderings. - View Dependent Claims (16, 17, 18, 19, 20, 21)
-
Specification