Distributed resource contention detection and handling
Abstract
A system and method are disclosed for detecting and handling resource contention in a cluster file system. In one implementation, a processing device determines a measure of congestion for a resource that is shared by a first node of a cluster file system and a second node of the cluster file system, wherein the first node has a first local queue for lock requests for the resource, and wherein the second node has a second local queue for lock requests for the resource. The processing device adjusts a parameter for a node of the cluster file system in view of the measure of congestion.
36 Citations
18 Claims
1. A method comprising:
tracking, by a processing device, lock requests for a resource shared by a plurality of nodes of a cluster file system, wherein each node of the cluster file system has a local queue, and wherein the respective local queue of each node tracks lock requests for the resource;
maintaining, by the processing device, a plurality of request queue lengths for the local queues of the nodes, wherein a queue length of the plurality of request queue lengths for the local queue associated with the corresponding node comprises a number of lock requests queued for the local node;
determining, by the processing device, a measure of congestion for the resource, wherein the measure of congestion is in view of an average queue length of the plurality of request queue lengths;
determining, by the processing device, a relative congestion factor for a first node of the plurality of nodes; and
adjusting, by the processing device, a relative lock hold time for the first node of the cluster file system in view of the measure of congestion and the relative congestion factor for the first node.
Dependent claims: 2, 3, 4, 5
6. An apparatus comprising:
a memory to store a lock for a resource that is shared by a first node of a cluster file system and a second node of the cluster file system, wherein the first node has a first local queue for lock requests for the resource, and wherein the second node has a second local queue for lock requests for the resource; and
a processing device, operatively coupled to the network interface, to:
maintain a first local queue length of the first local queue and a second local queue length of the second local queue, wherein the first queue length for the first local queue is a number of lock requests queued for the first local node, and the second queue length for the second local queue is a number of lock requests queued for the second local node;
determine a measure of congestion for the resource, wherein the measure of congestion is determined in view of an average of the first local queue length and second local queue length;
determine a relative congestion factor for the first node; and
adjust a parameter for the first node of the cluster file system in view of the measure of congestion and the relative congestion factor for the first node.
Dependent claims: 7, 8, 9, 10, 11, 12, 13
14. A non-transitory computer-readable storage medium having instructions stored thereon that, when executed by a processing device, cause the processing device to:
maintain a first local queue length of a first local queue and a second local queue length of a second local queue, wherein the first queue length for the first local queue is a number of lock requests queued for the first local node, and the second queue length for the second local queue is a number of lock requests queued for the second local node;
determine, by the processing device, a measure of congestion for a resource that is shared by a first node of a cluster file system and a second node of the cluster file system, wherein the first node has the first local queue for lock requests for the resource, wherein the second node has the second local queue for lock requests for the resource, and wherein the measure of congestion is in view of an average of the first local queue length and second local queue length;
determine a relative congestion factor for the first node; and
adjust a parameter for the first node of the cluster file system in view of the measure of congestion and the relative congestion factor for the first node.
Dependent claims: 15, 16, 17, 18
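The steps recited in the independent claims (per-node local queues, an average queue length as the measure of congestion, a per-node relative congestion factor, and an adjusted lock hold time) can be illustrated with a minimal Python sketch. The claims do not state how the relative congestion factor is computed or how the hold time is scaled, so everything concrete here is an assumption made for illustration: the names `Node`, `measure_of_congestion`, `relative_congestion_factor`, and `adjust_hold_time`, the ratio-to-average formula, the `threshold` below which no adjustment is made, and the clamping bounds are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cluster node with its local queue of pending lock requests."""
    name: str
    queue: deque = field(default_factory=deque)
    hold_time: float = 1.0  # relative lock hold time (hypothetical unit)

    def enqueue(self, request: str) -> None:
        self.queue.append(request)

    @property
    def queue_length(self) -> int:
        return len(self.queue)


def measure_of_congestion(nodes: list[Node]) -> float:
    """Average local queue length across all nodes sharing the resource."""
    return sum(n.queue_length for n in nodes) / len(nodes)


def relative_congestion_factor(node: Node, congestion: float) -> float:
    """ASSUMPTION: ratio of the node's own queue length to the average.

    The source does not define this factor; a ratio is one plausible choice.
    """
    if congestion == 0:
        return 1.0  # no queued requests anywhere, so treat the node as neutral
    return node.queue_length / congestion


def adjust_hold_time(
    node: Node,
    nodes: list[Node],
    base_hold: float = 1.0,
    threshold: float = 2.0,   # hypothetical congestion threshold
    min_hold: float = 0.25,   # hypothetical lower clamp
    max_hold: float = 4.0,    # hypothetical upper clamp
) -> float:
    """Set the node's hold time from the congestion measure and its factor.

    ASSUMED policy: below the threshold every node keeps the base hold time;
    above it, a node with more local demand than average holds the lock
    longer, and a node with less demand releases it sooner.
    """
    congestion = measure_of_congestion(nodes)
    if congestion <= threshold:
        node.hold_time = base_hold
    else:
        factor = relative_congestion_factor(node, congestion)
        node.hold_time = min(max_hold, max(min_hold, base_hold * factor))
    return node.hold_time


if __name__ == "__main__":
    cluster = [Node("a"), Node("b"), Node("c")]
    for node, pending in zip(cluster, (6, 2, 1)):
        for i in range(pending):
            node.enqueue(f"lock-request-{i}")
    for node in cluster:
        print(node.name, adjust_hold_time(node, cluster))
```

Scaling the hold time by the ratio to the average is only one possible policy; an implementation could equally shorten hold times for heavily loaded nodes, or adjust a different per-node parameter, as claims 6 and 14 allow.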
Specification