Distributed metering and monitoring system

US 9,432,462 B2
Filed: 12/17/2015
Issued: 08/30/2016
Est. Priority Date: 06/13/2011
Status: Active Grant



Abstract

The distributed metering and monitoring service (DMMS) system provides a way to gather and maintain metrics data that remains distributed until requested. The DMMS system uses messaging queues to scale the number of servers that may be monitored and metered to a hyperscale of more than 10,000 servers. The DMMS system determines how many servers (nodes) to assign to a cluster, and uses a metric aggregator to collect and store metrics data for the nodes. The DMMS system creates message queues for the instances, injects instance identifiers into the cluster state data and metrics data, listens for request messages for metering information for instances, retrieves the metrics data for users identified by the instance identifiers stored locally at the nodes, and calculates the metering information for the instance.
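The request path described in the abstract (inject an instance identifier into locally held metrics, listen for a request message, retrieve only that instance's metrics, and calculate metering information) can be illustrated with a minimal in-process sketch. It is not the patented implementation: `queue.Queue` stands in for a real message broker, the names `Sample`, `NodeMemory`, `record`, and `serve_one` are hypothetical, and summing CPU seconds is an assumed metering formula chosen only for illustration.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Sample:
    node_id: str
    instance_id: str    # identifier injected when the sample is recorded
    cpu_seconds: float  # assumed metric; the patent does not fix one


@dataclass
class NodeMemory:
    """Hypothetical per-cluster store; samples stay here until requested."""
    samples: list = field(default_factory=list)


def record(memory: NodeMemory, node_id: str, cpu_seconds: float, instance_of: dict) -> None:
    # Inject the identifier of the instance that owns this node into the sample.
    memory.samples.append(Sample(node_id, instance_of[node_id], cpu_seconds))


def serve_one(requests: queue.Queue, memory: NodeMemory) -> dict:
    # Block until a request message arrives on the instance's queue.
    request = requests.get()
    wanted = request["instance_id"]
    # Retrieve only this instance's samples and compute an (assumed) usage total.
    total = sum(s.cpu_seconds for s in memory.samples if s.instance_id == wanted)
    return {"instance_id": wanted, "cpu_seconds": total}


instance_of = {"n1": "inst-A", "n2": "inst-A", "n3": "inst-B"}
memory = NodeMemory()
for node_id, cpu in [("n1", 2.0), ("n2", 3.5), ("n3", 1.0)]:
    record(memory, node_id, cpu, instance_of)

# One message queue per instance, keyed by instance identifier.
queues = {iid: queue.Queue() for iid in set(instance_of.values())}
queues["inst-A"].put({"instance_id": "inst-A"})
print(serve_one(queues["inst-A"], memory))  # {'instance_id': 'inst-A', 'cpu_seconds': 5.5}
```

Keeping the samples in `NodeMemory` and aggregating only when a request arrives mirrors the abstract's point that metrics data "remains distributed until requested"; nothing is pushed to a central store ahead of time.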

20 Claims

1. A method for using a cluster for metering and monitoring a distributed system, comprising:
  assigning a plurality of nodes to the cluster;
  
  assigning a node memory to the cluster;
  
  obtaining metrics data for the plurality of nodes and cluster state data for the cluster;
  
  storing the cluster state data and the metrics data into the node memory;
  
  creating a message queue for an instance based on a user request;
  
  identifying a number of nodes required for the instance to satisfy the user request, with the instance comprising at least 10,000 nodes;
  
  determining a size of the cluster based on performance capacity of the plurality of nodes assigned to the cluster;
  
  determining a ratio of the message queue to the cluster by dividing the number of nodes in the instance into a plurality of clusters and assigning the plurality of clusters to the message queue; and
  
  retrieving the metrics data and the cluster state data from the node memory of the cluster for the instance by using the message queue according to the determined ratio.
  - 2. The method of claim 1, wherein the size of the cluster is determined based on a number of metrics data to be obtained, a frequency of obtaining the metrics data and a geographic location of the metrics data to be obtained.
  - 3. The method of claim 1, further comprising:
    - assigning a number of nodes to each of the multiple clusters, wherein the number of nodes varies according to an aggregate performance of the assigned number of nodes.
  - 4. The method of claim 1, further comprising:
    - defining, according to the cluster, node identifiers for the plurality of nodes assigned to the cluster;
      
      storing, by a metric aggregator in the node memory, the node identifiers for the plurality of nodes assigned to the cluster in the node memory;
      
      collecting, by the metric aggregator, the cluster state data that identifies a state of the cluster, and collecting the metrics data for the plurality of nodes assigned to the cluster; and
      
      propagating the cluster state data and the metrics data to each other node assigned to the cluster.
  - 5. The method of claim 4, further comprising:
    - monitoring, by the metric aggregator, the plurality of nodes assigned to the cluster to collect node level information and instance information when a node from the plurality of nodes assigned to the cluster links to the instance.
  - 6. The method of claim 4, further comprising:
    - creating the message queue for each instance by using an instance identifier and injecting the instance identifier into the cluster state data and the metrics data, wherein the instance identifier identifies the assigned nodes identified by the node identifiers.
  - 7. The method of claim 6, further comprising:
    - listening for receipt of request messages for metering information;
      
      retrieving the metrics data from the node memory for the cluster; and
      
      calculating the metering information and generating a reply message comprising the metering information that is identified by the instance identifier.
  - 8. The method of claim 5, further comprising:
    - assigning one of the plurality of nodes assigned to the cluster as a primary controller of the metrics aggregator to monitor the plurality of nodes assigned to the cluster, and assigning the other nodes assigned to the cluster to operate as backup controllers for the metrics aggregator.
  - 9. The method of claim 1, further comprising:
    - calculating the performance capacity of the cluster by using a cluster-to-node ratio threshold value; and
      
      identifying a number of nodes assigned to the cluster that satisfy the cluster-to-node ratio threshold value.
  - 10. The method of claim 9, further comprising:
    - performing a test to establish the performance capacity of the cluster by testing variable test parameters for the number of nodes assigned to the cluster that are identified as satisfying the cluster-to-node ratio threshold value, a number of metrics data to be collected and a frequency to collect the metrics data; and
      
      establishing the cluster-to-node ratio threshold by performing the test.
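Claims 1, 3, and 9 describe sizing clusters by performance capacity, dividing an instance of at least 10,000 nodes into clusters, and assigning those clusters to a message queue at some queue-to-cluster ratio. The sketch below shows one plausible way to do that arithmetic, assuming a single per-cluster node limit (`max_nodes_per_cluster`, standing in for the cluster-to-node ratio threshold that claims 9 and 10 establish by testing) and a fixed number of clusters per queue (`clusters_per_queue`). Both parameters and the function name `plan_clusters` are hypothetical, and claim 3 also allows cluster sizes to vary with aggregate node performance, which this uniform version does not model.

```python
def plan_clusters(num_nodes: int, max_nodes_per_cluster: int, clusters_per_queue: int):
    """Split an instance's nodes into clusters and map clusters onto message queues.

    max_nodes_per_cluster: hypothetical stand-in for the tested
        cluster-to-node ratio threshold (claims 9 and 10).
    clusters_per_queue: hypothetical queue-to-cluster ratio (claim 1).
    Returns (cluster sizes, number of queues, cluster index -> queue index).
    """
    if min(num_nodes, max_nodes_per_cluster, clusters_per_queue) <= 0:
        raise ValueError("all arguments must be positive")
    # Fewest clusters that keep every cluster within the capacity limit
    # (integer ceiling division, avoiding floating point).
    num_clusters = -(-num_nodes // max_nodes_per_cluster)
    # Spread nodes evenly: the first `extra` clusters get one additional node.
    base, extra = divmod(num_nodes, num_clusters)
    sizes = [base + (1 if i < extra else 0) for i in range(num_clusters)]
    # Each queue serves up to `clusters_per_queue` consecutive clusters.
    num_queues = -(-num_clusters // clusters_per_queue)
    assignment = {i: i // clusters_per_queue for i in range(num_clusters)}
    return sizes, num_queues, assignment


sizes, num_queues, assignment = plan_clusters(10_000, 300, 4)
print(len(sizes), min(sizes), max(sizes), num_queues)  # 34 294 295 9
```

Spreading the remainder evenly, rather than filling clusters to the limit and leaving one small final cluster, keeps per-cluster load balanced while still never exceeding `max_nodes_per_cluster`.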

11. A system comprising:
  a cluster for metering and monitoring the system;
  
  a plurality of nodes that are assigned to the cluster, wherein each of the plurality of nodes assigned to the cluster has a node memory that is assigned to the cluster;
  
  a metric aggregator that collects metrics data for the plurality of nodes assigned to the cluster and collects cluster state data that are obtained for the cluster, wherein the cluster state data and the metrics data are stored in the node memory;
  
  a processor;
  
  a memory; and
  
  instructions stored in the memory that, when executed, cause the processor to:
  
  determine a size of the cluster based on performance capacity of the plurality of nodes assigned to the cluster, and create a message queue for an instance based on a user request, wherein a number of nodes required for the instance are identified to satisfy the user request, with the instance comprising at least 10,000 nodes, and determine a ratio of the message queue to the cluster by dividing the number of nodes in the instance into multiple clusters, wherein the message queue is assigned to the multiple clusters, and the metrics data and the cluster state data are retrieved from the node memory of the cluster for the instance by using the message queue according to the determined ratio.
  - 12. The system of claim 11, wherein the size of the cluster is determined based on a number of metrics data to be collected, a frequency of collecting the metrics data and a geographic location of the metrics data to be collected.
  - 13. The system of claim 11, wherein each of multiple clusters is assigned a number of nodes, wherein the number of nodes varies according to an aggregate performance of the number of nodes.
  - 14. The system of claim 11, wherein the cluster further defines node identifiers for the plurality of nodes assigned to the cluster, wherein the node identifiers for the plurality of nodes assigned to the cluster are stored in the node memory by the metric aggregator, and the metric aggregator collects the cluster state data that identifies a state of the cluster and the metrics data for the plurality of nodes assigned to the cluster and propagates the cluster state data and the metrics data to each other node assigned to the cluster.
  - 15. The system of claim 14, wherein the metric aggregator monitors the nodes assigned to the cluster to collect node level information and instance information when a node from the plurality of nodes assigned to the cluster links to the instance.
  - 16. The system of claim 14, wherein the message queue is created for each instance by using an instance identifier, wherein the instance identifier is injected into the cluster state data and the metrics data, and the instance identifier identifies the assigned nodes identified by the node identifiers.
  - 17. The system of claim 16, wherein the system further comprises a monitor that listens for receipt of request messages for metering information, and upon the receipt of the request messages, the message queue retrieves the metrics data from the node memory for the cluster, and the metering information is calculated by a metering logic and is stored in the node memory and the metering information is included in a reply message that is generated by a hypervisor of the system and is identified by using the instance identifier.
  - 18. The system of claim 15, wherein the one of the plurality of nodes assigned to the cluster is assigned as a primary controller of the metrics aggregator to monitor the plurality of nodes assigned to the cluster, and the other nodes assigned to the cluster are assigned to operate as backup controllers for the metrics aggregator.
  - 19. The system of claim 11, wherein the system further comprises instructions when executed cause the processor to calculate the performance capacity of the cluster by using a cluster-to-node ratio threshold value and identify a number of nodes assigned to the cluster that satisfy the cluster-to-node ratio threshold value.
  - 20. The system of claim 19, wherein the system further comprises instructions when executed cause the processor to perform a test to establish the performance capacity of the cluster by testing variable test parameters for the number of nodes assigned to the cluster that are identified as satisfying the cluster-to-node ratio threshold value, a number of metrics data to be collected and a frequency to collect the metrics data, and establish the cluster-to-node ratio threshold by performing the test.
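Claims 14 and 18 (and their method counterparts, claims 4 and 8) have the metric aggregator propagate cluster state and metrics to every node in the cluster, with one node acting as primary controller and the others as backups. A minimal sketch shows why that pairing matters: because every live node already holds a copy of the snapshot, a backup can take over without re-collecting data. The classes `Node` and `ClusterAggregator`, the lowest-identifier-wins failover rule, and the in-memory `replica` dict are all assumptions for illustration, not details taken from the patent.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    alive: bool = True
    # This node's copy of the cluster state and metrics (its "node memory").
    replica: dict = field(default_factory=dict)


class ClusterAggregator:
    """Hypothetical metric aggregator: one primary controller, others as backups."""

    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}
        # Assumed failover rule: the live node with the lowest identifier is primary.
        self.order = sorted(self.nodes)

    def primary(self) -> str:
        for node_id in self.order:
            if self.nodes[node_id].alive:
                return node_id
        raise RuntimeError("no live node can act as controller")

    def backups(self) -> list:
        current = self.primary()
        return [n for n in self.order if n != current and self.nodes[n].alive]

    def collect_and_propagate(self, metrics: dict) -> dict:
        # Build a snapshot of cluster state plus per-node metrics...
        snapshot = {
            "cluster_state": {"controller": self.primary(), "members": list(self.order)},
            "metrics": dict(metrics),
        }
        # ...and copy it into every live node so any backup can take over.
        for node in self.nodes.values():
            if node.alive:
                node.replica = copy.deepcopy(snapshot)
        return snapshot


nodes = [Node("n1"), Node("n2"), Node("n3")]
agg = ClusterAggregator(nodes)
agg.collect_and_propagate({"n1": 0.4, "n2": 0.7, "n3": 0.1})
nodes[0].alive = False               # the primary controller n1 fails
print(agg.primary())                 # n2
print(nodes[1].replica["metrics"])   # replicated to n2 before the failure
```

A production system would need a real membership and leader-election protocol; the deterministic ordering here only keeps the sketch short.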

Current Assignee
Accenture Global Services Limited (Accenture PLC)
Original Assignee
Accenture Global Services Limited (Accenture PLC)
Inventors
Richter, Owen E, Lundell, Lukas M, Portell, Charles M, Walker, Bryan M, Parthasarathy, Sonali
Primary Examiner(s)
Joo, Joshua

Application Number
US14/972,694
Publication Number
US 20160105512A1
Time in Patent Office
257 Days
Field of Search
None
US Class Current
1/1
CPC Class Codes

G06F 11/3495   for systems

G06F 2201/815   Virtual

G06F 2209/505   Clust

G06F 9/5061   Partitioning or combining o...

G06Q 10/06   Resources, workflows, human...

H04L 43/08   Monitoring or testing based...

H04L 67/1097   for distributed storage of ...

H04L 67/52   specially adapted for the l...
