USAGE AND BANDWIDTH UTILIZATION COLLECTION MECHANISM FOR A DISTRIBUTED STORAGE SYSTEM

US 20160259834A1
Filed: 03/03/2015
Published: 09/08/2016
Est. Priority Date: 03/03/2015
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A technique is introduced that enables one or more mechanisms, more scalable and robust than conventional approaches, to collect storage and transfer utilization metrics for an account in a distributed data storage system. The technique includes a method comprising, for each of the nodes, collecting storage utilization data for one account. The method further includes, for each proxy server, collecting transfer utilization data for that account. The method further includes, at the controller, generating cluster account interval (CAI) data based on a raw storage data file and an aggregated transfer data file. The CAI data include storage and transfer utilization data over a predetermined time span.
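The controller step described above combines storage and transfer records that fall into the same time span. The Python sketch below shows one way such a join could work. Several details are assumptions for illustration, not the patent's file formats:

- the record layouts and field names (the storage fields follow the parameters listed in claim 19);
- Unix-second timestamps;
- a one-hour span (the span named in claim 5).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

SPAN = 3600  # assumed predetermined time span: one hour, in seconds


@dataclass(frozen=True)
class StorageRecord:  # one line of a raw storage data file (hypothetical layout)
    account: str
    timestamp: int  # Unix seconds
    container_count: int
    object_count: int
    bytes_used: int


@dataclass(frozen=True)
class TransferRecord:  # one line of an aggregated transfer data file (hypothetical layout)
    account: str
    timestamp: int
    bytes_in: int
    bytes_out: int
    request_count: int


@dataclass(frozen=True)
class CAIDatum:
    account: str
    interval_start: int
    bytes_used: Optional[int]  # None when no storage sample fell in the interval
    bytes_in: int
    bytes_out: int


def build_cai(storage: Iterable[StorageRecord],
              transfer: Iterable[TransferRecord],
              span: int = SPAN) -> Dict[Tuple[str, int], CAIDatum]:
    """Pair one storage record with one transfer record per (account, interval)."""
    s_by_key: Dict[Tuple[str, int], StorageRecord] = {}
    t_by_key: Dict[Tuple[str, int], TransferRecord] = {}
    for s in storage:
        # A later record in the same interval overwrites an earlier one (assumption).
        s_by_key[(s.account, s.timestamp // span * span)] = s
    for t in transfer:
        t_by_key[(t.account, t.timestamp // span * span)] = t
    cai: Dict[Tuple[str, int], CAIDatum] = {}
    for key in sorted(s_by_key.keys() | t_by_key.keys()):
        s = s_by_key.get(key)
        t = t_by_key.get(key)
        cai[key] = CAIDatum(
            account=key[0],
            interval_start=key[1],
            bytes_used=s.bytes_used if s else None,
            bytes_in=t.bytes_in if t else 0,   # no traffic recorded -> zero
            bytes_out=t.bytes_out if t else 0,
        )
    return cai
```

Where several storage records fall into the same interval, this sketch keeps the last one seen. The text here does not say how such ties are resolved.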

26 Claims

1. A method for collecting utilization data for a plurality of accounts in a distributed storage cluster including a controller and a plurality of nodes, each node coupled to a plurality of drives, the method comprising:
- for each of the nodes, collecting storage utilization data for a one account by:
  
  extracting, at a given node, a plurality of storage utilization parameters recorded under the one account from each of a plurality of account databases that are maintained by a plurality of drives coupled to the given node until all drives coupled to the given node are processed;
  
  generating a raw storage data file based on the plurality of storage utilization parameters extracted at the given node to the raw storage data file;
  
  storing the raw storage data file on the given node;
  
  locating, by the given node periodically searching for, the raw storage data file on the given node;
  
  upon the raw storage data file being located:
  
  copying the raw storage data file to the cluster until the raw storage data file is successfully copied to the cluster;
  
  upon successfully copying the raw storage data file to the cluster, marking the raw storage data file as copied;
  
  uploading the raw storage data file to the controller until the raw storage data file is successfully uploaded to the controller; and
  
  upon successfully uploading the raw storage data file to the controller, deleting the raw storage data file.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
- - 2. The method of claim 1, wherein the distributed storage cluster further includes a plurality of proxy servers, each proxy server coupled to one or more nodes to coordinate access requests with its corresponding nodes, the method further comprising:
    - for each proxy server, collecting transfer utilization data for the one account by:
      
      creating, at a given proxy server, a rotated log file based on a log file being maintained by the given proxy server, wherein the log file contains entries for each access request processed by the given proxy server;
      
      locating, by the given proxy server periodically searching for, the rotated log file on the given proxy server;
      
      upon the rotated log file being located:
      
      renaming the rotated log file such that the rotated log file becomes a preprocess file;
      
      parsing the preprocess file on the given proxy server to extract raw transfer data, wherein the raw transfer data parsed from the preprocess file are aggregated into aggregated transfer data until the parsing completes;
      
      transferring, from the given proxy server, the raw transfer data to the cluster;
      
      generating, at the given proxy server, an aggregated transfer data file based on the aggregated transfer data;
      
      determining whether both transferring the raw data and generating the aggregated transfer data file are successful;
      
      if both transferring the raw data and generating the aggregated transfer data file are successful, uploading the aggregated transfer data file to the controller until the aggregated transfer data file is successfully uploaded to the controller; and
      
      upon successfully uploading the aggregated transfer data file to the controller, deleting the aggregated transfer data file.
  - 3. The method of claim 2, further comprising:
    - at the controller, generating a cluster account interval (CAI) data for the one account based on the raw storage data file and the aggregated transfer data file, wherein each CAI datum is derived from one record from each of the raw storage data file and the aggregated transfer data file, and wherein the CAI data include storage and transfer utilization data over a predetermined time span.
  - 4. The method of claim 3, further comprising:
    - at the controller, generating a cluster account interval offset (CAIO) data for the one account based on the raw storage data file and the aggregated transfer data file, wherein the CAIO data has the same predetermined time span as the CAI data but has a different starting time than the CAI data.
  - 5. The method of claim 4, wherein the predetermined time span is one (1) hour, and wherein the starting times for the CAI and CAIO data are offset by half (½) hour.
  - 6. The method of claim 4, wherein each record from the raw storage data file and the aggregated transfer data file only contributes to exactly one CAI datum and one CAIO datum.
  - 7. The method of claim 4, further comprising:
    - at the controller, tracking (1) an expected count of records in the raw storage data file and the aggregated transfer data file, and (2) an actual processed count of records in the raw storage data file and the aggregated transfer data file.
  - 8. The method of claim 7, further comprising:
    - generating a table tracking a percent complete value based on the tracking.
  - 9. The method of claim 2, further comprising:
    - upon successfully receiving the raw storage data file or the aggregated transfer data file, transmitting, by the controller, an acknowledgement for successful receipt.
  - 10. The method of claim 2, further comprising:
    - at the cluster, storing the raw transfer data transferred from the given proxy server as a backup file.
  - 11. The method of claim 2, wherein the renaming comprises appending a unique suffix to a file name of the rotated log file to create the preprocess file.
  - 12. The method of claim 2, wherein all available preprocess files have the same prefix in their file names.
  - 13. The method of claim 2, wherein transferring the raw transfer data and parsing the preprocess file on the given proxy server are performed simultaneously.
  - 14. The method of claim 2, wherein parsing the preprocess file on the given proxy server is performed on a per-entry basis.
  - 15. The method of claim 2, wherein the raw transfer data include, for the one account, (1) bytes transferred in from the cluster, and (2) bytes transferred out to the cluster.
  - 16. The method of claim 15, wherein an in-memory aggregation data structure is maintained on a memory of the given proxy server to store the raw transfer data, the method further comprising:
    - maintaining the in-memory aggregation data structure for the one account by, for a given entry of the raw transfer data:
      
      incrementing a value representing bytes transferred in from the cluster by a first corresponding value parsed out from the given entry;
      
      incrementing a value representing bytes transferred out to the cluster by a second corresponding value parsed out from the given entry; and
      
      incrementing a value representing request count by one.
  - 17. The method of claim 2, wherein each entry in the raw transfer data represents one access request.
  - 18. The method of claim 2, wherein internal proxy sub-requests and unparsable lines in the preprocess files are ignored.
  - 19. The method of claim 1, wherein the storage utilization parameters include a container count, an object count, and bytes used values for the one account.
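Claims 16 through 18 describe a per-entry, in-memory aggregation on the proxy server. The minimal Python sketch below assumes a hypothetical whitespace-separated log line of the form `<account> <bytes_in> <bytes_out> <source>`. In that assumed format, a source of `internal` marks an internal proxy sub-request. The actual proxy log format is not given in this text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Usage:
    bytes_in: int = 0       # bytes transferred in (claim 15)
    bytes_out: int = 0      # bytes transferred out (claim 15)
    request_count: int = 0  # one per access request (claim 17)


def aggregate_transfer(lines: Iterable[str]) -> Dict[str, Usage]:
    """Parse a preprocess file entry by entry into per-account totals.

    Assumed (hypothetical) line format: "<account> <bytes_in> <bytes_out> <source>".
    """
    totals: Dict[str, Usage] = defaultdict(Usage)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # unparsable line: ignored (claim 18)
        account, raw_in, raw_out, source = fields
        if source == "internal":
            continue  # internal proxy sub-request: ignored (claim 18)
        try:
            bytes_in, bytes_out = int(raw_in), int(raw_out)
        except ValueError:
            continue  # non-numeric byte counts are treated as unparsable too
        usage = totals[account]
        usage.bytes_in += bytes_in    # increment by the first parsed value
        usage.bytes_out += bytes_out  # increment by the second parsed value
        usage.request_count += 1      # increment request count by one
    return dict(totals)
```

Because each line is handled on its own, the totals stay valid even if parsing stops partway through a file. Only the lines read so far are counted.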

20. A method for collecting utilization data for a plurality of accounts in a distributed object storage cluster including a controller, a plurality of nodes, each node coupled to a plurality of drives, and a plurality of proxy servers, each proxy server coupled to one or more nodes to coordinate access requests with its corresponding nodes, the method comprising:
- (1) for each of the nodes, collecting storage utilization data for a one account by:
  
  extracting, at a given node, a plurality of storage utilization parameters recorded under the one account from each of a plurality of account databases that are maintained by a plurality of drives coupled to the given node until all drives coupled to the given node are processed;
  
  generating a raw storage data file based on the plurality of storage utilization parameters extracted at the given node to the raw storage data file;
  
  storing the raw storage data file on the given node;
  
  locating, by periodically searching for, the raw storage data file on the given node; and
  
  upon the raw storage data file being located, moving the raw storage data file to the controller;
  
  (2) for each proxy server, collecting transfer utilization data for the one account by:
  
  creating, at a given proxy server, a rotated log file based on a log file being maintained by the given proxy server, wherein the log file contains entries for each access request processed by the given proxy server;
  
  locating, by periodically searching for, the rotated log file on the given proxy server; and
  
  upon the rotated log file being located, generating and moving an aggregated transfer data file to the controller, wherein the aggregated transfer data file is generated based on the rotated log file; and
  
  (3) at the controller, generating a cluster account interval (CAI) data based on the raw storage data file and the aggregated transfer data file, wherein the CAI data include storage and transfer utilization data over a predetermined time span.
- Dependent Claims (21)
- - 21. The method of claim 20, wherein moving the raw storage data file to the controller comprises:
    - copying the raw storage data file to the cluster until the raw storage data file is successfully copied to the cluster;
      
      upon successfully copying the raw storage data file to the cluster, marking the raw storage data file as copied;
      
      uploading the raw storage data file to the controller until the raw storage data file is successfully uploaded to the controller; and
      
      upon successfully uploading the raw storage data file to the controller, deleting the raw storage data file.
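Claim 21, like the matching steps of claim 1, orders the move so that the local file is deleted only after both the cluster copy and the controller upload have succeeded. The sketch below makes two assumptions:

- The copy and the upload are caller-supplied functions that return `True` on success.
- Marking the file as copied is done by renaming it with a `.copied` suffix. This scheme is hypothetical, chosen so that a restart after the copy skips straight to the upload.

```python
import os
import time
from typing import Callable

COPIED_SUFFIX = ".copied"  # hypothetical marker meaning "already copied to the cluster"


def retry_until_success(action: Callable[[], bool], delay: float) -> None:
    """Call action() until it returns True, sleeping `delay` seconds between tries."""
    while not action():
        time.sleep(delay)


def move_raw_storage_file(path: str,
                          copy_to_cluster: Callable[[str], bool],
                          upload_to_controller: Callable[[str], bool],
                          delay: float = 1.0) -> None:
    """Copy, mark, upload, then delete one raw storage data file."""
    if not path.endswith(COPIED_SUFFIX):
        # Copy to the cluster, retrying until the copy succeeds.
        retry_until_success(lambda: copy_to_cluster(path), delay)
        # Mark the file as copied; a restart from here skips the cluster copy.
        marked = path + COPIED_SUFFIX
        os.rename(path, marked)
        path = marked
    # Upload to the controller, retrying until the upload succeeds.
    retry_until_success(lambda: upload_to_controller(path), delay)
    # Delete the local file only after the controller has it.
    os.remove(path)
```

After a crash between the copy and the upload, calling the function again on the leftover `*.copied` file goes directly to the upload step.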

22. The method of claim 20, wherein generating and moving the aggregated transfer data file to the controller comprises:
- renaming the rotated log file such that the rotated log file becomes a preprocess file;
  
  parsing the preprocess file on the given proxy server to extract raw transfer data, wherein the raw transfer data parsed from the preprocess file are aggregated into aggregated transfer data until the parsing completes;
  
  transferring, from the given proxy server, the raw transfer data to the cluster;
  
  generating, at the given proxy server, an aggregated transfer data file based on the aggregated transfer data;
  
  determining whether both transferring the raw data and generating the aggregated transfer data file are successful;
  
  if both transferring the raw data and generating the aggregated transfer data file are successful, uploading the aggregated transfer data file to the controller until the aggregated transfer data file is successfully uploaded to the controller; and
  
  upon successfully uploading the aggregated transfer data file to the controller, deleting the aggregated transfer data file.
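Claim 22, together with claims 11 and 12, can be sketched in Python as three small steps:

1. Rename the rotated log by appending a unique suffix.
2. Find preprocess files by their shared prefix.
3. Upload the aggregated file only when both the raw-data transfer and the aggregation succeeded.

The file name `proxy.log.rotated`, the use of a UUID as the unique suffix, and the callable signatures are all assumptions.

```python
import glob
import os
import time
import uuid
from typing import Callable, List

ROTATED_LOG = "proxy.log.rotated"  # hypothetical name of the rotated log file


def to_preprocess_file(rotated_log: str) -> str:
    """Rename the rotated log by appending a unique suffix (claim 11)."""
    preprocess = f"{rotated_log}.{uuid.uuid4().hex}"
    os.rename(rotated_log, preprocess)
    return preprocess


def find_preprocess_files(directory: str) -> List[str]:
    """Preprocess files share the rotated log's name as a prefix (claim 12)."""
    pattern = os.path.join(glob.escape(directory), ROTATED_LOG + ".*")
    return sorted(glob.glob(pattern))


def finish_preprocess_file(preprocess: str,
                           transfer_raw: Callable[[str], bool],
                           write_aggregate: Callable[[str], str],
                           upload: Callable[[str], bool],
                           delay: float = 1.0) -> bool:
    """Upload the aggregated transfer file only if both earlier steps succeeded."""
    raw_ok = transfer_raw(preprocess)  # send raw transfer data to the cluster
    try:
        aggregate = write_aggregate(preprocess)  # returns the aggregated file's path
    except OSError:
        return False  # aggregation failed; keep the preprocess file for a retry
    if not raw_ok:
        return False  # raw transfer failed; do not upload yet
    while not upload(aggregate):  # retry until the controller accepts the file
        time.sleep(delay)
    os.remove(aggregate)  # delete only after a successful upload
    return True
```

Claim 13 allows the raw-data transfer and the parsing to run at the same time. This sketch runs them one after the other for clarity.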

23. A system for collecting utilization data for a plurality of accounts in a distributed storage cluster, the system comprising a controller and a plurality of nodes, each node coupled to a plurality of drives, wherein:
- each of the nodes includes one or more processors configured to collect storage utilization data for a one account by:
  
  extracting, at a given node, a plurality of storage utilization parameters recorded under the one account from each of a plurality of account databases that are maintained by a plurality of drives coupled to the given node until all drives coupled to the given node are processed;
  
  generating a raw storage data file based on the plurality of storage utilization parameters extracted at the given node to the raw storage data file;
  
  storing the raw storage data file on the given node;
  
  locating, by the given node periodically searching for, the raw storage data file on the given node;
  
  upon the raw storage data file being located:
  
  copying the raw storage data file to the cluster until the raw storage data file is successfully copied to the cluster;
  
  upon successfully copying the raw storage data file to the cluster, marking the raw storage data file as copied;
  
  uploading the raw storage data file to the controller until the raw storage data file is successfully uploaded to the controller; and
  
  upon successfully uploading the raw storage data file to the controller, deleting the raw storage data file.
- Dependent Claims (24, 25, 26)
- - 24. The system of claim 23, wherein the distributed storage cluster further includes a plurality of proxy servers, each proxy server coupled to one or more nodes to coordinate access requests with its corresponding nodes, wherein:
    - each proxy server includes one or more processors configured to collect transfer utilization data for the one account by:
      
      creating, at a given proxy server, a rotated log file based on a log file being maintained by the given proxy server, wherein the log file contains entries for each access request processed by the given proxy server;
      
      locating, by the given proxy server periodically searching for, the rotated log file on the given proxy server;
      
      upon the rotated log file being located:
      
      renaming the rotated log file such that the rotated log file becomes a preprocess file;
      
      parsing the preprocess file on the given proxy server to extract raw transfer data, wherein the raw transfer data parsed from the preprocess file are aggregated into aggregated transfer data until the parsing completes;
      
      transferring, from the given proxy server, the raw transfer data to the cluster;
      
      generating, at the given proxy server, an aggregated transfer data file based on the aggregated transfer data;
      
      determining whether both transferring the raw data and generating the aggregated transfer data file are successful;
      
      if both transferring the raw data and generating the aggregated transfer data file are successful, uploading the aggregated transfer data file to the controller until the aggregated transfer data file is successfully uploaded to the controller; and
      
      upon successfully uploading the aggregated transfer data file to the controller, deleting the aggregated transfer data file.
  - 25. The system of claim 24, wherein the controller includes one or more processors configured to generate a cluster account interval (CAI) data for the one account based on the raw storage data file and the aggregated transfer data file, wherein each CAI datum is derived from one record from each of the raw storage data file and the aggregated transfer data file, and wherein the CAI data include storage and transfer utilization data over a predetermined time span.
  - 26. The system of claim 24, further comprising:
    - deleting the preprocess file if the aggregated transfer data file is generated successfully.
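Claims 4 through 8 describe CAI and offset (CAIO) intervals and completeness tracking at the controller. Assuming Unix-second timestamps, the one-hour span and half-hour offset of claim 5 reduce to two floor operations. Those operations also guarantee the property in claim 6: each record falls into exactly one CAI interval and one CAIO interval. The percent-complete table of claims 7 and 8 is sketched as a simple ratio, and its interval key names are hypothetical.

```python
from typing import Dict

SPAN = 3600         # predetermined time span: one hour, in seconds (claim 5)
CAIO_OFFSET = 1800  # CAIO intervals start half an hour after CAI intervals (claim 5)


def cai_start(ts: int) -> int:
    """Start of the one-hour CAI interval containing Unix timestamp ts."""
    return ts // SPAN * SPAN


def caio_start(ts: int) -> int:
    """Start of the one-hour CAIO interval (shifted by 30 minutes) containing ts."""
    return (ts - CAIO_OFFSET) // SPAN * SPAN + CAIO_OFFSET


def percent_complete(expected: int, processed: int) -> float:
    """Processed records as a percentage of expected records (claim 7)."""
    if expected == 0:
        return 100.0  # nothing expected: treated as complete (an assumption)
    return 100.0 * processed / expected


def progress_table(expected: Dict[str, int],
                   processed: Dict[str, int]) -> Dict[str, float]:
    """Percent-complete value per interval key (claim 8)."""
    return {key: percent_complete(count, processed.get(key, 0))
            for key, count in expected.items()}
```

For example, a record stamped 4500 s (01:15) belongs to the CAI interval starting at 3600 (01:00). It also belongs to the CAIO interval starting at 1800 (00:30).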

Current Assignee
NVIDIA Corporation
Original Assignee
SwiftStack Inc. (NVIDIA Corporation)
Inventors
Bishop, Darrell

Granted Patent

US 9,842,153 B2
US Class Current

1/1
CPC Class Codes

G06F 11/1471   involving logging of persis...

G06F 11/2094   Redundant storage or storag...

G06F 11/3006   where the computing system ...

G06F 11/3034   where the computing system ...

G06F 11/3452   Performance evaluation by s...

G06F 11/3476   Data logging G06F11/14, G06...

G06F 16/148   File search processing

G06F 16/182   Distributed file systems

G06F 16/27   Replication, distribution o...

H04L 12/14   Charging, metering or bill...

H04L 43/0876   Network utilisation, e.g. v...
