USAGE AND BANDWIDTH UTILIZATION COLLECTION MECHANISM FOR A DISTRIBUTED STORAGE SYSTEM
Abstract
A technique is introduced for collecting storage and transfer utilization metrics for an account in a distributed data storage system in a manner that is more scalable and robust than conventional approaches. The technique includes a method comprising, for each of the nodes, collecting storage utilization data for one account. The method further includes, for each proxy server, collecting transfer utilization data for the one account. The method further includes, at the controller, generating cluster account interval (CAI) data based on a raw storage data file and an aggregated transfer data file. The CAI data include storage and transfer utilization data over a predetermined time span.
26 Claims
1. A method for collecting utilization data for a plurality of accounts in a distributed storage cluster including a controller and a plurality of nodes, each node coupled to a plurality of drives, the method comprising:
- for each of the nodes, collecting storage utilization data for a one account by:
extracting, at a given node, a plurality of storage utilization parameters recorded under the one account from each of a plurality of account databases that are maintained by a plurality of drives coupled to the given node until all drives coupled to the given node are processed;
generating a raw storage data file based on the plurality of storage utilization parameters extracted at the given node to the raw storage data file;
storing the raw storage data file on the given node;
locating, by the given node periodically searching for, the raw storage data file on the given node;
upon the raw storage data file being located:
copying the raw storage data file to the cluster until the raw storage data file is successfully copied to the cluster;
upon successfully copying the raw storage data file to the cluster, marking the raw storage data file as copied;
uploading the raw storage data file to the controller until the raw storage data file is successfully uploaded to the controller; and
upon successfully uploading the raw storage data file to the controller, deleting the raw storage data file.
Dependent claims: 2-19
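The node-side steps of claim 1 can be sketched in Python as shown here. This is only an illustration under stated assumptions, not the patented implementation: the CSV layout, the field names, the `.copied` suffix used as the "copied" mark, the bounded retry count, and the `copy_to_cluster` and `upload_to_controller` callables are all hypothetical stand-ins for details the claim does not specify.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, Iterable, List

FIELDS = ["account", "containers", "objects", "bytes"]  # hypothetical parameters
COPIED_SUFFIX = ".copied"  # hypothetical way of marking a file as copied


def collect_raw_storage_file(
    drive_dbs: Iterable[List[Dict[str, object]]], out_dir: Path, name: str
) -> Path:
    """Write one raw storage data file from the account databases of every drive."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / name
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for account_rows in drive_dbs:  # continue until all drives are processed
            for row in account_rows:
                writer.writerow(row)
    return path


def retry(action: Callable[[Path], None], path: Path, max_attempts: int = 5) -> None:
    """Repeat an action until it succeeds (bounded here so the sketch terminates)."""
    for _ in range(max_attempts):
        try:
            action(path)
            return
        except OSError:
            continue
    raise RuntimeError(f"giving up on {path} after {max_attempts} attempts")


def process_pending(
    out_dir: Path,
    copy_to_cluster: Callable[[Path], None],
    upload_to_controller: Callable[[Path], None],
) -> None:
    """One pass of the periodic search: copy, mark, upload, then delete each file."""
    for path in sorted(out_dir.iterdir()):
        if path.suffix != COPIED_SUFFIX:
            retry(copy_to_cluster, path)
            marked = path.with_name(path.name + COPIED_SUFFIX)
            path.rename(marked)  # mark the file as copied
            path = marked
        retry(upload_to_controller, path)
        path.unlink()  # delete only after a successful upload
```

Marking the file by renaming it means that a pass interrupted after the copy step resumes at the upload step on the next search, rather than copying the same file to the cluster again.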
20. A method for collecting utilization data for a plurality of accounts in a distributed object storage cluster including a controller, a plurality of nodes, each node coupled to a plurality of drives, and a plurality of proxy servers, each proxy server coupled to one or more nodes to coordinate access requests with its corresponding nodes, the method comprising:
(1) for each of the nodes, collecting storage utilization data for a one account by:
extracting, at a given node, a plurality of storage utilization parameters recorded under the one account from each of a plurality of account databases that are maintained by a plurality of drives coupled to the given node until all drives coupled to the given node are processed;
generating a raw storage data file based on the plurality of storage utilization parameters extracted at the given node to the raw storage data file;
storing the raw storage data file on the given node;
locating, by periodically searching for, the raw storage data file on the given node; and
upon the raw storage data file being located, moving the raw storage data file to the controller;
(2) for each proxy server, collecting transfer utilization data for the one account by:
creating, at a given proxy server, a rotated log file based on a log file being maintained by the given proxy server, wherein the log file contains entries for each access request processed by the given proxy server;
locating, by periodically searching for, the rotated log file on the given proxy server; and
upon the rotated log file being located, generating and moving an aggregated transfer data file to the controller, wherein the aggregated transfer data file is generated based on the rotated log file; and
(3) at the controller, generating a cluster account interval (CAI) data based on the raw storage data file and the aggregated transfer data file, wherein the CAI data include storage and transfer utilization data over a predetermined time span.
Dependent claims: 21
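Step (3) of claim 20 can be illustrated with a short Python sketch of how a controller might combine the two inputs into per-account interval records. The claim does not define the record layout or the combining rule, so everything here is an assumption made for illustration: the `CAI` fields, the tuple shapes of the input rows, taking the maximum observed storage value within the interval, and summing transfer bytes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class CAI:
    """Hypothetical cluster account interval record for one account."""
    account: str
    start: int
    end: int
    bytes_used: int = 0
    bytes_in: int = 0
    bytes_out: int = 0


def build_cai(
    storage_rows: Iterable[Tuple[str, int, int]],        # (account, timestamp, bytes_used)
    transfer_rows: Iterable[Tuple[str, int, int, int]],  # (account, timestamp, bytes_in, bytes_out)
    start: int,
    end: int,
) -> Dict[str, CAI]:
    """Combine storage and transfer rows that fall inside the span [start, end)."""
    result: Dict[str, CAI] = {}

    def entry(account: str) -> CAI:
        return result.setdefault(account, CAI(account, start, end))

    for account, ts, used in storage_rows:
        if start <= ts < end:
            record = entry(account)
            # Assumption: report the largest storage figure seen in the interval.
            record.bytes_used = max(record.bytes_used, used)

    for account, ts, b_in, b_out in transfer_rows:
        if start <= ts < end:
            record = entry(account)
            record.bytes_in += b_in
            record.bytes_out += b_out

    return result
```

Rows whose timestamps fall outside the span are ignored, so the same input files can be replayed for a different interval without double counting.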
22. The method of claim 20, wherein generating and moving the aggregated transfer data file to the controller comprises:
renaming the rotated log file such that the rotated log file becomes a preprocess file;
parsing the preprocess file on the given proxy server to extract raw transfer data, wherein the raw transfer data parsed from the preprocess file are aggregated into aggregated transfer data until the parsing completes;
transferring, from the given proxy server, the raw transfer data to the cluster;
generating, at the given proxy server, an aggregated transfer data file based on the aggregated transfer data;
determining whether both transferring the raw data and generating the aggregated transfer data file are successful;
if both transferring the raw data and generating the aggregated transfer data file are successful, uploading the aggregated transfer data file to the controller until the aggregated transfer data file is successfully uploaded to the controller;
upon successfully uploading the aggregated transfer data file to the controller, deleting the aggregated transfer data file.
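The renaming, parsing, and aggregation steps of claim 22 might look like the following Python sketch. It is a minimal illustration under assumptions that are not taken from the claim: each log line is assumed to hold three whitespace-separated fields (account, bytes in, bytes out), the `.preprocess` and `.aggregated.json` file names are hypothetical, and the transfer of raw data to the cluster and the upload to the controller are left out.

```python
import json
from collections import defaultdict
from pathlib import Path


def aggregate_rotated_log(rotated: Path) -> Path:
    """Rename a rotated log to a preprocess file, aggregate it, and write the result."""
    # Renaming signals that this log has been picked up for processing.
    preprocess = rotated.with_name(rotated.name + ".preprocess")
    rotated.rename(preprocess)

    totals = defaultdict(lambda: {"requests": 0, "bytes_in": 0, "bytes_out": 0})
    with preprocess.open() as fh:
        for line in fh:  # aggregate until the parsing completes
            fields = line.split()
            if len(fields) != 3:
                continue  # skip lines that do not match the assumed format
            account, bytes_in, bytes_out = fields
            entry = totals[account]
            entry["requests"] += 1
            entry["bytes_in"] += int(bytes_in)
            entry["bytes_out"] += int(bytes_out)

    # Write the aggregated transfer data file next to the preprocess file.
    aggregated = preprocess.with_name(rotated.name + ".aggregated.json")
    aggregated.write_text(json.dumps(totals, sort_keys=True))
    return aggregated
```

Because the rename happens before parsing, a later periodic search for rotated log files would not pick up the same log a second time.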
23. A system for collecting utilization data for a plurality of accounts in a distributed storage cluster, the system comprising a controller and a plurality of nodes, each node coupled to a plurality of drives, wherein:
- each of the nodes includes one or more processors configured to collect storage utilization data for a one account by:
extracting, at a given node, a plurality of storage utilization parameters recorded under the one account from each of a plurality of account databases that are maintained by a plurality of drives coupled to the given node until all drives coupled to the given node are processed;
generating a raw storage data file based on the plurality of storage utilization parameters extracted at the given node to the raw storage data file;
storing the raw storage data file on the given node;
locating, by the given node periodically searching for, the raw storage data file on the given node;
upon the raw storage data file being located:
copying the raw storage data file to the cluster until the raw storage data file is successfully copied to the cluster;
upon successfully copying the raw storage data file to the cluster, marking the raw storage data file as copied;
uploading the raw storage data file to the controller until the raw storage data file is successfully uploaded to the controller; and
upon successfully uploading the raw storage data file to the controller, deleting the raw storage data file.
Dependent claims: 24-26
Specification