Purging of stored timeseries data
Abstract
Disclosed are methods, systems, and computer program products for purging stored data in a repository. Users attach a relative importance to every data sample across all timeseries in the repository; the importance attached to a data sample is its ‘utility value’. An algorithm uses these utility values to allocate the repository's storage space so that the total information lost to purging is minimized while samples with a high utility value are preserved.
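The allocation idea in the abstract can be illustrated with a minimal sketch. It assumes each time window already has an average utility and an information-content score, and it splits a fixed number of sample slots across windows in proportion to the product of the two. The `Window` class, the `allocate_slots` function, and the largest-remainder rounding are hypothetical choices made for this illustration; they are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """One fixed-size time window of a timeseries (hypothetical structure)."""
    name: str
    avg_utility: float   # mean utility value of the samples in the window
    info_content: float  # information content of the samples in the window


def allocate_slots(windows, capacity):
    """Split `capacity` sample slots across windows in proportion to
    avg_utility * info_content, using largest-remainder rounding.

    Assumes all utilities and information contents are non-negative.
    """
    weights = [w.avg_utility * w.info_content for w in windows]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one window needs a positive weight")
    exact = [capacity * wt / total for wt in weights]
    slots = [int(x) for x in exact]  # floor, since shares are non-negative
    # Hand out the slots lost to flooring, largest fractional part first.
    leftover = capacity - sum(slots)
    order = sorted(range(len(windows)),
                   key=lambda i: exact[i] - slots[i], reverse=True)
    for i in order[:leftover]:
        slots[i] += 1
    return {w.name: n for w, n in zip(windows, slots)}


windows = [Window("quiet", 1.0, 2.0), Window("incident", 3.0, 2.0)]
print(allocate_slots(windows, 100))  # {'quiet': 25, 'incident': 75}
```

With equal information content, the window with three times the average utility keeps three times as many samples; the rest would be purged.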
20 Claims
1. A computer-implemented method for purging timeseries data samples stored in a repository, said method comprising:
calculating, by a computer, a utility value for each data sample of a measurement timeseries and an event timeseries, respectively, of monitored IT infrastructure elements, wherein a calculation of said utility values is based on a dependency model being represented as a hierarchical graph having (i) nodes, and (ii) edges among nodes of a same level of said hierarchical graph, and each said measurement and event timeseries being associated with a node, wherein said event timeseries has a dependency relationship to said measurement timeseries, and wherein said measurement and event timeseries are segmented into time windows of a fixed size and window boundaries are synchronized for said measurement and event timeseries;
changing, by said computer, said utility values of all data samples assigned to selected windows in said measurement and event timeseries within a temporal distance of an event that occurs in said event timeseries;
determining, by said computer, an information content and an average utility value of each said data sample in each of said selected windows, wherein said determining of said information content of each said data sample is performed on the basis of using one of a probability distribution function, a mean square error, and a Kullback-Leibler distance applied to said data samples; and
purging, by said computer, said data samples from each of said selected windows that are to be stored in said repository of fixed size, such that said data samples having high utility value are retained and loss of said information content of retained data samples is minimized, wherein said number of data samples in said selected windows to be stored in said repository is proportional to a product of said average utility and said information content of said data samples in said selected windows, divided by a sum of products of an average utility and an information content for all windows of said measurement and event timeseries.
Dependent claims: 2, 3, 4, 5, 11, 14
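The final "wherein" clause of claim 1 can be written compactly. The notation is introduced here for illustration only and does not appear in the patent: $n_w$ is the number of samples retained from a selected window $w$, $\bar{u}_w$ is the average utility of its samples, $I_w$ is its information content, and $W$ is the set of all windows of the measurement and event timeseries.

```latex
n_w \;\propto\; \frac{\bar{u}_w \, I_w}{\sum_{v \in W} \bar{u}_v \, I_v}
```

One natural reading, stated here as an assumption, is that the constant of proportionality is the fixed repository capacity $N$, so that $n_w \approx N \,\bar{u}_w I_w / \sum_{v \in W} \bar{u}_v I_v$ and the per-window allocations sum to roughly $N$.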
6. A computer system for purging timeseries data samples stored in a repository, said system comprising:
a repository that stores measurement timeseries and event timeseries data samples; and
a processor configured to:
calculate a utility value for each data sample of a measurement timeseries and an event timeseries, respectively, of monitored IT infrastructure elements, wherein a calculation of said utility values is based on a dependency model being represented as a hierarchical graph having (i) nodes, and (ii) edges among nodes of a same level of said hierarchical graph, and each said measurement and event timeseries being associated with a node, wherein said event timeseries has a dependency relationship to said measurement timeseries, and wherein said measurement and event timeseries are segmented into time windows of a fixed size and window boundaries are synchronized for said measurement and event timeseries;
change said utility values of all data samples assigned to selected windows in said measurement and event timeseries within a temporal distance of an event that occurs in said event timeseries;
determine an information content and an average utility value of each said data sample in each of said selected windows, wherein determining of said information content of each said data sample is performed on the basis of using one of a probability distribution function, a mean square error, and a Kullback-Leibler distance applied to said data sample; and
purge said data samples from each of said selected windows that are to be stored in said repository of fixed size, such that said data samples having high utility value are retained and loss of said information content of retained data samples is minimized, wherein said number of data samples in said selected windows to be stored in said repository is proportional to a product of said average utility and said information content of said data samples in said selected windows, divided by a sum of products of an average utility and an information content for all windows of said measurement and event timeseries.
Dependent claims: 7, 8, 9, 12, 15
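The "change said utility values" step can be sketched as follows. The claim says only that utilities are changed for all samples in windows within a temporal distance of an event; this sketch assumes the change is a multiplication by a fixed factor, that windows start at time 0, and that a window is selected when it overlaps the interval from `distance` before to `distance` after an event. The function names and the `factor` parameter are hypothetical.

```python
def window_index(t, window_size):
    """Index of the fixed-size window containing time t (windows start at t = 0)."""
    return int(t // window_size)


def selected_windows(event_times, window_size, distance):
    """Indices of windows overlapping [e - distance, e + distance] for some event e."""
    selected = set()
    for e in event_times:
        first = window_index(max(e - distance, 0), window_size)
        last = window_index(e + distance, window_size)
        selected.update(range(first, last + 1))
    return selected


def boost_utilities(samples, event_times, window_size, distance, factor=2.0):
    """Return (timestamp, utility) pairs with utility scaled in selected windows.

    Every sample in a selected window is scaled, even if the sample itself
    lies farther than `distance` from the event.
    """
    chosen = selected_windows(event_times, window_size, distance)
    return [
        (t, u * factor if window_index(t, window_size) in chosen else u)
        for t, u in samples
    ]


samples = [(5, 1.0), (15, 1.0), (25, 1.0), (35, 1.0), (95, 1.0)]
print(boost_utilities(samples, event_times=[22], window_size=10, distance=5))
# [(5, 1.0), (15, 2.0), (25, 2.0), (35, 1.0), (95, 1.0)]
```

Because the window index depends only on the timestamp and the shared window size, the same function can be applied to the measurement timeseries and the event timeseries, which keeps their window boundaries synchronized as the claim requires.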
10. A non-transitory computer program storage medium, readable by a computer, tangibly embodying a computer program of instructions executable by said computer to perform a method for purging timeseries data samples stored in a repository, said method comprising:
calculating a utility value for each data sample of a measurement timeseries and an event timeseries, respectively, of monitored IT infrastructure elements, wherein a calculation of said utility values is based on a dependency model being represented as a hierarchical graph having (i) nodes, and (ii) edges among nodes of a same level of said hierarchical graph, and each said measurement and event timeseries being associated with a node, wherein said event timeseries has a dependency relationship to said measurement timeseries, and wherein said measurement and event timeseries are segmented into time windows of a fixed size and window boundaries are synchronized for said measurement and event timeseries;
changing, by said computer, said utility values of all data samples assigned to selected windows in said measurement and event timeseries within a temporal distance of an event that occurs in said event timeseries;
determining an information content and an average utility value of each said data sample in each of said selected windows, wherein said determining of said information content of each said data sample is performed on the basis of using one of a probability distribution function, a mean square error, and a Kullback-Leibler distance applied to said data samples; and
purging said data samples from each of said selected windows that are to be stored in said repository of fixed size, such that said data samples having high utility value are retained and loss of said information content of retained data samples is minimized, wherein said number of data samples in said selected windows to be stored in said repository is proportional to a product of said average utility and said information content of said data samples in said selected windows, divided by a sum of products of an average utility and an information content for all windows of said measurement and event timeseries.
Dependent claims: 13, 16, 17, 18, 19, 20
Specification