Purging of stored timeseries data

US 8,001,093 B2
Filed: 04/03/2008
Issued: 08/16/2011
Est. Priority Date: 11/22/2006
Status: Active Grant


3 Assignments

0 Petitions

Abstract

Disclosed are methods, systems, and computer program products for purging data stored in a repository. Users assign a relative importance to every data sample across all timeseries in the repository; the importance assigned to a data sample is its ‘utility value’. An algorithm uses these utility values to allocate the repository's storage space so that the total loss of information due to purging is minimized while samples with high utility values are preserved.
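The allocation rule stated in claim 1 can be read as n_w = N · (ū_w · I_w) / Σ_v (ū_v · I_v), where N is the repository capacity in samples, ū_w is the average utility of window w, and I_w is its information content. The Python sketch below turns that proportion into whole sample counts. The function name and the largest-remainder rounding (chosen here so the counts sum exactly to N and never exceed it) are assumptions for illustration, not details taken from the patent.

```python
import math
from typing import Sequence


def allocate_capacity(
    avg_utility: Sequence[float],
    info_content: Sequence[float],
    capacity: int,
) -> list[int]:
    """Split `capacity` sample slots across windows in proportion to
    avg_utility[w] * info_content[w] (hypothetical helper).

    Flooring the ideal shares and then handing out the leftover slots by
    largest fractional remainder makes the counts sum exactly to `capacity`
    whenever any window has a positive weight, so capacity is never exceeded.
    """
    if len(avg_utility) != len(info_content):
        raise ValueError("need one utility and one information value per window")
    weights = [u * i for u, i in zip(avg_utility, info_content)]
    total = sum(weights)
    if total <= 0:
        return [0] * len(weights)
    # Ideal fractional share of the repository for each window.
    shares = [capacity * w / total for w in weights]
    counts = [math.floor(s) for s in shares]
    # Give the slots lost to flooring to the largest fractional remainders.
    leftover = capacity - sum(counts)
    order = sorted(range(len(shares)), key=lambda k: shares[k] - counts[k], reverse=True)
    for k in order[:leftover]:
        counts[k] += 1
    return counts
```

For example, `allocate_capacity([1.0, 2.0, 1.0], [0.5, 0.5, 2.0], 10)` returns `[1, 3, 6]`: the third window has the largest utility-information product and receives the most slots. The sketch does not cap a window at the number of samples it actually holds; a fuller version would redistribute any such surplus.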

20 Claims

1. A computer-implemented method for purging timeseries data samples stored in a repository, said method comprising:
- calculating, by a computer, a utility value for each data sample of a measurement timeseries and an event timeseries, respectively, of monitored IT infrastructure elements, wherein a calculation of said utility values is based on a dependency model being represented as a hierarchical graph having (i) nodes, and (ii) edges among nodes of a same level of said hierarchical graph, and each said measurement and event timeseries being associated with a node, wherein said event timeseries has a dependency relationship to said measurement timeseries, and wherein said measurement and event timeseries are segmented into time windows of a fixed size and window boundaries are synchronized for said measurement and event timeseries;
  
  changing, by said computer, said utility values of all data samples assigned to selected windows in said measurement and event timeseries within a temporal distance of an event that occurs in said event timeseries;
  
  determining, by said computer, an information content and an average utility value of each said data sample in each of said selected windows, wherein said determining of said information content of each said data sample is performed on the basis of using one of a probability distribution function, a mean square error, and a Kullback-Leibler distance applied to said data samples; and
  
  purging, by said computer, said data samples from each of said selected windows that are to be stored in said repository of fixed size, such that said data samples having high utility value are retained and loss of said information content of retained data samples is minimized, wherein said number of data samples in said selected windows to be stored in said repository is proportional to a product of said average utility and said information content of said data samples in said selected windows, divided by a sum of products of an average utility and an information content for all windows of said measurement and event timeseries.
- Dependent claims (2, 3, 4, 5, 11, 14):
  - 2. The method of claim 1, wherein said purging also ensures that a maximum capacity of said repository is not exceeded.
  - 3. The method of claim 1, wherein said utility value calculation is based on regions of said data samples of interest.
  - 4. The method of claim 1, wherein said utility value calculation is based on age of said data samples.
  - 5. The method of claim 4, wherein said age is determined by a polynomic function.
  - 11. The method of claim 1, wherein said event timeseries denotes a Boolean value associated with either a normal or a problem state.
  - 14. The method of claim 1, wherein said information content is calculated by one of a variance, an entropy, and a histogram of values of said data samples.
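Claims 1 and 2 require that purging keep high-utility samples, minimize the loss of information content, and stay within the repository's capacity, but they do not prescribe how individual samples are picked inside a window once its share is known. The Python sketch below is one hypothetical strategy, not the patented procedure: it greedily drops the interior sample whose removal costs least, measuring cost as the sample's utility times the squared error of linearly interpolating it from its retained neighbors (a mean-square-error style loss). The function name and the interpolation model are assumptions, and timestamps must be strictly increasing.

```python
from typing import Sequence


def retain_samples(
    times: Sequence[float],
    values: Sequence[float],
    utilities: Sequence[float],
    keep: int,
) -> list[int]:
    """Return the indices of `keep` samples to retain in one window.

    Hypothetical greedy sketch: repeatedly drop the interior sample whose
    loss (utility times the squared error of linearly interpolating it from
    its retained neighbors) is smallest. Window endpoints are kept.
    """
    n = len(times)
    if keep >= n:
        return list(range(n))
    if keep <= 0:
        return []
    if keep == 1:
        # Too few slots to keep both endpoints: keep the most useful sample.
        return [max(range(n), key=lambda i: utilities[i])]
    kept = list(range(n))  # only interior positions are ever deleted below
    while len(kept) > keep:
        best_pos, best_cost = 1, float("inf")
        for pos in range(1, len(kept) - 1):
            a, i, b = kept[pos - 1], kept[pos], kept[pos + 1]
            # Estimate sample i from the straight line between its neighbors.
            frac = (times[i] - times[a]) / (times[b] - times[a])
            estimate = values[a] + frac * (values[b] - values[a])
            cost = utilities[i] * (values[i] - estimate) ** 2
            if cost < best_cost:
                best_pos, best_cost = pos, cost
        del kept[best_pos]
    return kept
```

With `times=[0, 1, 2, 3, 4]`, `values=[0, 1, 2, 10, 4]`, equal utilities, and `keep=3`, the sketch keeps indices `[0, 3, 4]`, preserving the spike; raising the utility of index 2 to 100 makes it keep `[0, 2, 4]` instead.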

6. A computer system for purging timeseries data samples stored in a repository, said system comprising:
- a repository that stores measurement timeseries and event timeseries data samples; and
  
  a processor configured to:
  
  calculate a utility value for each data sample of a measurement timeseries and an event timeseries, respectively, of monitored IT infrastructure elements, wherein a calculation of said utility values is based on a dependency model being represented as a hierarchical graph having (i) nodes, and (ii) edges among nodes of a same level of said hierarchical graph, and each said measurement and event timeseries being associated with a node, wherein said event timeseries has a dependency relationship to said measurement timeseries, and wherein said measurement and event timeseries are segmented into time windows of a fixed size and window boundaries are synchronized for said measurement and event timeseries;
  
  change said utility values of all data samples assigned to selected windows in said measurement and event timeseries within a temporal distance of an event that occurs in said event timeseries;
  
  determine an information content and an average utility value of each said data sample in each of said selected windows, wherein determining of said information content of each said data sample is performed on the basis of using one of a probability distribution function, a mean square error, and a Kullback-Leibler distance applied to said data sample; and
  
  purge said data samples from each of said selected windows that are to be stored in said repository of fixed size, such that said data samples having high utility value are retained and loss of said information content of retained data samples is minimized, wherein said number of data samples in said selected windows to be stored in said repository is proportional to a product of said average utility and said information content of said data samples in said selected windows, divided by a sum of products of an average utility and an information content for all windows of said measurement and event timeseries.
- Dependent claims (7, 8, 9, 12, 15):
  - 7. The system of claim 6, wherein said purging also ensures that a maximum capacity of said repository is not exceeded.
  - 8. The system of claim 6, wherein said utility value calculation is based on regions of said data samples of interest.
  - 9. The system of claim 6, wherein said utility value calculation is based on age of said data samples.
  - 12. The system of claim 6, wherein said event timeseries denotes a Boolean value associated with either a normal or a problem state.
  - 15. The system of claim 6, wherein said information content is calculated by one of a variance, an entropy, and a histogram of values of said data samples.
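Claims 6 and 15 name several measures of information content (variance, entropy, a histogram of values, a mean square error, a Kullback-Leibler distance) without fixing their formulas. The Python sketch below shows one plausible version of three of them. The fixed-width binning, the `bin_width` parameter, the `epsilon` smoothing (which keeps the divergence finite when purging empties a bin), and all function names are assumptions. Using the Kullback-Leibler distance to compare a window's original samples with a candidate retained subset is likewise an illustrative reading of the claims.

```python
import math
import statistics
from collections import Counter
from typing import Sequence


def histogram(values: Sequence[float], bin_width: float) -> Counter:
    """Count values per fixed-width bin (bin k covers [k*w, (k+1)*w))."""
    return Counter(math.floor(v / bin_width) for v in values)


def variance_content(values: Sequence[float]) -> float:
    """Information content as the population variance of the values."""
    return statistics.pvariance(values)


def entropy_content(values: Sequence[float], bin_width: float) -> float:
    """Information content as the Shannon entropy (bits) of the histogram."""
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in histogram(values, bin_width).values())


def kl_loss_bits(
    original: Sequence[float],
    retained: Sequence[float],
    bin_width: float,
    epsilon: float = 1e-9,
) -> float:
    """D_KL(P || Q) in bits, with P the histogram of the original window and
    Q the histogram of the retained samples, smoothed by `epsilon`."""
    if not original:
        return 0.0
    p_counts = histogram(original, bin_width)
    q_counts = histogram(retained, bin_width)
    bins = set(p_counts) | set(q_counts)
    # Smoothed Q still sums to 1 over all bins.
    q_total = len(retained) + epsilon * len(bins)
    divergence = 0.0
    for b in bins:
        p = p_counts[b] / len(original)
        if p == 0.0:
            continue  # bins absent from P contribute nothing
        q = (q_counts[b] + epsilon) / q_total
        divergence += p * math.log2(p / q)
    return divergence
```

Keeping `[0.1, 1.5]` out of `[0.1, 0.2, 1.5, 1.7]` (with `bin_width=1.0`) preserves the two-bin histogram, so `kl_loss_bits` is close to zero, whereas keeping only `[0.1]` empties a bin and produces a large divergence.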

10. A non-transitory computer program storage medium, readable by a computer, tangibly embodying a computer program of instructions executable by said computer to perform a method for purging timeseries data samples stored in a repository, said method comprising:
- calculating a utility value for each data sample of a measurement timeseries and an event timeseries, respectively, of monitored IT infrastructure elements, wherein a calculation of said utility values is based on a dependency model being represented as a hierarchical graph having (i) nodes, and (ii) edges among nodes of a same level of said hierarchical graph, and each said measurement and event timeseries being associated with a node, wherein said event timeseries has a dependency relationship to said measurement timeseries, and wherein said measurement and event timeseries are segmented into time windows of a fixed size and window boundaries are synchronized for said measurement and event timeseries;
  
  changing, by said computer, said utility values of all data samples assigned to selected windows in said measurement and event timeseries within a temporal distance of an event that occurs in said event timeseries;
  
  determining an information content and an average utility value of each said data sample in each of said selected windows, wherein said determining of said information content of each said data sample is performed on the basis of using one of a probability distribution function, a mean square error, and a Kullback-Leibler distance applied to said data samples; and
  
  purging said data samples from each of said selected windows that are to be stored in said repository of fixed size, such that said data samples having high utility value are retained and loss of said information content of retained data samples is minimized, wherein said number of data samples in said selected windows to be stored in said repository is proportional to a product of said average utility and said information content of said data samples in said selected windows, divided by a sum of products of an average utility and an information content for all windows of said measurement and event timeseries.
- Dependent claims (13, 16, 17, 18, 19, 20):
  - 13. The method of claim 10, wherein said event timeseries denotes a Boolean value associated with either a normal or a problem state.
  - 16. The method of claim 10, wherein said event timeseries denotes a Boolean value associated with either a normal or a problem state.
  - 17. The method of claim 10, wherein said purging also ensures that a maximum capacity of said repository is not exceeded.
  - 18. The method of claim 10, wherein said utility value calculation is based on regions of said data samples of interest.
  - 19. The method of claim 10, wherein said utility value calculation is based on age of said data samples.
  - 20. The method of claim 19, wherein said age is determined by a polynomic function.
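Claims 10, 19, and 20 combine an age-based utility, described only as a "polynomic function", with the step of changing utilities for samples within a temporal distance of an event. The Python sketch below is a hypothetical combination of the two: utility decays polynomially with age, and samples in windows within `max_window_distance` windows of an event's window have their utility multiplied by `factor`. The decay shape, the window-based distance, the multiplicative boost, and all names are assumptions; the claims only say the utility values are "changed".

```python
from typing import Sequence


def age_utility(age: float, max_age: float, degree: int = 2) -> float:
    """Hypothetical polynomial decay: 1.0 for a new sample, 0.0 at max_age."""
    if age >= max_age:
        return 0.0
    return (1.0 - max(age, 0.0) / max_age) ** degree


def window_index(t: float, window_size: float, origin: float = 0.0) -> int:
    """Map a timestamp to a fixed-size window. Sharing `origin` and
    `window_size` across timeseries keeps their window boundaries aligned."""
    return int((t - origin) // window_size)


def sample_utilities(
    timestamps: Sequence[float],
    now: float,
    max_age: float,
    event_times: Sequence[float],
    window_size: float,
    max_window_distance: int,
    factor: float = 2.0,
) -> list[float]:
    """Age-based utility per sample, multiplied by `factor` for samples whose
    window lies within `max_window_distance` windows of an event's window."""
    event_windows = {window_index(t, window_size) for t in event_times}
    result: list[float] = []
    for t in timestamps:
        u = age_utility(now - t, max_age)
        w = window_index(t, window_size)
        if any(abs(w - e) <= max_window_distance for e in event_windows):
            u *= factor  # assumed multiplicative "change" near an event
        result.append(u)
    return result
```

Calling it on the measurement and event timeseries with the same `window_size` (and the default shared origin) keeps their window boundaries synchronized, as claim 10 requires. Because the boost is multiplicative, a sample whose age utility has already dropped to zero stays at zero; an additive change would behave differently.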

Current Assignee
Domo, Inc.
Original Assignee
International Business Machines Corporation
Inventors
Neogi, Anindya, Kothari, Ravi, Singh, Raghavendra
Primary Examiner(s)
Ali; Mohammad
Assistant Examiner(s)
Tran; Bao G

Application Number
US12/061,730
Publication Number
US 20080183778A1
Time in Patent Office
1,230 Days
Field of Search
707/999.001, 707/999.003, 707/999.1, 707/692, 707/755, 707/756, 706/12, 706/47, 706/58, 713/500
US Class Current
707/692
CPC Class Codes
G06F 16/22 Indexing; Data structures t...
