Method and system for adaptively imputing sparse and missing data for predictive models
Abstract
Described is an approach that provides an adaptive solution to missing data for machine learning systems. A gradient solution is provided that addresses the imputation needs at each of several levels of missingness. This multilevel approach treats data missingness at any of multiple severity levels while using as much of the actual observed data as possible.
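The multilevel approach starts from measuring how much data is missing, both overall and per signal. The following is a minimal sketch, assuming the collected data arrives as a NumPy array of shape (samples, signals) with NaN marking missing values. It computes both levels and buckets a fraction into severity tiers. The tier boundaries and names (`TIER_BOUNDS`, `TIER_NAMES`) and the function names are hypothetical; the text here does not disclose any numeric levels.

```python
from __future__ import annotations

import numpy as np

# Hypothetical severity tiers: upper bounds on the missing fraction for each
# tier except the last. The patent text does not disclose numeric levels.
TIER_BOUNDS = (0.05, 0.30, 0.60)
TIER_NAMES = ("low", "moderate", "high", "severe")


def missingness_levels(data: np.ndarray) -> tuple[float, np.ndarray]:
    """Return the overall missing fraction and the missing fraction per signal.

    `data` has shape (n_samples, n_signals); NaN marks a missing observation.
    """
    mask = np.isnan(data)
    overall = float(mask.mean())
    per_signal = mask.mean(axis=0)  # one fraction per column (signal)
    return overall, per_signal


def tier(fraction: float) -> str:
    """Map a missing fraction onto one of the hypothetical severity tiers."""
    for bound, name in zip(TIER_BOUNDS, TIER_NAMES):
        if fraction < bound:
            return name
    return TIER_NAMES[-1]


if __name__ == "__main__":
    x = np.array([[1.0, np.nan, 3.0],
                  [4.0, 5.0, np.nan],
                  [7.0, 8.0, 9.0],
                  [np.nan, 11.0, 12.0]])
    overall, per_signal = missingness_levels(x)
    print(overall, tier(overall))         # 0.25 moderate
    print([tier(f) for f in per_signal])  # ['moderate', 'moderate', 'moderate']
```

Measuring the per-signal level separately matters because a low overall fraction can hide one signal that is almost entirely missing.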
15 Claims
1. A method for imputing data for a learning system, comprising:
collecting data from a monitored target system;
determining one or more levels of missingness for the data collected from the monitored target system;
selecting, from among a plurality of imputation techniques, a selected imputation technique based at least in part upon the one or more levels of missingness for the data, wherein expectation maximization (EM) is selected as the selected imputation technique if it is determined that both an overall level of missing data and individual levels of missing data for signals are at one or more designated thresholds, and an external data source is accessed to generate an EM seed for the expectation maximization when insufficient seed data exists within the data collected from the monitored target system;
imputing missing data using the selected imputation technique to generate training data; and
performing model training with the training data.
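The selection and seeding logic of claim 1 can be sketched as follows, again in Python with NumPy. Everything concrete here is an assumption rather than the claimed implementation: the threshold values, reading "at a designated threshold" as "at or above it", taking the worst-affected signal as the individual level, treating fully observed rows as the seed data, the fallback technique name, and the hypothetical `external_source(n)` callable that stands in for the external data source and returns `n` complete rows.

```python
from __future__ import annotations

from typing import Callable

import numpy as np

# Hypothetical values: the claims refer only to "designated thresholds" and
# "insufficient seed data", so these numbers are illustrative assumptions.
OVERALL_THRESHOLD = 0.05  # overall fraction of missing values
SIGNAL_THRESHOLD = 0.05   # missing fraction in the worst individual signal
MIN_SEED_ROWS = 10        # fully observed rows assumed necessary to seed EM


def select_technique(data: np.ndarray) -> str:
    """Choose an imputation technique from overall and per-signal missingness."""
    mask = np.isnan(data)
    overall = mask.mean()
    worst_signal = mask.mean(axis=0).max()
    # Assumption: "at a designated threshold" is read as "at or above it".
    if overall >= OVERALL_THRESHOLD and worst_signal >= SIGNAL_THRESHOLD:
        return "expectation_maximization"
    # Hypothetical technique for lighter missingness.
    return "interpolation"


def em_seed(data: np.ndarray,
            external_source: Callable[[int], np.ndarray]
            ) -> tuple[np.ndarray, np.ndarray]:
    """Return an initial mean and covariance (the EM seed).

    Seed rows are the fully observed rows of `data`. If there are fewer than
    MIN_SEED_ROWS, the hypothetical `external_source(n)` supplies n more
    complete rows with the same number of signals.
    """
    complete = data[~np.isnan(data).any(axis=1)]
    if complete.shape[0] < MIN_SEED_ROWS:
        extra = external_source(MIN_SEED_ROWS - complete.shape[0])
        complete = np.vstack([complete, extra])
    mu0 = complete.mean(axis=0)
    sigma0 = np.cov(complete, rowvar=False)  # signals are columns
    return mu0, sigma0
```

Keeping the seed check separate from the selection rule mirrors the claim, where the external source is consulted only after EM has been chosen and only if the collected data cannot supply a seed on its own.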
6. A system for imputing data for a machine learning system, comprising:
a processor;
a memory for holding programmable code; and
wherein the programmable code includes instructions for collecting data from a monitored target system;
determining one or more levels of missingness for the data collected from the monitored target system;
selecting, from among a plurality of imputation techniques, a selected imputation technique based at least in part upon the one or more levels of missingness for the data, wherein expectation maximization (EM) is selected as the selected imputation technique if it is determined that both an overall level of missing data and individual levels of missing data for signals are at one or more designated thresholds, and an external data source is accessed to generate an EM seed for the expectation maximization when insufficient seed data exists within the data collected from the monitored target system;
imputing missing data using the selected imputation technique to generate training data; and
performing model training with the training data.
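The claims do not state which statistical model the expectation maximization runs over. One common choice, used here purely as an assumption, is a multivariate Gaussian: the E-step fills each row's missing signals with their conditional mean given the observed signals, and the M-step re-estimates the mean and covariance. `mu0` and `sigma0` play the role of the EM seed, and `em_impute` is a hypothetical name for this sketch.

```python
import numpy as np


def em_impute(data, mu0, sigma0, max_iter=100, tol=1e-6):
    """Impute NaNs in `data` by EM under a multivariate Gaussian model.

    `mu0` and `sigma0` are the starting mean and covariance (the seed).
    Returns the imputed array and the fitted mean and covariance.
    """
    x = np.array(data, dtype=float)
    miss = np.isnan(x)
    n, d = x.shape
    mu = np.asarray(mu0, dtype=float).copy()
    sigma = np.asarray(sigma0, dtype=float).copy()
    filled = np.where(miss, mu, x)  # start missing entries at the seed mean
    for _ in range(max_iter):
        correction = np.zeros((d, d))  # summed conditional covariances
        # E-step: conditional expectation of missing given observed values.
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if o.any():
                s_oo = sigma[np.ix_(o, o)]
                s_mo = sigma[np.ix_(m, o)]
                coef = np.linalg.solve(s_oo, s_mo.T).T  # S_mo @ inv(S_oo)
                filled[i, m] = mu[m] + coef @ (x[i, o] - mu[o])
                cond_cov = sigma[np.ix_(m, m)] - coef @ s_mo.T
            else:
                # Nothing observed in this row: fall back to the marginal.
                filled[i, m] = mu[m]
                cond_cov = sigma[np.ix_(m, m)]
            correction[np.ix_(m, m)] += cond_cov
        # M-step: maximum-likelihood mean and covariance.
        new_mu = filled.mean(axis=0)
        centered = filled - new_mu
        new_sigma = (centered.T @ centered + correction) / n
        converged = (np.allclose(new_mu, mu, atol=tol)
                     and np.allclose(new_sigma, sigma, atol=tol))
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return filled, mu, sigma
```

The `correction` term adds the conditional covariance of the filled entries in the M-step. Without it the update reduces to iterative regression imputation, which underestimates the variance of the imputed signals.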
11. A computer program product embodied on a non-transitory computer readable medium, the non-transitory computer readable medium having stored thereon a sequence of instructions which, when executed by a processor, executes a method comprising:
collecting data from a monitored target system;
determining one or more levels of missingness for the data collected from the monitored target system;
selecting, from among a plurality of imputation techniques, a selected imputation technique based at least in part upon the one or more levels of missingness for the data, wherein expectation maximization (EM) is selected as the selected imputation technique if it is determined that both an overall level of missing data and individual levels of missing data for signals are at one or more designated thresholds, and an external data source is accessed to generate an EM seed for the expectation maximization when insufficient seed data exists within the data collected from the monitored target system;
imputing missing data using the selected imputation technique to generate training data; and
performing model training with the training data.
Specification