Computer-implemented system and method for detecting anomalies using sample-based rule identification
First Claim
1. A system for detecting anomalies using sample-based rule identification with the aid of a digital computer, comprising:
a non-transitory computer readable storage medium comprising program code and further comprising:
a database comprising a data set for data analytics, the data set comprising a plurality of data points; and
a set of anomaly rules;
a computer processor and memory with the computer processor coupled to the storage medium, wherein the computer processor is configured to execute the program code to perform steps to:
statistically identify one or more of the data points in the data set comprised in the database as one or more potential anomalies, comprising calculating a statistics for each of the data points;
label each of the identified data points as at least one of anomaly and non-anomaly based on verification by a domain expert;
adjust the set of anomaly rules comprised in the database based on at least one of the labeled anomalies, comprising creating an additional anomaly rule and adding the rule to the set, further comprising:
determine an entropy of at least a portion of a different data set, the different data set comprising the statistics of all of the data points, the at least the portion comprising the statistics for the at least one anomaly;
use the entropy to set a threshold; and
set the additional anomaly rule to label one or more of the data points other than the at least one labeled anomaly as one or more additional anomalies upon the statistics for these data points exceeding the threshold;
detect and classify as the one or more additional anomalies the one or more data points other than the at least one labeled anomaly comprised in the database by applying the adjusted set of anomaly rules comprised in the database to the statistics for the data points; and
control manipulative malicious activities in at least one of the fields of social welfare, credit card, transportation systems, the Internet networks, and healthcare systems based on the labeled anomalies and the additional anomalies.
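The claim recites determining an entropy and using it to set a threshold, but it does not say how the entropy maps to a threshold. As a minimal sketch of one possible reading (an assumption, not the patented method), the Python below picks the cut on the per-point statistic that minimizes the weighted Shannon entropy of the expert labels on either side of the cut. The function names binary_entropy and entropy_threshold are hypothetical and chosen for illustration only.

```python
import math
from typing import Sequence


def binary_entropy(labels: Sequence[bool]) -> float:
    """Shannon entropy, in bits, of a list of anomaly / non-anomaly labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)  # fraction labeled as anomaly
    if p in (0.0, 1.0):
        return 0.0  # a pure group carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def entropy_threshold(stats: Sequence[float], labels: Sequence[bool]) -> float:
    """Return the cut on the statistic with the lowest weighted label entropy.

    Assumption: this split criterion is one illustrative way to "use the
    entropy to set a threshold"; the claim itself does not specify one.
    """
    if not stats or len(stats) != len(labels):
        raise ValueError("stats and labels must be non-empty and equal length")
    pairs = sorted(zip(stats, labels))
    values = [s for s, _ in pairs]
    flags = [flag for _, flag in pairs]
    best_cut, best_score = values[0], float("inf")
    for i in range(1, len(pairs)):
        if values[i] == values[i - 1]:
            continue  # no cut can separate identical statistics
        cut = (values[i - 1] + values[i]) / 2
        left, right = flags[:i], flags[i:]
        score = (
            len(left) * binary_entropy(left) + len(right) * binary_entropy(right)
        ) / len(pairs)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut


# Statistics for expert-labeled points (True means verified anomaly).
threshold = entropy_threshold([0.5, 1.0, 3.0, 4.0], [False, False, True, True])


def additional_rule(statistic: float) -> bool:
    """Hypothetical new rule: flag a point whose statistic exceeds the threshold."""
    return statistic > threshold
```

Under this reading, the additional rule labels any other data point as an anomaly once its statistic exceeds the learned threshold, which mirrors the "exceeding the threshold" language of the claim.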
Abstract
A computer-implemented system and method for detecting anomalies using sample-based rule identification is provided. Data for data analytics is maintained in a database. A set of anomaly rules is defined. A rare pattern in the data is statistically identified. The identified rare pattern is labeled as at least one of anomaly and non-anomaly based on verification by a domain expert. The set of anomaly rules is adjusted based on the labeled anomaly. Other anomalies in the data are detected and classified by applying the adjusted set of anomaly rules to the data.
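The flow described in the abstract can be sketched end to end in Python. This is an illustration under stated assumptions, not the patent's implementation: the statistic is an absolute z-score, the expert reviews only candidates drawn from a sample of the data, and the new rule simply reuses the smallest statistic among confirmed anomalies as its cut. The names z_scores, detect, ask_expert, sample_idx, and candidate_z are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Iterable, List, Tuple

Rule = Callable[[float], bool]  # a rule maps a statistic to anomaly / not


def z_scores(values: List[float]) -> List[float]:
    """Absolute z-score per data point (assumed choice of statistic)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [abs(v - mu) / sigma for v in values]


def detect(
    values: List[float],
    rules: List[Rule],
    ask_expert: Callable[[float], bool],
    sample_idx: Iterable[int],
    candidate_z: float = 2.0,
) -> Tuple[List[int], List[int]]:
    """Return (expert-confirmed anomalies, additional rule-detected anomalies)."""
    stats = z_scores(values)
    # 1. Statistically identify rare points within the reviewed sample.
    candidates = [i for i in sample_idx if stats[i] >= candidate_z]
    # 2. A domain expert labels each candidate as anomaly or non-anomaly.
    labels = {i: ask_expert(values[i]) for i in candidates}
    confirmed = [i for i, is_anomaly in labels.items() if is_anomaly]
    # 3. Adjust the rule set based on the labeled anomalies.
    #    Assumption: the cut is the smallest confirmed statistic.
    adjusted = list(rules)
    if confirmed:
        cut = min(stats[i] for i in confirmed)
        adjusted.append(lambda z, cut=cut: z >= cut)
    # 4. Apply the adjusted rules to every point the expert did not label.
    additional = [
        i
        for i in range(len(values))
        if i not in labels and any(rule(stats[i]) for rule in adjusted)
    ]
    return confirmed, additional


values = [10, 11, 9, 10, 12, 10, 11, 9, 50, 10, 55, 11]
confirmed, additional = detect(
    values,
    rules=[],
    ask_expert=lambda v: v > 40,  # stand-in for a human reviewer
    sample_idx=range(9),          # expert only sees the first nine points
)
```

Here the expert confirms one point inside the sample, and the rule derived from it is then applied to the unlabeled points outside the sample.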
13 Claims
1. A system for detecting anomalies using sample-based rule identification with the aid of a digital computer, comprising:
a non-transitory computer readable storage medium comprising program code and further comprising:
a database comprising a data set for data analytics, the data set comprising a plurality of data points; and
a set of anomaly rules;
a computer processor and memory with the computer processor coupled to the storage medium, wherein the computer processor is configured to execute the program code to perform steps to:
statistically identify one or more of the data points in the data set comprised in the database as one or more potential anomalies, comprising calculating a statistics for each of the data points;
label each of the identified data points as at least one of anomaly and non-anomaly based on verification by a domain expert;
adjust the set of anomaly rules comprised in the database based on at least one of the labeled anomalies, comprising creating an additional anomaly rule and adding the rule to the set, further comprising:
determine an entropy of at least a portion of a different data set, the different data set comprising the statistics of all of the data points, the at least the portion comprising the statistics for the at least one anomaly;
use the entropy to set a threshold; and
set the additional anomaly rule to label one or more of the data points other than the at least one labeled anomaly as one or more additional anomalies upon the statistics for these data points exceeding the threshold;
detect and classify as the one or more additional anomalies the one or more data points other than the at least one labeled anomaly comprised in the database by applying the adjusted set of anomaly rules comprised in the database to the statistics for the data points; and
control manipulative malicious activities in at least one of the fields of social welfare, credit card, transportation systems, the Internet networks, and healthcare systems based on the labeled anomalies and the additional anomalies.
Dependent claims: 2, 3, 4, 5, 6
7. A method for detecting anomalies using sample-based rule identification with the aid of a digital computer, comprising the steps of:
maintaining a data set for data analytics comprised in a storage medium, the data set comprising a plurality of data points;
statistically identifying with a computer processor and memory with the computer processor coupled to the non-transitory computer readable storage medium one or more of the data points in the data set comprised in the database as one or more potential anomalies, comprising calculating a statistics for each of the data points;
labeling each of the identified data points with the computer processor as at least one of anomaly and non-anomaly based on verification by a domain expert;
defining a set of anomaly rules comprised in the storage medium based on at least one of the labeled anomalies, comprising creating one of the anomaly rules and adding the rule to the set, further comprising:
determining an entropy of at least a portion of a different data set, the different data set comprising the statistics of all of the data points, the at least the portion comprising the statistics for the at least one anomaly;
using the entropy to set a threshold; and
setting the additional anomaly rule to label one or more of the data points other than the at least one labeled anomaly as one or more additional anomalies upon the statistics for these data points exceeding the threshold;
detecting and classifying as the one or more additional anomalies the one or more data points other than the at least one labeled anomaly in the data comprised in the database with the computer processor by applying the set of anomaly rules to the statistics for the data points; and
controlling manipulative malicious activities in at least one of the fields of social welfare, credit card, transportation systems, the Internet networks, and healthcare systems based on the labeled anomalies and the additional anomalies.
8. A method for detecting anomalies using sample-based rule identification with the aid of a digital computer, comprising the steps of:
maintaining a data set for data analytics in a database comprised in a non-transitory computer readable storage medium, the data set comprising a plurality of data points;
defining a set of anomaly rules comprised in the database comprised in the storage medium;
statistically identifying with a computer processor and memory with the computer processor coupled to the non-transitory computer readable storage medium one or more of the data points in the data set comprised in the database as one or more potential anomalies, comprising calculating a statistics for each of the data points;
labeling the identified data points with the computer processor as at least one of anomaly and non-anomaly based on verification by a domain expert;
adjusting the set of anomaly rules comprised in the database with the computer processor based on the labeled anomalies, comprising creating an additional anomaly rule and adding the additional rule to the set, further comprising:
determining an entropy of at least a portion of a different data set, the different data set comprising the statistics of all of the data points, the at least the portion comprising the statistics for the at least one anomaly;
using the entropy to set a threshold; and
setting the additional anomaly rule to label one or more of the data points other than the at least one labeled anomaly as one or more additional anomalies upon the statistics for these data points exceeding the threshold;
detecting and classifying as the one or more additional anomalies the data points other than the at least one labeled anomaly comprised in the database with the computer processor by applying the adjusted set of anomaly rules comprised in the database to the data set statistics for the data points; and
controlling manipulative malicious activities in at least one of the fields of social welfare, credit card, transportation systems, the Internet networks, and healthcare systems based on the labeled anomalies and the additional anomalies.
Dependent claims: 9, 10, 11, 12, 13
Specification