Security alerting using n-gram analysis of program execution data
Abstract
N-grams of input streams or functions executed by an application may be analyzed to identify security breaches or other anomalous behavior. A histogram of n-grams representing sequences of executed functions or input streams may be generated through baseline testing or production use. An alerting system may compare real time n-gram observations to the histogram of n-grams to identify security breaches or other changes in application behavior that may be anomalous. An alert may be generated that identifies the anomalous behavior. The alerting system may be trained using known good datasets and may identify deviations as bad behavior. The alerting system may be trained using known bad datasets and may identify matching behavior as bad behavior.
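The approach summarized in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names (`ngrams`, `build_histogram`, `find_anomalies`), the event labels, the choice of relative frequency as the usage statistic, and the 0.1 threshold are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterator, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngrams(events: Sequence[str], n: int) -> Iterator[NGram]:
    """Yield every contiguous run of n events from a trace."""
    for i in range(len(events) - n + 1):
        yield tuple(events[i:i + n])


def build_histogram(events: Sequence[str], n: int) -> Dict[NGram, float]:
    """Map each n-gram in the baseline trace to its relative frequency.

    Relative frequency is an assumed stand-in for the "usage statistic".
    """
    counts = Counter(ngrams(events, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def find_anomalies(
    histogram: Dict[NGram, float],
    events: Sequence[str],
    n: int,
    threshold: float,
) -> List[NGram]:
    """Return the distinct n-grams in a new trace whose baseline
    frequency falls below the threshold (unseen n-grams count as 0)."""
    flagged: List[NGram] = []
    for gram in ngrams(events, n):
        if histogram.get(gram, 0.0) < threshold and gram not in flagged:
            flagged.append(gram)
    return flagged


# Hypothetical traces: a baseline run and a later run to be checked.
baseline = ["login", "view", "view", "logout", "login", "view", "logout"]
observed = ["login", "view", "export", "export", "logout"]

histogram = build_histogram(baseline, n=2)
print(find_anomalies(histogram, observed, n=2, threshold=0.1))
# → [('view', 'export'), ('export', 'export'), ('export', 'logout')]
```

This sketch covers only the case where the baseline is treated as known-good data and rare or unseen sequences are flagged. Training on known-bad data, as the abstract also mentions, would invert the test so that matching n-grams are flagged instead.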
32 Claims
1. A method performed on at least one computer processor, said method comprising:
receiving first tracer data observed from a first execution of an application, said first tracer data comprising first production input data representing inputs that were provided to said application during said first execution of said application;
identifying a plurality of n-grams representing unique input data sequences within said first tracer data;
generating a set of usage statistics comprising one usage statistic for each of said plurality of n-grams;
storing said set of usage statistics and said n-grams in a database;
receiving second tracer data observed from a second execution of said application, said second tracer data comprising second production input data representing inputs that were provided to said application during said second execution of said application;
identifying a first n-gram within said second tracer data, said first n-gram representing a first unique input data sequence within said second tracer data;
comparing said first n-gram to said database to determine a first usage statistic for said first n-gram from the set of usage statistics; and
determining said first usage statistic for said first n-gram is below a predefined threshold and determining that said first n-gram represents behavior anomalous to said first tracer data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A system comprising:
one or more hardware processors;
a database comprising a plurality of n-grams representing unique input data sequences observed in a first tracer data from a first execution of an application, said first tracer data comprising first production input data representing inputs that were provided to said application during said first execution of said application, and a set of usage statistics comprising one usage statistic for each of said plurality of n-grams; and
one or more computer-readable media having stored thereon computer-executable instructions that are executable by the one or more hardware processors to implement an analysis engine that is configured to perform at least the following;
receive second tracer data observed from a second execution of said application, said second tracer data comprising second production input data representing inputs that were provided to said application during said second execution of said application;
identify a first n-gram within said second tracer data, said first n-gram representing a first unique input data sequence within said second tracer data;
compare said first n-gram to said database to determine a first usage statistic for said first n-gram from the set of usage statistics; and
determine said first usage statistic for said first n-gram is below a predefined threshold and determines that said first n-gram represents behavior anomalous to said first tracer data.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
31. A computer program product comprising one or more hardware storage devices having stored thereon computer-executable instructions that are executable by one or more processors of a computer system to configure the computer system to perform at least the following:
receive first tracer data observed from a first execution of an application, said first tracer data comprising first production input data representing inputs that were provided to said application during said first execution of said application;
identify a plurality of n-grams representing unique input data sequences within said first tracer data;
generate a set of usage statistics comprising one usage statistic for each of said plurality of n-grams;
store said set of usage statistics and said n-grams in a database;
receive second tracer data observed from a second execution of said application, said second tracer data comprising second production input data representing inputs that were provided to said application during said second execution of said application;
identify a first n-gram within said second tracer data, said first n-gram representing a first unique input data sequence within said second tracer data;
compare said first n-gram to said database to determine a first usage statistic for said first n-gram from the set of usage statistics; and
determine said first usage statistic for said first n-gram is below a predefined threshold and determining that said first n-gram represents behavior anomalous to said first tracer data.
Dependent claims: 32
Specification