N-gram analysis of inputs to a software application
First Claim
1. A method, implemented at a distributed computer system that includes at least one computer processor, said method for analyzing an application based on n-gram sequences associated with inputs of said application, said method comprising:
executing an application in a production environment that comprises a first computer system of said distributed computer system;
receiving first tracer data observed from execution of said application in said production environment, said first tracer data observed from execution of said application in said production environment comprising a first plurality of inputs provided to said application during execution in said production environment;
identifying, within said first tracer data, a first plurality of n-gram sequences of said first plurality of inputs, each of said first plurality of n-gram sequences comprising at least one of a first plurality of input parameter sequences;
identifying, from a usage frequency database comprising usage data for each of said first plurality of n-gram sequences from said first tracer data, one or more ways in which said application was used during execution in said production environment;
based on said one or more ways in which said application was used during execution in said production environment, identifying one or more characteristics of a test environment for said application;
based on said one or more characteristics, configuring a test environment that comprises a second computer system of said distributed computer system;
executing said application in said test environment that includes said one or more identified characteristics;
receiving second tracer data observed from execution of said application during execution in said test environment, said second tracer data observed from execution of said application in said test environment comprising a second plurality of inputs provided to said application during execution in said test environment;
identifying, within said second tracer data, a second plurality of n-gram sequences of said second plurality of inputs, each of said second plurality of n-gram sequences comprising at least one of a second plurality of input parameter sequences;
identifying a subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences of said second tracer data; and
comparing said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database, wherein comparing comprises mapping said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database to thereby determine a test coverage factor of said application.
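The n-gram identification and usage-frequency steps of the claim can be illustrated with a short Python sketch. It assumes, for illustration only, that each traced input is recorded as a hashable token (here a plain string such as `"login"`), that an n-gram is a window of n consecutive inputs within one traced session, and that the usage frequency database is an in-memory `collections.Counter`. The names `ngrams` and `build_usage_frequency` are hypothetical and do not come from the patent.

```python
from collections import Counter
from typing import Hashable, Iterator, Sequence, Tuple

# Hypothetical encoding: each traced input is a hashable token, for example a
# command name or a (function_name, parameters) tuple captured by a tracer.
NGram = Tuple[Hashable, ...]


def ngrams(inputs: Sequence[Hashable], n: int) -> Iterator[NGram]:
    """Yield every window of n consecutive inputs in one traced session."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for i in range(len(inputs) - n + 1):
        yield tuple(inputs[i:i + n])


def build_usage_frequency(traces: Sequence[Sequence[Hashable]], n: int) -> Counter:
    """Count how often each n-gram occurs across all traced sessions."""
    usage: Counter = Counter()
    for trace in traces:
        usage.update(ngrams(trace, n))
    return usage


if __name__ == "__main__":
    production_traces = [
        ["login", "search", "view", "checkout"],
        ["login", "search", "view", "search", "view"],
    ]
    usage = build_usage_frequency(production_traces, n=2)
    for gram, count in usage.most_common():
        print(gram, count)
```

With n = 2, the pair `("search", "view")` is counted three times across the two toy sessions, which would mark it as the most common way the application was used in this made-up data.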
Abstract
Input sequence information may be analyzed and quantified using n-gram analysis of inputs received by an application. The sequences of inputs may be represented by n-grams, and the frequency of the various n-grams may indicate the ‘real world’ uses of the application in production, which may be compared to a test suite whose coverage may be quantified using a similar n-gram analysis. A coverage factor may compare the inputs observed by the application in production with the test suite for the application. The n-grams may be further quantified or prioritized by resource utilization, and several visualizations may be generated from the data.
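The abstract notes that n-grams may also be prioritized by resource utilization. One way to do that, sketched below under stated assumptions, is to attach a per-occurrence cost (for example CPU milliseconds reported by a tracer) to each observed n-gram and rank n-grams by total cost, which equals frequency multiplied by mean cost. The function name `prioritize_by_resource_use` and this cost model are hypothetical, not taken from the patent.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, List, Tuple

NGram = Tuple[Hashable, ...]


def prioritize_by_resource_use(
    observations: Iterable[Tuple[NGram, float]],
) -> List[Tuple[NGram, int, float]]:
    """Rank n-grams by total resource cost, highest first.

    Each observation is a hypothetical (n_gram, cost) pair, where cost is a
    per-occurrence measurement such as CPU milliseconds. Total cost equals
    occurrence count multiplied by mean cost per occurrence.
    """
    counts: DefaultDict[NGram, int] = defaultdict(int)
    totals: DefaultDict[NGram, float] = defaultdict(float)
    for gram, cost in observations:
        counts[gram] += 1
        totals[gram] += cost
    ranked = sorted(totals, key=totals.__getitem__, reverse=True)
    return [(gram, counts[gram], totals[gram]) for gram in ranked]


if __name__ == "__main__":
    observed = [
        (("login", "search"), 5.0),
        (("search", "view"), 1.0),
        (("search", "view"), 1.5),
        (("view", "checkout"), 20.0),
    ]
    for gram, count, total in prioritize_by_resource_use(observed):
        print(gram, count, total)
```

In this toy data the single 20 ms `("view", "checkout")` occurrence outranks the more frequent but cheap `("search", "view")` pair; ranking by raw frequency alone would order them the other way.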
34 Claims
1. A method, implemented at a distributed computer system that includes at least one computer processor, said method for analyzing an application based on n-gram sequences associated with inputs of said application, said method comprising:
executing an application in a production environment that comprises a first computer system of said distributed computer system;
receiving first tracer data observed from execution of said application in said production environment, said first tracer data observed from execution of said application in said production environment comprising a first plurality of inputs provided to said application during execution in said production environment;
identifying, within said first tracer data, a first plurality of n-gram sequences of said first plurality of inputs, each of said first plurality of n-gram sequences comprising at least one of a first plurality of input parameter sequences;
identifying, from a usage frequency database comprising usage data for each of said first plurality of n-gram sequences from said first tracer data, one or more ways in which said application was used during execution in said production environment;
based on said one or more ways in which said application was used during execution in said production environment, identifying one or more characteristics of a test environment for said application;
based on said one or more characteristics, configuring a test environment that comprises a second computer system of said distributed computer system;
executing said application in said test environment that includes said one or more identified characteristics;
receiving second tracer data observed from execution of said application during execution in said test environment, said second tracer data observed from execution of said application in said test environment comprising a second plurality of inputs provided to said application during execution in said test environment;
identifying, within said second tracer data, a second plurality of n-gram sequences of said second plurality of inputs, each of said second plurality of n-gram sequences comprising at least one of a second plurality of input parameter sequences;
identifying a subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences of said second tracer data; and
comparing said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database, wherein comparing comprises mapping said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database to thereby determine a test coverage factor of said application.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 23
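The last two steps of claim 1, finding which production n-grams also occur in the test run and mapping that subset back onto the usage frequency database, can be sketched as a frequency-weighted coverage ratio. The weighting is an assumption: the claim says only that the mapping determines a test coverage factor, not how it is computed. The function names `ngram_set` and `coverage_factor` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Set, Tuple

NGram = Tuple[Hashable, ...]


def ngram_set(inputs: Sequence[Hashable], n: int) -> Set[NGram]:
    """Distinct windows of n consecutive inputs observed in one test run."""
    return {tuple(inputs[i:i + n]) for i in range(len(inputs) - n + 1)}


def coverage_factor(usage: Counter, test_ngrams: Set[NGram]) -> float:
    """Share of production usage whose n-grams were also exercised in test.

    Assumption: each production n-gram is weighted by its usage count, so an
    untested but frequently used sequence lowers coverage more than a rare one.
    """
    total = sum(usage.values())
    if total == 0:
        return 0.0
    # Subset of the production n-grams that also appear in the test n-grams.
    covered = set(usage) & test_ngrams
    # Map that subset back onto the usage frequency data.
    return sum(usage[gram] for gram in covered) / total


if __name__ == "__main__":
    production_usage = Counter({
        ("login", "search"): 2,
        ("search", "view"): 3,
        ("view", "checkout"): 1,
        ("view", "search"): 1,
    })
    test_run = ["login", "search", "view"]
    print(round(coverage_factor(production_usage, ngram_set(test_run, 2)), 3))
```

An unweighted variant, `len(covered) / len(usage)`, would instead treat every distinct production n-gram equally; for the toy data above it gives 0.5 rather than 5/7 (about 0.714).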
12. A distributed computer system comprising:
at least one processor; and
one or more computer-readable storage media having stored thereon computer-executable instructions that are executable by the at least one processor to cause the distributed computer system to analyze an application based on n-gram sequences associated with inputs of the application, the computer-executable instructions including instructions that are executable to cause the distributed computer system to perform at least the following:
execute an application in a production environment that comprises a first computer system of said distributed computer system;
receive first tracer data observed from execution of said application in said production environment, said first tracer data observed from execution of said application in said production environment comprising a first plurality of inputs provided to said application during execution in said production environment;
identify, within said first tracer data, a first plurality of n-gram sequences of said first plurality of inputs, each of said first plurality of n-gram sequences comprising at least one of a first plurality of input parameter sequences;
identify, from a usage frequency database comprising usage data for each of said first plurality of n-gram sequences from said first tracer data, one or more ways in which said application was used during execution in said production environment;
based on said one or more ways in which said application was used during execution in said production environment, identify one or more characteristics of a test environment for said application;
based on said one or more characteristics, configure a test environment that comprises a second computer system of said distributed computer system;
execute said application in said test environment that includes said one or more identified characteristics;
receive second tracer data observed from execution of said application during execution in said test environment, said second tracer data observed from execution of said application in said test environment comprising a second plurality of inputs provided to said application during execution in said test environment;
identify, within said second tracer data, a second plurality of n-gram sequences of said second plurality of inputs, each of said second plurality of n-gram sequences comprising at least one of a second plurality of input parameter sequences;
identify a subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences of said second tracer data; and
compare said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database, wherein comparing comprises mapping said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database to thereby determine a test coverage factor of said application.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
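The independent claims leave open how the characteristics of the test environment are derived from the ways the application was used. As one illustrative reading, sketched below, the usage frequency database could be reduced to the smallest set of most frequent n-grams that accounts for a target share of production usage, with those shares used as the input mix for a replayed or synthetic test workload. The `target_share` parameter, the function name `dominant_usage_patterns`, and the workload interpretation are all assumptions.

```python
from collections import Counter
from typing import Hashable, List, Tuple

NGram = Tuple[Hashable, ...]


def dominant_usage_patterns(
    usage: Counter, target_share: float = 0.9
) -> List[Tuple[NGram, float]]:
    """Most frequent n-grams that together reach target_share of all usage.

    Returns (n_gram, share_of_usage) pairs, most frequent first. Reading these
    shares as the input mix for a test workload is an illustrative assumption.
    """
    if not 0.0 < target_share <= 1.0:
        raise ValueError("target_share must be in (0, 1]")
    total = sum(usage.values())
    threshold = target_share * total
    selected: List[Tuple[NGram, float]] = []
    covered = 0  # integer usage count reached so far, avoids float drift
    for gram, count in usage.most_common():
        if covered >= threshold:
            break
        selected.append((gram, count / total))
        covered += count
    return selected


if __name__ == "__main__":
    production_usage = Counter({
        ("login", "search"): 2,
        ("search", "view"): 3,
        ("view", "checkout"): 1,
        ("view", "search"): 1,
    })
    for gram, share in dominant_usage_patterns(production_usage, target_share=0.7):
        print(gram, round(share, 3))
```

Accumulating integer counts rather than float shares keeps the stopping test exact, so a threshold such as 0.9 is not missed by rounding error.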
24. A computer program product comprising one or more computer-readable storage media having stored thereon one or more computer-executable instructions that are executable by one or more processors of a distributed computer system to cause the distributed computer system to analyze an application based on n-grams associated with inputs of the application, the computer-executable instructions including instructions that are executable to cause the computer system to perform at least the following:
execute an application in a production environment that comprises a first computer system of said distributed computer system;
receive first tracer data observed from execution of said application in said production environment, said first tracer data observed from execution of said application in said production environment comprising a first plurality of inputs provided to said application during execution in said production environment;
identify, within said first tracer data, a first plurality of n-gram sequences of said first plurality of inputs, each of said first plurality of n-gram sequences comprising at least one of a first plurality of input parameter sequences;
identify, from a usage frequency database comprising usage data for each of said first plurality of n-gram sequences from said first tracer data, one or more ways in which said application was used during execution in said production environment;
based on said one or more ways in which said application was used during execution in said production environment, identify one or more characteristics of a test environment for said application;
based on said one or more characteristics, configure a test environment that comprises a second computer system of said distributed computer system;
execute said application in said test environment that includes said one or more identified characteristics;
receive second tracer data observed from execution of said application during execution in said test environment, said second tracer data observed from execution of said application in said test environment comprising a second plurality of inputs provided to said application during execution in said test environment;
identify, within said second tracer data, a second plurality of n-gram sequences of said second plurality of inputs, each of said second plurality of n-gram sequences comprising at least one of a second plurality of input parameter sequences;
identify a subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences of said second tracer data; and
compare said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database, wherein comparing comprises mapping said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database to thereby determine a test coverage factor of said application.
Dependent claims: 25, 26, 27, 28, 29, 30, 31, 32, 33, 34
Specification