N-gram analysis of inputs to a software application

US 9,880,915 B2
Filed: 03/05/2014
Issued: 01/30/2018
Est. Priority Date: 03/05/2014
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Input sequence information may be analyzed and quantified using n-gram analysis of inputs received by an application. The sequences of inputs may be represented by n-grams, and the frequency of the various n-grams may indicate the ‘real world’ uses of the application in production, which may be compared to a test suite whose coverage may be quantified using a similar n-gram analysis. A coverage factor may compare the observed inputs to the application in production to the test suite for the application. The n-grams may be further quantified or prioritized by resource utilization and several visualizations may be generated from the data.
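
As a rough illustration of the n-gram counting the abstract describes, the following Python sketch slides a window of length n over an observed sequence of inputs and tallies each n-gram. The function names `ngrams` and `usage_frequencies` and the sample input names are hypothetical; the patent does not prescribe a particular implementation.

```python
from collections import Counter
from typing import Hashable, Iterator, Sequence


def ngrams(inputs: Sequence[Hashable], n: int) -> Iterator[tuple]:
    """Yield every contiguous run of n inputs, in order of appearance."""
    for i in range(len(inputs) - n + 1):
        yield tuple(inputs[i:i + n])


def usage_frequencies(inputs: Sequence[Hashable], n: int) -> Counter:
    """Count how often each n-gram occurs in an observed input sequence."""
    return Counter(ngrams(inputs, n))


# Hypothetical trace of inputs observed in production (illustrative only).
production_trace = ["login", "search", "view", "search", "view", "logout"]
freq = usage_frequencies(production_trace, 2)
```

In this sketch the bigram `("search", "view")` occurs twice and every other bigram once, so the more frequent sequence would rank higher in any usage-based prioritization.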

41 Citations

34 Claims

1. A method, implemented at a distributed computer system that includes at least one computer processor, said method for analyzing an application based on n-gram sequences associated with inputs of said application, said method comprising:
- executing an application in a production environment that comprises a first computer system of said distributed computer system;
  
  receiving first tracer data observed from execution of said application in said production environment, said first tracer data observed from execution of said application in said production environment comprising a first plurality of inputs provided to said application during execution in said production environment;
  
  identifying, within said first tracer data, a first plurality of n-gram sequences of said first plurality of inputs, each of said first plurality of n-gram sequences comprising at least one of a first plurality of input parameter sequences;
  
  identifying, from a usage frequency database comprising usage data for each of said first plurality of n-gram sequences from said first tracer data, one or more ways in which said application was used during execution in said production environment;
  
  based on said one or more ways in which said application was used during execution in said production environment, identifying one or more characteristics of a test environment for said application;
  
  based on said one or more characteristics, configuring a test environment that comprises a second computer system of said distributed computer system;
  
  executing said application in said test environment that includes said one or more identified characteristics;
  
  receiving second tracer data observed from execution of said application during execution in said test environment, said second tracer data observed from execution of said application in said test environment comprising a second plurality of inputs provided to said application during execution in said test environment;
  
  identifying, within said second tracer data, a second plurality of n-gram sequences of said second plurality of inputs, each of said second plurality of n-gram sequences comprising at least one of a second plurality of input parameter sequences;
  
  identifying a subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences of said second tracer data; and
  
  comparing said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database, wherein comparing comprises mapping said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database to thereby determine a test coverage factor of said application.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 23):
  - 2. The method of claim 1, wherein at least one of the plurality of n-grams comprise a tri-gram of input parameters.
  - 3. The method of claim 1, wherein at least one of the plurality of n-grams comprise a 4-gram of said input parameters.
  - 4. The method of claim 1, further comprising:
    - presenting a graph including at least a subset of the intercepted data.
  - 5. The method of claim 4, wherein the graph comprises a histogram.
  - 6. The method of claim 5, wherein the histogram includes one or more sequences of input parameters.
  - 7. The method of claim 6, wherein the one or more sequences of input parameters are associated with an entirety of the application during execution of the application in the production environment.
  - 8. The method of claim 6, wherein the one or more sequences of input parameters are associated with one or more functions called during execution of the application in the production environment.
  - 9. The method of claim 6, wherein the histogram also includes sequences of one or more function calls associated with the one or more sequences of input parameters.
  - 10. The method of claim 9, wherein the histogram also includes at least one of memory usage, central processing unit usage, and network usage.
  - 11. The method of claim 10, wherein the intercepted data includes at least one of input parameters, functions called, memory usage, central processing unit usage, and network usage.
  - 23. The method of claim 1, wherein the data associated with the first plurality of inputs is intercepted prior to the application receiving the first plurality of inputs.
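
The last two steps of claim 1 (finding the production n-grams that also appear in the test trace, then mapping them onto the usage frequency data) can be sketched as below. The claims do not state a formula for the test coverage factor, so the frequency-weighted ratio used here is an assumption, and the names `ngram_counts` and `coverage_factor` and the sample traces are hypothetical.

```python
from collections import Counter


def ngram_counts(inputs, n):
    """Tally contiguous n-grams of an input sequence."""
    return Counter(tuple(inputs[i:i + n]) for i in range(len(inputs) - n + 1))


def coverage_factor(production_counts, test_counts):
    """Frequency-weighted share of production n-grams also seen in testing.

    Assumption: each covered n-gram is weighted by its production usage
    count; the patent claims do not specify this formula.
    """
    total = sum(production_counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for gram, count in production_counts.items()
                  if gram in test_counts)
    return covered / total


# Hypothetical traces: production exercises open/edit/save twice.
production = ngram_counts(
    ["open", "edit", "save", "open", "edit", "save", "close"], 3)
test = ngram_counts(["open", "edit", "save", "close"], 3)
factor = coverage_factor(production, test)
```

Here the test trace covers two of the four distinct production tri-grams, but because one of them accounts for two of the five production occurrences, the weighted factor is 3/5 rather than 2/4. An unweighted ratio of distinct n-grams would be an equally plausible reading.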

12. A distributed computer system comprising:
- at least one processor; and
  
  one or more computer-readable storage media having stored thereon computer-executable instructions that are executable by the at least one processor to cause the distributed computer system to analyze an application based on n-gram sequences associated with inputs of the application, the computer-executable instructions including instructions that are executable to cause the distributed computer system to perform at least the following;
  
  execute an application in a production environment that comprises a first computer system of said distributed computer system;
  
  receive first tracer data observed from execution of said application in said production environment, said first tracer data observed from execution of said application in said production environment comprising a first plurality of inputs provided to said application during execution in said production environment;
  
  identify, within said first tracer data, a first a plurality of n-gram sequences of said first plurality of inputs, each of said first plurality of n-gram sequences comprising at least one of a first plurality of input parameter sequences;
  
  identify, from a usage frequency database comprising usage data for each of said first plurality of n-gram sequences from said first tracer data, one or more ways in which said application was used during execution in said production environment;
  
  based on said one or more ways in which said application was used during execution in said production environment, identifying one or more characteristics of a test environment for said application;
  
  based on said one or more characteristics, configure a test environment that comprises a second computer system of said distributed computer system;
  
  execute said application in said test environment that includes said one or more identified characteristics;
  
  receive second tracer data observed from execution of said application during execution in said test environment, said second tracer data observed from execution of said application in said test environment comprising a second plurality of inputs provided to said application during execution in said test environment;
  
  identifying, within said second tracer data, a second plurality of n-gram sequences of said second plurality of inputs, each of said second plurality of n-gram sequences comprising at least one of a second plurality of input parameter sequences;
  
  identify a subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences of said second tracer data; and
  
  compare said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database, wherein comparing comprises mapping said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database to thereby determine a test coverage factor of said application.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22):
  - 13. The computer system of claim 12, wherein at least one of the plurality of n-grams comprise a tri-gram of inputs.
  - 14. The computer system of claim 12, wherein at least one of the plurality of n-grams comprise a 4-gram of said inputs.
  - 15. The computer system of claim 12, wherein a graph comprising at least a subset of the intercepted data is presented.
  - 16. The computer system of claim 15, wherein the graph comprises a histogram.
  - 17. The computer system of claim 16, wherein the histogram includes one or more sequences of input parameters.
  - 18. The computer system of claim 17, wherein the one or more sequences of input parameters are associated with an entirety of the application during execution of the application in the production environment.
  - 19. The computer system of claim 17, wherein the one or more sequences of input parameters are associated with one or more functions called during execution of the application in the production environment.
  - 20. The computer system of claim 12, wherein the interception of data associated with the first plurality of inputs to the application is performed by a tracer.
  - 21. The computer system of claim 12, wherein the interception of data associated with the first plurality of inputs to the application is performed by a monitoring agent.
  - 22. The computer system of claim 21, wherein the monitoring agent resides on a first device that is different than a second device on which the application executes.
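
Claims 20 through 22 refer to a tracer or monitoring agent that intercepts the inputs. A minimal in-process sketch of that idea, assuming a Python decorator stands in for the tracer, records each call's inputs before the wrapped function receives them. The names `traced`, `TRACE_LOG`, and `handle_request` are hypothetical, and a monitoring agent on a separate device (claim 22) would need a transport layer that this sketch omits.

```python
import functools

# Hypothetical in-memory stand-in for the tracer's output stream.
TRACE_LOG = []


def traced(func):
    """Record each call's inputs before the wrapped function receives them."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Keyword arguments are sorted so equal calls log identically.
        TRACE_LOG.append((func.__name__, args, tuple(sorted(kwargs.items()))))
        return func(*args, **kwargs)
    return wrapper


@traced
def handle_request(action, user="anonymous"):
    """Toy application entry point (illustrative only)."""
    return f"{user}:{action}"


handle_request("login", user="ada")
handle_request("search")
```

The resulting log is an ordered sequence of observed inputs, which is the raw material an n-gram analysis would consume.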

24. A computer program product comprising one or more computer-readable storage media having stored thereon one or more computer-executable instructions that are executable by one or more processors of a distributed computer system to cause the distributed computer system to analyze an application based on n-grams associated with inputs of the application, the computer-executable instructions including instructions that are executable to cause the computer system to perform at least the following:
- execute an application in a production environment that comprises a first computer system of said distributed computer system;
  
  receive first tracer data observed from execution of said application in said production environment, said first tracer data observed from execution of said application in said production environment comprising a first plurality of inputs provided to said application during execution in said production environment;
  
  identify, within said first tracer data, a first a plurality of n-gram sequences of said first plurality of inputs, each of said first plurality of n-gram sequences comprising at least one of a first plurality of input parameter sequences;
  
  identify, from a usage frequency database comprising usage data for each of said first plurality of n-gram sequences from said first tracer data, one or more ways in which said application was used during execution in said production environment;
  
  based on said one or more ways in which said application was used during execution in said production environment, identifying one or more characteristics of a test environment for said application;
  
  based on said one or more characteristics, configure a test environment that comprises a second computer system of said distributed computer system;
  
  execute said application in said test environment that includes said one or more identified characteristics;
  
  receive second tracer data observed from execution of said application during execution in said test environment, said second tracer data observed from execution of said application in said test environment comprising a second plurality of inputs provided to said application during execution in said test environment;
  
  identifying, within said second tracer data, a second plurality of n-gram sequences of said second plurality of inputs, each of said second plurality of n-gram sequences comprising at least one of a second plurality of input parameter sequences;
  
  identify a subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences of said second tracer data; and
  
  compare said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database, wherein comparing comprises mapping said subset of said first plurality of n-gram sequences contained in said second plurality of n-gram sequences to said usage frequency database to thereby determine a test coverage factor of said application.
- Dependent claims (25, 26, 27, 28, 29, 30, 31, 32, 33, 34):
  - 25. The computer program product of claim 24, wherein at least one of the plurality of n-grams comprise a tri-gram of inputs.
  - 26. The computer program product of claim 24, where at least one of the plurality of n-grams comprise a 4-gram of said inputs.
  - 27. The computer program product of claim 24, wherein a graph comprising at least a subset of the intercepted data is presented.
  - 28. The computer program product of claim 27, wherein the graph comprises a histogram.
  - 29. The computer program product of claim 28, wherein the histogram includes one or more sequences of input parameters.
  - 30. The computer program product of claim 29, wherein the one or more sequences of input parameters are associated with an entirety of the application during execution of the application in the production environment.
  - 31. The computer program product of claim 29, wherein the one or more sequences of input parameters are associated with one or more functions called during execution of the application in the production environment.
  - 32. The computer program product of claim 24, wherein the interception of data associated with the first plurality of inputs to the application is performed by a tracer.
  - 33. The computer program product of claim 24, wherein the interception of data associated with the first plurality of inputs to the application is performed by a monitoring agent.
  - 34. The computer program product of claim 33, wherein the monitoring agent resides on a first device that is different than a second device on which the application executes.
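
Several dependent claims recite a histogram that includes sequences of input parameters. One simple way to picture this, assuming a plain-text rendering is acceptable, is to draw one bar per n-gram scaled to the most frequent sequence. The function name `text_histogram`, the `width` parameter, and the sample counts are hypothetical.

```python
from collections import Counter
from typing import List


def text_histogram(counts: Counter, width: int = 20) -> List[str]:
    """Render n-gram counts as rows of '#' bars, most frequent first."""
    if not counts:
        return []
    peak = max(counts.values())
    rows = []
    for gram, count in counts.most_common():
        label = " > ".join(gram)
        # Scale each bar relative to the most frequent n-gram.
        bar = "#" * max(1, round(width * count / peak))
        rows.append(f"{label:<30} {bar} {count}")
    return rows


# Hypothetical bigram counts (illustrative only).
sample = Counter({("search", "view"): 4, ("view", "logout"): 1})
rows = text_histogram(sample, width=8)
```

A real visualization could add per-sequence resource figures such as memory, CPU, or network usage as extra columns, as the method claims suggest.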

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Baril, Bryce B.; Gounares, Alexander G.; Krajec, Russell S.
Primary Examiner(s): Bloss, Stephanie

Application Number: US14/198,239
Publication Number: US 20150254151A1
Time in Patent Office: 1,427 Days
Field of Search: 702182
US Class Current:
CPC Class Codes:
- G06F 11/3003   Monitoring arrangements spe...
- G06F 11/3082   the data filtering being ac...
- G06F 21/577   Assessing vulnerabilities a...
- G06F 2201/865   Monitoring of software
