Software testing using machine learning

US 8,924,938 B2
Filed: 09/28/2007
Issued: 12/30/2014
Est. Priority Date: 09/28/2006
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A system and method for analyzing a computer program includes performing a static analysis on a program to determine property correctness. Test cases are generated and conducted to provide test output data. Hypotheses about aspects of execution of the program are produced to classify paths for test cases to determine whether the test cases have been encountered or otherwise. In accordance with the hypothesis, new test cases are generated to cause the program to exercise behavior which is outside of the encountered test cases.

48 Citations

20 Claims

1. A method for analyzing a computer program, comprising:
- performing, using at least one processor, a static analysis on a program to determine property correctness;
  
  generating and conducting test cases to provide test output data;
  
  applying one or more learning methods for producing hypotheses about aspects of execution of the program to classify paths for test cases to determine whether the test cases have been encountered or otherwise, wherein each learning is a generalization from input data and the output of the learning methods inductively classifies if each test output trace is similar to a prior test case, and wherein path selection traverses program paths in a control flow graph (CFG) representation of the program and to select paths from the CFG along with constraints on the variables at different nodes that contradict a hypothesis given by one learning method;
  
  performing an iterated exploration of one or more paths while testing each path for membership;
  
  for each path segment (f,g) in a path, performing a depth first search of the CFG representation of the program to find all loop free paths leading from f to g while visiting no other functions;
  
  determining a path summary consisting of a guard γ and an update U and marking the path as feasible if the guard is feasible;
  
  determining a set of path summaries $\sigma(\pi)$ by iterated composition as follows: $\sigma(\pi_0) = \sigma_{f_1,g_1}$, $\sigma(\pi_{m+1}) = \sigma(\pi_m) \circ \sigma_{f_{m+1},g_{m+1}}$, wherein $\pi$ represents a call graph path; and
  
  in accordance with the hypothesis, generating new test cases to cause the program to exercise behavior which is outside of the encountered test cases.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The method as recited in claim 1, wherein performing a static analysis includes instrumenting the program to track and check program features.
  - 3. The method as recited in claim 1, wherein generating new test cases includes selecting paths in the program that contradict the hypotheses.
  - 4. The method as recited in claim 3, wherein selecting paths includes selecting constraints in the program that contradict the hypotheses.
  - 5. The method as recited in claim 3, further comprising constraint solving selected paths to determine feasibility of the selected paths and generate test cases to exercise feasible paths.
  - 6. The method as recited in claim 1, wherein generating and conducting test cases to provide test output data includes simulating the program using the test cases to provide test output data.
  - 7. The method as recited in claim 1, wherein producing hypotheses about aspects of execution of the program includes providing a combination of learning modules to generate the hypotheses.
  - 8. The method as recited in claim 7, further comprising isolating parts of the output test data relevant to each learning module.
  - 9. The method as recited in claim 7, wherein the test output data includes traces and further comprising employing NGRAMS to represent traces.
  - 10. The method as recited in claim 9, further comprising controlling generalization of the hypotheses for traces by controlling a length of the NGRAMS.
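
Claims 9 and 10 recite representing traces with NGRAMS and controlling generalization through the NGRAM length. The following is a minimal sketch of one way to read that idea, not the patented implementation. It assumes a trace is a list of event names and that the hypothesis is simply the set of all n-grams seen so far; the names `ngrams`, `learn_hypothesis`, and `is_encountered` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Trace = List[str]        # assumption: a trace is a sequence of event names
NGram = Tuple[str, ...]


def ngrams(trace: Trace, n: int) -> Set[NGram]:
    """Return the set of all length-n windows of a trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def learn_hypothesis(traces: Iterable[Trace], n: int) -> Set[NGram]:
    """Assumed hypothesis: every n-gram observed in any test output trace."""
    seen: Set[NGram] = set()
    for trace in traces:
        seen |= ngrams(trace, n)
    return seen


def is_encountered(trace: Trace, hypothesis: Set[NGram], n: int) -> bool:
    """Classify a trace as similar to prior tests if all its n-grams were seen."""
    return ngrams(trace, n) <= hypothesis


if __name__ == "__main__":
    observed = [["open", "read", "close"], ["open", "write", "close"]]
    new = ["open", "read", "write", "close"]
    # n = 1: every individual event was seen, so the new trace is accepted.
    print(is_encountered(new, learn_hypothesis(observed, 1), 1))  # True
    # n = 2: the pair ("read", "write") was never seen, so it is rejected.
    print(is_encountered(new, learn_hypothesis(observed, 2), 2))  # False
```

In this reading, a shorter NGRAM length generalizes more aggressively (more unseen traces are classified as already encountered), while a longer length keeps the hypothesis closer to the exact traces observed.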

11. A non-transitory computer readable storage device comprising a computer readable program, wherein the computer readable program when executed on a computer causes the computer to perform the steps of:
- performing a static analysis on a program to determine property correctness;
  
  generating and conducting test cases to provide test output data;
  
  applying one or more learning methods for producing hypotheses about aspects of execution of the program to classify paths for test cases to determine whether the test cases have been encountered or otherwise, wherein each learning is a generalization from input data and the output of the learning methods inductively classifies if each test output trace is similar to a prior test case, and wherein path selection traverses program paths in a control flow graph (CFG) representation of the program and to select paths from the CFG along with constraints on the variables at different nodes that contradict a hypothesis given by one learning method;
  
  performing an iterated exploration of one or more paths while testing each path for membership;
  
  for each path segment (f,g) in a path, performing a depth first search of the CFG representation of the program to find all loop free paths leading from f to g while visiting no other functions;
  
  determining a path summary consisting of a guard γ and an update U and marking the path as feasible if the guard is feasible;
  
  determining a set of path summaries $\sigma(\pi)$ by iterated composition as follows: $\sigma(\pi_0) = \sigma_{f_1,g_1}$, $\sigma(\pi_{m+1}) = \sigma(\pi_m) \circ \sigma_{f_{m+1},g_{m+1}}$, wherein $\pi$ represents a call graph path; and
  
  in accordance with the hypothesis, generating new test cases to cause the program to exercise behavior which is outside of the encountered test cases.
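
The independent claims recite a depth first search for loop free paths between f and g, a path summary made of a guard and an update, and iterated composition of summary sets. The sketch below illustrates those three steps under stated assumptions: the CFG is a plain adjacency dictionary, a guard is a Boolean predicate over a variable-to-value state, an update is a state transformer, and composition applies the earlier segment first. Feasibility is checked by trying a small list of candidate states, which stands in for a real constraint solver. All names (`loop_free_paths`, `Summary`, `compose_sets`, `is_feasible`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

State = Dict[str, int]  # assumption: program state maps variable names to ints


def loop_free_paths(cfg: Dict[str, List[str]], f: str, g: str,
                    other_functions: Set[str]) -> List[List[str]]:
    """Depth first search for all loop free paths from f to g that
    avoid nodes belonging to other functions."""
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node == g:
            paths.append(path)
            return
        for succ in cfg.get(node, []):
            # Skip successors that would close a loop or enter another function.
            if succ in path or (succ != g and succ in other_functions):
                continue
            dfs(succ, path + [succ])

    dfs(f, [f])
    return paths


@dataclass(frozen=True)
class Summary:
    guard: Callable[[State], bool]    # condition under which the path is taken
    update: Callable[[State], State]  # effect of the path on the state


def compose(first: Summary, second: Summary) -> Summary:
    """Assumed semantics: run `first`, then `second` on the updated state."""
    return Summary(
        guard=lambda s: first.guard(s) and second.guard(first.update(s)),
        update=lambda s: second.update(first.update(s)),
    )


def compose_sets(left: List[Summary], right: List[Summary]) -> List[Summary]:
    """One composition step: pair every summary so far with every
    summary of the next path segment."""
    return [compose(a, b) for a in left for b in right]


def is_feasible(summary: Summary, candidates: List[State]) -> bool:
    """Stand-in for a constraint solver: try a finite list of states."""
    return any(summary.guard(s) for s in candidates)


if __name__ == "__main__":
    cfg = {"f": ["a", "b"], "a": ["g", "h"], "b": ["a", "g"], "h": ["g"], "g": []}
    print(loop_free_paths(cfg, "f", "g", other_functions={"h"}))
    dec = Summary(lambda s: s["x"] > 0, lambda s: {**s, "x": s["x"] - 1})
    twice = compose_sets([dec], [dec])[0]
    print(is_feasible(twice, [{"x": v} for v in range(3)]))
```

Composing the decrement summary with itself yields a guard equivalent to `x > 1`, which shows how a composed guard can be stricter than either segment's guard taken alone.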

12. A system for analyzing a computer program, comprising:
- at least one processor;
  
  a static analysis code executed by a processor and configured to instrument and analyze a program to determine property correctness;
  
  a test generator to generate test cases on an instrumented version of the program and to provide test output data;
  
  a learning system configured to produce hypotheses about aspects of execution of the program to classify paths for the test cases to determine whether the test cases have been encountered or otherwise, wherein each learning is a generalization from input data and the output of the learning methods inductively classifies if each test output trace is similar to a prior test case, and wherein path selection traverses program paths in a control flow graph (CFG) representation of the program and to select paths from the CFG along with constraints on the variables at different nodes that contradict a hypothesis given by one learning method, wherein the learning system performs an iterated exploration of one or more paths while testing each path for membership, and for each path segment (f,g) in a path, performs a depth first search of the CFG representation of the program to find all loop free paths leading from f to g while visiting no other functions;
  
  determines a path summary consisting of a guard γ and an update U and marks the path as feasible if the guard is feasible; and
  
  determines a set of path summaries $\sigma(\pi)$ by iterated composition as follows: $\sigma(\pi_0) = \sigma_{f_1,g_1}$, $\sigma(\pi_{m+1}) = \sigma(\pi_m) \circ \sigma_{f_{m+1},g_{m+1}}$, wherein $\pi$ represents a call graph path; and
  
  a feedback loop coupled from the learning system to the test generator to provide hypotheses to the test generator to generate new test cases to cause the program to exercise behavior which is outside of the encountered test cases.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20)
- - 13. The system as recited in claim 12, wherein the test generator includes a path selector configured to select paths in the program that contradict the hypotheses.
  - 14. The system as recited in claim 13, further comprising a constraint solver configured to resolve constraints for selected paths in the program that contradict the hypotheses.
  - 15. The system as recited in claim 13, wherein the constraint solver determines feasibility of the selected paths and generates test cases to exercise feasible paths.
  - 16. The system as recited in claim 12, further comprising a simulator to execute at least portions of the program to provide test output data using the test cases.
  - 17. The system as recited in claim 12, wherein the learning system includes a combination of learning modules to generate the hypotheses.
  - 18. The system as recited in claim 17, further comprising a projection for each learning module configured to isolate parts of the output test data relevant to each respective learning module.
  - 19. The system as recited in claim 17, wherein the test output data includes traces represented by NGRAMS.
  - 20. The system as recited in claim 19, wherein the NGRAMS include a controlled length such that by controlling the length a generalization of the hypotheses for traces is controlled.
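
The system claims couple a learning system to a test generator through a feedback loop: paths that contradict the current hypothesis are selected, checked for feasibility, and turned into new test cases. The toy sketch below illustrates that loop under heavy assumptions. The program under test, its hand-written path list with guards (standing in for CFG traversal), the bigram hypothesis, and the brute-force `solve` function (standing in for a constraint solver) are all hypothetical.

```python
from typing import Callable, List, Optional, Set, Tuple

NGram = Tuple[str, ...]


def program(x: int) -> List[str]:
    """Toy instrumented program: returns the trace of visited nodes."""
    trace = ["entry"]
    if x > 10:
        trace.append("big")
        if x % 2 == 0:
            trace.append("even")
    else:
        trace.append("small")
    trace.append("exit")
    return trace


# Candidate paths with their path constraints (assumed to come from the CFG).
# The last path is deliberately infeasible.
PATHS: List[Tuple[List[str], Callable[[int], bool]]] = [
    (["entry", "small", "exit"], lambda x: x <= 10),
    (["entry", "big", "exit"], lambda x: x > 10 and x % 2 != 0),
    (["entry", "big", "even", "exit"], lambda x: x > 10 and x % 2 == 0),
    (["entry", "small", "even", "exit"], lambda x: x <= 10 and x > 10),
]


def ngrams(trace: List[str], n: int) -> Set[NGram]:
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def solve(guard: Callable[[int], bool]) -> Optional[int]:
    """Brute-force stand-in for a constraint solver over a small domain."""
    return next((x for x in range(-50, 51) if guard(x)), None)


def feedback_loop(initial_inputs: List[int], n: int = 2,
                  max_rounds: int = 10) -> List[int]:
    tests = list(initial_inputs)
    hypothesis: Set[NGram] = set()
    for x in tests:
        hypothesis |= ngrams(program(x), n)  # learn from existing tests
    for _ in range(max_rounds):
        new_input = None
        for nodes, guard in PATHS:
            if ngrams(nodes, n) <= hypothesis:
                continue  # path agrees with the hypothesis, nothing new
            new_input = solve(guard)  # None means the path is infeasible
            if new_input is not None:
                break
        if new_input is None:
            break  # no feasible path contradicts the hypothesis
        tests.append(new_input)
        hypothesis |= ngrams(program(new_input), n)  # refine the hypothesis
    return tests


if __name__ == "__main__":
    print(feedback_loop([0]))  # [0, 11, 12]
```

Starting from the single test input 0, the loop adds inputs that reach the two unexplored branches and then stops, because the only remaining path that contradicts the hypothesis has an unsatisfiable constraint.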

Current Assignee: NEC Corporation
Original Assignee: NEC Laboratories America Inc (NEC Corporation)
Inventors: Chang, Richard; Sankaranarayanan, Sriram; Jiang, Guofei; Ivancic, Franjo
Primary Examiner(s): Pan, Hang
Application Number: US11/863,387
Publication Number: US 20080082968A1
Time in Patent Office: 2,650 Days
Field of Search: 717/128, 717/126
US Class Current: 717/128
CPC Class Codes: G06F 11/3688 for test execution, e.g. sc...
