Auto-tuning program analysis tools using machine learning

US 20160182558A1
Filed: 01/25/2016
Published: 06/23/2016
Est. Priority Date: 12/18/2014
Status: Active Grant



Abstract

Machine learning (ML) significantly reduces false alarms generated by an automated analysis tool performing static security analysis. Using either user-supplied or system-generated annotations of particular findings, a “hypothesis” is generated about how to classify other static analysis findings. The hypothesis is implemented as a machine learning classifier. To generate the classifier, a set of features is abstracted from a typical witness, and the system compares feature sets against one another to determine a set of weights for the classifier. The initial hypothesis is then validated against a second set of findings, and the classifier is adjusted as necessary based on how closely it fits the new data. Once the approach converges on a final classifier, that classifier is used to filter the remaining findings in the report.
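
The loop described in the abstract (fit on annotated findings, validate on a second set, adjust if the fit is poor, then filter the rest) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the perceptron-style learner, the feature names `passes_sanitizer` and `witness_length`, the 0.9 accuracy target, and every function name are hypothetical choices made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, float]
Labeled = Tuple[Features, bool]  # (features, True if the finding is a real issue)


def score(weights: Features, bias: float, feats: Features) -> float:
    """Weighted sum of a finding's features plus a bias term."""
    return bias + sum(weights.get(name, 0.0) * value for name, value in feats.items())


def train(examples: Sequence[Labeled], epochs: int = 50) -> Tuple[Features, float]:
    """Perceptron-style fit (an assumed learner, chosen only for brevity)."""
    weights: Features = {}
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for feats, label in examples:
            if (score(weights, bias, feats) > 0) != label:
                step = 1.0 if label else -1.0
                for name, value in feats.items():
                    weights[name] = weights.get(name, 0.0) + step * value
                bias += step
                mistakes += 1
        if mistakes == 0:  # every annotated finding is now classified correctly
            break
    return weights, bias


def accuracy(weights: Features, bias: float, examples: Sequence[Labeled]) -> float:
    """Fraction of annotated findings the current hypothesis gets right."""
    hits = sum((score(weights, bias, f) > 0) == y for f, y in examples)
    return hits / len(examples)


def tune(batches: Sequence[Sequence[Labeled]], target: float = 0.9) -> Tuple[Features, float]:
    """Fit on the first batch, validate on later batches, refit when the fit is poor."""
    seen: List[Labeled] = list(batches[0])
    weights, bias = train(seen)
    for batch in batches[1:]:
        if accuracy(weights, bias, batch) < target:
            seen.extend(batch)  # adjust the hypothesis using the new data
            weights, bias = train(seen)
    return weights, bias


def filter_findings(weights: Features, bias: float, findings: Sequence[Features]) -> List[Features]:
    """Keep only the findings the classifier considers real issues."""
    return [f for f in findings if score(weights, bias, f) > 0]


# Hypothetical annotated batches: the second batch plays the role of the validation set.
batch1 = [({"passes_sanitizer": 0.0, "witness_length": 0.2}, True),
          ({"passes_sanitizer": 1.0, "witness_length": 0.9}, False)]
batch2 = [({"passes_sanitizer": 0.0, "witness_length": 0.4}, True),
          ({"passes_sanitizer": 1.0, "witness_length": 0.3}, False)]
real = {"passes_sanitizer": 0.0, "witness_length": 0.1}
noise = {"passes_sanitizer": 1.0, "witness_length": 0.8}

weights, bias = tune([batch1, batch2])
kept = filter_findings(weights, bias, [real, noise])
```

With these toy batches the first fit already agrees with the second batch, so no refit is triggered; a real report would likely need more annotated findings and several rounds before the hypothesis stabilizes.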

21 Claims

1. A method to reduce false alarms generated by an automated analysis tool performing static security analysis on a software system, comprising:
- with respect to each of one or more particular findings in a set of data, automatically generating a classification for the particular finding, wherein the classification is based at least in part on a characteristic associated with the particular finding;
  
  based on the automatically-generated classifications for the particular findings, computing a machine learning classifier using software executing in a hardware element;
  
  applying the machine learning classifier to a set of data representing findings generated by the static security analysis.
  - 2. The method as described in claim 1 wherein computing the machine learning classifier includes:
    - identifying a set of features common to each finding;
      
      assigning weights to each of the set of features; and
      
      based on the assigned weights, computing a weighting function having a threshold value that determines a correctness of a new finding.
  - 3. The method as described in claim 2 wherein the weights are assigned by applying a regression analysis over the findings in the subset of the data according to the user-generated classifications.
  - 4. The method as described in claim 2 wherein the set of features common to each finding include one of:
    - witness length, source type, sink type, witness type, conditional statements, method calls and string operations.
  - 5. The method as described in claim 1 wherein the characteristic is one of:
    - that the particular finding is also present in data reported by a bug tracking system, that the particular finding was present in a prior version of the software system, and that the particular finding has a structural similarity to a finding that has an existing classification.
  - 6. The method as described in claim 1 wherein the particular findings are a subset of the findings generated by the static security analysis.
  - 7. The method as described in claim 1 further including supplementing the automatically-generated classifications with data representing user-generated classifications for at least some of the particular findings.
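
Claims 2 through 4 describe a weighting function over features of each finding, with weights obtained by regression and a threshold that decides correctness. One way this could look is sketched below, assuming logistic regression as the "regression analysis" and using only the numeric features from claim 4 (source, sink, and witness types would need an encoding that is omitted here). The class name, function names, learning rate, epoch count, and training data are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

FEATURES = ("witness_length", "conditional_statements", "method_calls", "string_operations")


@dataclass
class Finding:
    witness_length: int
    conditional_statements: int
    method_calls: int
    string_operations: int

    def vector(self) -> List[float]:
        """Numeric feature vector in the fixed FEATURES order."""
        return [float(getattr(self, name)) for name in FEATURES]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weights(rows: Sequence[Sequence[float]], labels: Sequence[int],
                rate: float = 0.04, epochs: int = 5000) -> Tuple[List[float], float]:
    """Logistic regression by batch gradient descent; returns (weights, bias)."""
    n = len(rows[0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / len(rows)
    return weights, bias


def is_likely_correct(finding: Finding, weights: Sequence[float], bias: float,
                      threshold: float = 0.5) -> bool:
    """Weighting function with a threshold: True means 'probably a real issue'."""
    p = sigmoid(bias + sum(w * x for w, x in zip(weights, finding.vector())))
    return p >= threshold


# Hypothetical annotated findings: 1 = real issue, 0 = false alarm.
training = [
    (Finding(2, 0, 1, 0), 1),
    (Finding(3, 1, 2, 1), 1),
    (Finding(8, 4, 6, 5), 0),
    (Finding(9, 5, 7, 4), 0),
]
weights, bias = fit_weights([f.vector() for f, _ in training], [y for _, y in training])
```

The threshold is left as a parameter because the acceptable trade-off between missed issues and residual false alarms is a policy choice rather than something the regression decides.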

8. Apparatus, comprising:
- a processor;
  
  computer memory holding computer program instructions executed by the processor to reduce false alarms generated by an automated analysis tool performing static security analysis on a software system, the computer program instructions operative to:
  
  with respect to each of one or more particular findings in a set of data, automatically generate a classification for the particular finding, wherein the classification is based at least in part on a characteristic associated with the particular finding;
  
  based on the automatically-generated classifications for the particular findings, compute a machine learning classifier; and
  
  apply the machine learning classifier to a set of data representing findings generated by the static security analysis.
  - 9. The apparatus as described in claim 8 wherein the computer program instructions operative to compute the machine learning classifier includes program code further operative to:
    - identify a set of features common to each finding;
      
      assign weights to each of the set of features; and
      
      based on the assigned weights, compute a weighting function having a threshold value that determines a correctness of a new finding.
  - 10. The apparatus as described in claim 9 wherein the weights are assigned by applying a regression analysis over the findings in the subset of the data according to the user-generated classifications.
  - 11. The apparatus as described in claim 9 wherein the set of features common to each finding include one of:
    - witness length, source type, sink type, witness type, conditional statements, method calls and string operations.
  - 12. The apparatus as described in claim 8 wherein the characteristic is one of:
    - that the particular finding is also present in data reported by a bug tracking system, that the particular finding was present in a prior version of the software system, and that the particular finding has a structural similarity to a finding that has an existing classification.
  - 13. The apparatus as described in claim 8 wherein the particular findings are a subset of the findings generated by the static security analysis.
  - 14. The apparatus as described in claim 8 wherein the computer program instructions are further operative to supplement the automatically-generated classifications with data representing user-generated classifications for at least some of the particular findings.
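
The three characteristics listed in claims 5, 12, and 19 suggest how a classification might be generated without user input. The sketch below is one possible reading, and the direction of each rule is an assumption the claims do not state: a bug-tracker match is treated as a real issue, a finding seen in a prior version inherits its earlier label, and a structurally similar finding copies its neighbor's label. The `Finding` fields, the Jaccard-based `similarity` measure, and the 0.8 cutoff are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Finding:
    finding_id: str
    source_type: str
    sink_type: str
    witness: Tuple[str, ...]  # kinds of statements along the witness path


def similarity(a: Finding, b: Finding) -> float:
    """Jaccard overlap of witness steps, gated on matching source and sink types."""
    if (a.source_type, a.sink_type) != (b.source_type, b.sink_type):
        return 0.0
    steps_a, steps_b = set(a.witness), set(b.witness)
    if not steps_a and not steps_b:
        return 1.0
    return len(steps_a & steps_b) / len(steps_a | steps_b)


def auto_classify(finding: Finding,
                  tracker_ids: Set[str],
                  prior_labels: Dict[str, bool],
                  labeled: Dict[Finding, bool],
                  min_similarity: float = 0.8) -> Optional[bool]:
    """Return True (real issue), False (false alarm), or None if no rule applies."""
    # Assumed rule: a finding also reported in the bug tracker is a real issue.
    if finding.finding_id in tracker_ids:
        return True
    # Assumed rule: a finding present in a prior version keeps its earlier label.
    if finding.finding_id in prior_labels:
        return prior_labels[finding.finding_id]
    # Assumed rule: copy the label of the most similar already-classified finding.
    best = max(labeled, key=lambda other: similarity(finding, other), default=None)
    if best is not None and similarity(finding, best) >= min_similarity:
        return labeled[best]
    return None


known = Finding("K1", "http_param", "sql_exec", ("read", "concat", "exec"))
labeled = {known: False}
tracker_ids = {"F1"}
prior_labels = {"F2": False}

f_tracker = Finding("F1", "file", "log", ("read", "write"))
f_prior = Finding("F2", "file", "log", ("read", "write"))
f_similar = Finding("F3", "http_param", "sql_exec", ("read", "concat", "exec"))
f_unknown = Finding("F4", "file", "log", ("open",))
```

Findings for which `auto_classify` returns `None` are simply left unlabeled; they would not contribute to training and would instead be scored by the finished classifier.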

15. A computer program product in a non-transitory computer readable medium for use in a data processing system, the computer program product holding computer program instructions executed by the data processing system to reduce false alarms generated by an automated analysis tool performing static security analysis on a software system, the computer program instructions operative to:
- with respect to each of one or more particular findings in a set of data, automatically generate a classification for the particular finding, wherein the classification is based at least in part on a characteristic associated with the particular finding;
  
  based on the automatically-generated classifications for the particular findings, compute a machine learning classifier; and
  
  apply the machine learning classifier to a set of data representing findings generated by the static security analysis.
  - 16. The computer program product as described in claim 15 wherein the computer program instructions operative to compute the machine learning classifier includes program code further operative to:
    - identify a set of features common to each finding;
      
      assign weights to each of the set of features; and
      
      based on the assigned weights, compute a weighting function having a threshold value that determines a correctness of a new finding.
  - 17. The computer program product as described in claim 16 wherein the weights are assigned by applying a regression analysis over the findings in the subset of the data according to the user-generated classifications.
  - 18. The computer program product as described in claim 16 wherein the set of features common to each finding include one of:
    - witness length, source type, sink type, witness type, conditional statements, method calls and string operations.
  - 19. The computer program product as described in claim 15 wherein the characteristic is one of:
    - that the particular finding is also present in data reported by a bug tracking system, that the particular finding was present in a prior version of the software system, and that the particular finding has a structural similarity to a finding that has an existing classification.
  - 20. The computer program product as described in claim 15 wherein the particular findings are a subset of the findings generated by the static security analysis.
  - 21. The computer program product as described in claim 15 wherein the computer program instructions are further operative to supplement the automatically-generated classifications with data representing user-generated classifications for at least some of the particular findings.
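
Claims 7, 14, and 21 add user-generated classifications on top of the automatic ones. A minimal sketch of such a merge follows, assuming that a user's label takes precedence when the two disagree and that disagreements are worth reporting; neither policy is stated in the claims, and the function name `supplement` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def supplement(auto_labels: Dict[str, Optional[bool]],
               user_labels: Dict[str, bool]) -> Tuple[Dict[str, bool], List[str]]:
    """Merge labels keyed by finding id; returns (merged labels, conflicting ids)."""
    # Drop findings the automatic step could not classify.
    merged = {fid: label for fid, label in auto_labels.items() if label is not None}
    conflicts: List[str] = []
    for fid, label in user_labels.items():
        if fid in merged and merged[fid] != label:
            conflicts.append(fid)
        merged[fid] = label  # assumed policy: the user's label wins
    return merged, conflicts


auto = {"F1": True, "F2": False, "F3": None}
user = {"F2": True, "F3": False}
merged, conflicts = supplement(auto, user)
```

The list of conflicts could be used to flag automatic rules that disagree with reviewers often enough to need revisiting.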


Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Tripp, Omer

Granted Patent: US 10,135,856 B2
Field of Search
US Class Current: 1/1
CPC Class Codes

G06F 21/562   Static detection

G06F 2221/2101   Auditing as a secondary aspect

G06N 20/00   Machine learning

G06N 3/02   Neural networks

H04L 63/1433   Vulnerability analysis

H04L 63/1483   service impersonation, e.g....
