Tool for predicting fault-prone software files

US 8,151,146 B2
Filed: 06/11/2008
Issued: 04/03/2012
Est. Priority Date: 06/11/2008
Status: Active Grant

Abstract

A method, apparatus, and computer-readable medium for predicting the fault-proneness of code units (files, modules, packages, and the like) of large-scale, long-lived software systems. The method collects information about the code units and the development process from previous releases, and formats this information for input to an analysis stage. The tool then performs a statistical regression analysis on the collected data, and formulates a model to predict fault counts for code units of the current and future releases. Finally, the method computes an expected fault count for each code unit in the current release by applying the formulated model to data from the current release. The expected fault counts are used to rank the release units in descending order of fault-proneness so that debugging efforts and resources can be optimized.
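
The pipeline in the abstract (fit a regression on data from earlier releases, predict fault counts for the current release, rank the files) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a Poisson model, one of the two model families named in claim 3, fitted by iteratively reweighted least squares. The file names, the feature layout (intercept, prior faults, prior changes), and all counts are invented for the example.

```python
import math

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def predict(beta, features):
    """Expected fault count exp(beta . x) for each feature vector x."""
    return [math.exp(sum(b * v for b, v in zip(beta, x))) for x in features]

def fit_poisson(features, faults, iterations=25):
    """Fit log(mu) = beta . x by iteratively reweighted least squares."""
    k = len(features[0])
    beta = [0.0] * k
    eta = [math.log(y + 0.5) for y in faults]  # conventional starting point
    for _ in range(iterations):
        mu = [math.exp(e) for e in eta]
        # Working response and weights for the Poisson log link.
        z = [e + (y - m) / m for e, y, m in zip(eta, faults, mu)]
        xtwx = [[sum(m * x[i] * x[j] for x, m in zip(features, mu))
                 for j in range(k)] for i in range(k)]
        xtwz = [sum(m * x[i] * zi for x, m, zi in zip(features, mu, z))
                for i in range(k)]
        beta = solve(xtwx, xtwz)
        eta = [sum(b * v for b, v in zip(beta, x)) for x in features]
    return beta

# Hypothetical prior-release data: (intercept, prior faults, prior changes).
train_x = [(1, 0, 0), (1, 1, 0), (1, 2, 0), (1, 0, 1), (1, 0, 2), (1, 1, 1)]
train_y = [1, 2, 5, 2, 8, 6]  # faults observed for each training file

# Hypothetical current-release files with the same feature layout.
current = {"parser.c": (1, 2, 2), "util.c": (1, 0, 0), "net.c": (1, 1, 1)}

beta = fit_poisson(train_x, train_y)
expected = dict(zip(current, predict(beta, current.values())))
ranking = sorted(expected, key=expected.get, reverse=True)
print(ranking)  # most fault-prone file first
```

The coefficients are estimated once from the earlier data and then reused unchanged on the current release, which mirrors the two sets of equations in claim 1. A negative binomial model would add a dispersion parameter but leave the predict-then-rank steps the same.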

19 Citations

20 Claims

1. A method of identifying software texts likely to include faults, the method comprising:
- calculating, by a computing device, values of numerical coefficients included in a first set of equations representing linear combinations, the first set of equations including values of feature vectors associated with a first set of software texts, the first set of equations including values of fault counts associated with the first set of software texts;
  
  calculating, by the computing device, values of fault counts using a second set of equations representing linear combinations, the calculated values of fault counts being associated with a second set of texts, the second set of equations including the calculated values of the numerical coefficients included in the first set of equations and values of feature vectors associated with the second set of texts; and
  
  ranking, by the computing device, the second set of software texts based on the values of fault counts associated with the second set of equations.
  - 2. The method defined by claim 1, wherein the values of the numerical coefficients included in the first set of equations are calculated using a statistical regression model.
  - 3. The method defined by claim 2, wherein the statistical regression model includes at least one of a negative binomial regression model and a Poisson model.
  - 4. The method defined by claim 1, wherein at least one of the first set of software texts is a prior version of at least one of the second set of software texts.
  - 5. The method defined by claim 1, wherein at least one of the first and second sets of software texts is at least one of written in a computer programming language and be an electronically stored text file.
  - 6. The method defined by claim 1, wherein the feature vectors include at least one of the programming language in which a software text is written, a quantity of defects found in a version of a software text associated with a prior release, a quantity of changes made to a software text during at least one prior release, a quantity of successive prior releases for which a software text had a prior version, a fraction of a duration of a release for which a version of a software text existed, and a quantity derived from any predictive features using at least one of scaling, quantization, linear combination, and application of transcendental functions.
  - 7. The method defined by claim 1, further comprising:
    - grouping the first set of software texts by time into an ordered series of at least one release; and
      
      assigning the second set of software texts to a separate, most recent, release in the ordered series to define the feature vectors.
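
As an illustration of claims 6 and 7, the sketch below groups per-file history by release and derives one feature vector per file. The `FileRelease` record layout, the field names, and the choice of `log1p` as the transcendental transform are assumptions made for this example; the claims do not prescribe any particular data structure or transform.

```python
import math
from dataclasses import dataclass

@dataclass
class FileRelease:
    """One file's record for one release (hypothetical layout)."""
    path: str
    release: int     # position in the ordered series of releases
    language: str
    faults: int      # defects found during this release
    changes: int     # changes made during this release
    exposure: float  # fraction of the release for which the file existed

def feature_vector(history, path, release, languages):
    """Build the feature vector for `path` in `release` from earlier releases."""
    prior = {h.release: h for h in history
             if h.path == path and h.release < release}
    current = next(h for h in history
                   if h.path == path and h.release == release)
    last = prior.get(release - 1)
    # Number of successive prior releases in which the file already existed.
    age = 0
    while (release - 1 - age) in prior:
        age += 1
    one_hot = [1.0 if current.language == lang else 0.0 for lang in languages]
    return [
        1.0,                                         # intercept term
        float(last.faults) if last else 0.0,         # defects in prior release
        math.log1p(last.changes) if last else 0.0,   # transformed change count
        float(age),
        current.exposure,
    ] + one_hot

# Invented history: a.c exists in releases 1 to 3, b.py is new in release 3.
history = [
    FileRelease("a.c", 1, "c", 3, 7, 1.0),
    FileRelease("a.c", 2, "c", 1, 3, 1.0),
    FileRelease("a.c", 3, "c", 0, 0, 1.0),
    FileRelease("b.py", 3, "python", 0, 0, 0.5),
]
languages = ["c", "python"]
vec_a = feature_vector(history, "a.c", 3, languages)
vec_b = feature_vector(history, "b.py", 3, languages)
print(vec_a)
print(vec_b)
```

Fault counts for the most recent release are the quantity being predicted, so the sketch reads only earlier releases when computing the fault and change features.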

8. An apparatus to identify software texts likely to include faults comprising a computing device, the computing device configured to:
- calculate values of numeric coefficients included in a first set of equations representing linear combinations, the first set of equations including values of feature vectors associated with a first set of software texts, the first set of equations including values of fault counts associated with the first set of software texts;
  
  calculate values of fault counts using a second set of equations representing linear combinations, the calculated values of fault counts being associated with a second set of texts, the second set of equations including the calculated values of the numerical coefficients included in the first set of equations and values of feature vectors associated with the second set of texts; and
  
  rank the second set of software texts based on the values of fault counts associated with the second set of equations.
  - 9. The apparatus defined by claim 8, wherein the values of numeric coefficients included in the first set of equations are calculated using a statistical regression model.
  - 10. The apparatus defined by claim 9, wherein the statistical regression model includes at least one of a negative binomial regression model and a Poisson model.
  - 11. The apparatus defined by claim 8, wherein at least one of the first set of software texts is a prior version of at least one of the second set of software texts.
  - 12. The apparatus defined by claim 8, wherein at least one of the first and second sets of software texts is at least one of written in a computer programming language and be an electronically stored text file.
  - 13. The apparatus defined by claim 8, wherein the feature vectors include at least one of the programming language in which a software text is written, a quantity of defects found in a version of a software text associated with a prior release, a quantity of changes made to a software text during at least one prior release, a quantity of successive prior releases for which a software text had a prior version, a fraction of a duration of a release for which a version of a software text existed, and a quantity derived from any predictive features using at least one of scaling, quantization, linear combination, and application of transcendental functions.
  - 14. The apparatus defined by claim 8, wherein the computing device is further configured to:
    - group the first set of software texts by time into an ordered series of at least one release; and
      
      assign the second set of software texts to a separate, most recent, release in the ordered series to define the feature vectors.

15. A non-transitory computer-readable medium comprising instructions that, when executed by a computing device, cause the computing device to:
- calculate values of numerical coefficients included in a first set of equations representing linear combinations, the first set of equations including values of feature vectors associated with a first set of software texts, the first set of equations including values of fault counts associated with the first set of software texts;
  
  calculate values of fault counts using a second set of equations representing linear combinations, the calculated values of fault counts being associated with a second set of texts, the second set of equations including the calculated values of the numerical coefficients included in the first set of equations and values of feature vectors associated with the second set of texts; and
  
  rank the second set of software texts based on the values of fault counts associated with the second set of equations.
  - 16. The non-transitory computer-readable medium defined by claim 15, wherein the values of the numerical coefficients included in the first set of equations are calculated using a statistical regression model.
  - 17. The non-transitory computer-readable medium defined by claim 16, wherein the statistical regression model includes at least one of a negative binomial regression model and a Poisson model.
  - 18. The non-transitory computer-readable medium defined by claim 15, wherein at least one of the first set of software texts is a prior version of at least one of the second set of software texts.
  - 19. The non-transitory computer-readable medium defined by claim 15, wherein at least one of the first and second sets of software texts is at least one of written in a computer programming language and be an electronically stored text file.
  - 20. The non-transitory computer-readable medium defined by claim 15, wherein the feature vectors include at least one of the programming language in which a software text is written, a quantity of defects found in a version of a software text associated with a prior release, a quantity of changes made to a software text during at least one prior release, a quantity of successive prior releases for which a software text had a prior version, a fraction of a duration of a release for which a version of a software text existed, and a quantity derived from any predictive features using at least one of scaling, quantization, linear combination, and application of transcendental functions.

Current Assignee: AT&T Labs Incorporated (AT&T, Inc.)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Bell, Robert; Gauld, Andrew; Weyuker, Elaine; Ostrand, Thomas
Primary Examiner(s): Baderman, Scott
Assistant Examiner(s): Leibovich, Yair
Application Number: US12/137,282
Publication Number: US 20090313605A1
Time in Patent Office: 1,392 Days
Field of Search: 714/25, 714/47.1, 717/124, 717/140, 717/143
US Class Current: 714/47.1
CPC Class Codes:
G06F 11/008 Reliability or availability...
G06F 11/3616 using software metrics
