Duplicate bug report detection using machine learning algorithms and automated feedback incorporation
First Claim
1. A method comprising:
- for each particular set of bug reports, in a first plurality of sets of bug reports, identifying:
(a) a user-classification of the particular set of bug reports as including duplicate bug reports or non-duplicate bug reports;
(b) a first plurality of correlation values, each of which corresponds to a respective feature, of a plurality of features, between bug reports in the particular set of bug reports;
based on (a) and (b), for the first plurality of sets of bug reports, generating a model to identify any set of bug reports as including duplicate bug reports or non-duplicate bug reports;
receiving a request to determine whether a particular bug report is a duplicate of any of a second plurality of bug reports;
identifying a first category associated with the particular bug report;
identifying a first subset of bug reports, of the second plurality of bug reports, associated with the first category;
identifying a second subset of bug reports, of the second plurality of bug reports, that have been previously identified as a duplicate of at least one bug report of the first subset of bug reports;
identifying a set of candidate bug reports that:
(a) includes one or more of the first subset of bug reports;
(b) includes one or more of the second subset of bug reports; and
(c) does not include a third subset of bug reports, of the second plurality of bug reports, that (i) are not associated with the first category and (ii) have not been previously identified as a duplicate of any bug report of the first subset of bug reports;
applying the model to obtain a classification of the particular bug report and a candidate bug report, of the set of candidate bug reports, as duplicate bug reports or non-duplicate bug reports, and refraining from applying the model to classify the particular bug report and any of the third subset of bug reports as duplicate bug reports or non-duplicate bug reports.
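The candidate-selection steps recited above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `BugReport` type, its `duplicate_of` field, and the functions `select_candidates` and `classify_against_candidates` are hypothetical names introduced here, and the sketch assumes the candidate set includes all of the first and second subsets (the claim requires only one or more of each). The model is passed in as an arbitrary callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class BugReport:
    """Hypothetical record type; the field names are assumptions."""
    report_id: str
    category: str
    # IDs of reports this one was previously identified as a duplicate of.
    duplicate_of: FrozenSet[str] = frozenset()


def select_candidates(new_report: BugReport,
                      existing: List[BugReport]) -> List[BugReport]:
    # First subset: existing reports associated with the new report's category.
    first_subset = [r for r in existing if r.category == new_report.category]
    first_ids = {r.report_id for r in first_subset}
    # Second subset: reports previously identified as a duplicate of at least
    # one report in the first subset, whatever their own category.
    second_subset = [r for r in existing if r.duplicate_of & first_ids]
    # Everything else (the third subset) is left out of the candidate set.
    candidates: Dict[str, BugReport] = {}
    for report in first_subset + second_subset:
        candidates.setdefault(report.report_id, report)
    return list(candidates.values())


def classify_against_candidates(
        new_report: BugReport,
        existing: List[BugReport],
        model: Callable[[BugReport, BugReport], bool]) -> Dict[str, bool]:
    # The model is applied only to candidate pairs; it is never called on
    # reports in the third subset.
    return {c.report_id: model(new_report, c)
            for c in select_candidates(new_report, existing)}


existing_reports = [
    BugReport("B1", "ui"),
    BugReport("B2", "database", frozenset({"B1"})),  # earlier duplicate of B1
    BugReport("B3", "network"),                      # unrelated: third subset
]
incoming = BugReport("B4", "ui")
candidate_ids = [r.report_id
                 for r in select_candidates(incoming, existing_reports)]
```

In this example `B2` becomes a candidate despite its different category because it was previously marked as a duplicate of `B1`, while `B3` is never passed to the model.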
Abstract
Duplicate bug report detection using machine learning algorithms and automated feedback incorporation is disclosed. For each set of bug reports, a user-classification of the set of bug reports as including duplicate bug reports or non-duplicate bug reports is identified. Also for each set of bug reports, correlation values, each corresponding to a respective feature, of a plurality of features, between bug reports in the set of bug reports are identified. Based on the user-classifications and the correlation values, a model is generated to identify any set of bug reports as including duplicate bug reports or non-duplicate bug reports. The model is applied to classify a particular bug report and a candidate bug report as duplicate bug reports or non-duplicate bug reports.
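One way to picture model generation from user-classifications and per-feature correlation values is the sketch below. It rests on several assumptions that do not come from the text above: each set of bug reports is taken to be a pair, the features are a hypothetical `title` and `component`, the correlation value for a feature is a Jaccard word overlap, and the model is a small logistic regression trained by gradient descent. The abstract does not name a specific learning algorithm, so these are illustrative substitutes only.

```python
import math

FEATURES = ("title", "component")  # hypothetical feature names


def jaccard(a, b):
    """Word-overlap similarity in [0, 1]; an assumed correlation measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def correlation_values(r1, r2):
    # One correlation value per feature, between the two reports of a pair.
    return [jaccard(r1[f], r2[f]) for f in FEATURES]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(vectors, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(FEATURES), 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def is_duplicate(model, r1, r2):
    weights, bias = model
    x = correlation_values(r1, r2)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5


# User-classified pairs: 1 = duplicate, 0 = non-duplicate (made-up examples).
login = {"title": "crash on login page", "component": "auth"}
export = {"title": "export to csv fails", "component": "reports"}
training = [
    ((login, {"title": "login page crash", "component": "auth"}), 1),
    ((login, {"title": "slow report export", "component": "reports"}), 0),
    ((export, {"title": "csv export fails", "component": "reports"}), 1),
    ((export, {"title": "password reset email missing", "component": "auth"}), 0),
]
vectors = [correlation_values(a, b) for (a, b), _ in training]
labels = [label for _, label in training]
model = train(vectors, labels)
```

A call such as `is_duplicate(model, login, {"title": "login page crash on submit", "component": "auth"})` then applies the trained model to a new pair of reports.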
15 Citations
20 Claims
1. A method comprising the elements recited under First Claim above.
Dependent claims: 2
3. A non-transitory computer readable medium comprising instructions which, when executed by one or more hardware processors, causes performance of operations comprising:
- for each particular set of bug reports, in a first plurality of sets of bug reports, identifying:
(a) a user-classification of the particular set of bug reports as including duplicate bug reports or non-duplicate bug reports;
(b) a first plurality of correlation values, each of which corresponds to a respective feature, of a plurality of features, between bug reports in the particular set of bug reports;
based on (a) and (b), for the first plurality of sets of bug reports, generating a model to identify any set of bug reports as including duplicate bug reports or non-duplicate bug reports;
receiving a request to determine whether a particular bug report is a duplicate of any of a second plurality of bug reports;
identifying a first category associated with the particular bug report;
identifying a first subset of bug reports, of the second plurality of bug reports, associated with the first category;
identifying a second subset of bug reports, of the second plurality of bug reports, that have been previously identified as a duplicate of at least one bug report of the first subset of bug reports;
identifying a set of candidate bug reports that:
(a) includes one or more of the first subset of bug reports;
(b) includes one or more of the second subset of bug reports; and
(c) does not include a third subset of bug reports, of the second plurality of bug reports, that (i) are not associated with the first category and (ii) have not been previously identified as a duplicate of any bug report of the first subset of bug reports;
applying the model to obtain a classification of the particular bug report and a candidate bug report, of the set of candidate bug reports, as duplicate bug reports or non-duplicate bug reports, and refraining from applying the model to classify the particular bug report and any of the third subset of bug reports as duplicate bug reports or non-duplicate bug reports.
Dependent claims: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
20. A system comprising:
- at least one hardware device including a processor; and
the system configured to perform operations comprising:
for each particular set of bug reports, in a first plurality of sets of bug reports, identifying:
(a) a user-classification of the particular set of bug reports as including duplicate bug reports or non-duplicate bug reports;
(b) a first plurality of correlation values, each of which corresponds to a respective feature, of a plurality of features, between bug reports in the particular set of bug reports;
based on (a) and (b), for the first plurality of sets of bug reports, generating a model to identify any set of bug reports as including duplicate bug reports or non-duplicate bug reports;
receiving a request to determine whether a particular bug report is a duplicate of any of a second plurality of bug reports;
identifying a first category associated with the particular bug report;
identifying a first subset of bug reports, of the second plurality of bug reports, associated with the first category;
identifying a second subset of bug reports, of the second plurality of bug reports, that have been previously identified as a duplicate of at least one bug report of the first subset of bug reports;
identifying a set of candidate bug reports that:
(a) includes one or more of the first subset of bug reports;
(b) includes one or more of the second subset of bug reports; and
(c) does not include a third subset of bug reports, of the second plurality of bug reports, that (i) are not associated with the first category and (ii) have not been previously identified as a duplicate of any bug report of the first subset of bug reports;
applying the model to obtain a classification of the particular bug report and a candidate bug report, of the set of candidate bug reports, as duplicate bug reports or non-duplicate bug reports, and refraining from applying the model to classify the particular bug report and any of the third subset of bug reports as duplicate bug reports or non-duplicate bug reports.
Specification