Computer method for identifying a misclassified software object in a cluster of internally similar software objects
First Claim
1. A method for use in a programmable computer system for identifying software objects that have been assigned to a wrong group, said group being intended to represent a respective cluster of internally similar software objects, wherein the similarity between objects is determined, such as by evaluating a similarity function, and wherein the input comprises a set of software objects, assigned to various groups, peer parameter K, and confidence parameter N, said method comprising the computer-implemented steps of:
(a) ascertaining the similarity between each pair of objects, such as by computing a similarity function such as Feature Ratio With Linking;
(b) for each object O,
(b.1) sorting O's neighbors, nearest first,
(b.2) examining O's neighbors in order, counting how many of them are assigned to one or another group, until K are found that are assigned to the same group, recording the group name, say G, and the number of neighbors examined, say E,
(b.3) if G is the group to which O is currently assigned, marking O as being correctly classified with confidence E-K and skipping to step (c), and
(b.4) otherwise, continuing examining the neighbors in order until K have been found that are assigned to the same module as O, or until all neighbors have been examined, recording the number of neighbors examined, say F, marking O as being misclassified, with confidence F-K, and as likely belonging to group G with confidence E-K;
(c) sorting the misclassified objects according to their mis-classification confidence, greatest first (here "greater" corresponds to "worse"), and outputting the list, reporting for each object the current group assignment, the mis-classification confidence, the group that the object likely belongs to, and the confidence with which it likely belongs; and
(d) sorting the objects that are correctly classified but with confidence greater than N (here "greater" corresponds to "worse"), sorting by confidence, greatest first, and outputting the sorted list, reporting for each object the confidence with which it belongs to the module to which it is currently assigned, whereby the likelihood of misclassification of objects is ascertainable by the respective confidence level.
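Steps (b) through (d) of the claim can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names `classify` and `report`, the dictionary mapping each object to its group, and the decision to skip objects for which no group ever reaches K peers are all assumptions made for illustration. The similarity function is passed in as a parameter, with a larger value meaning "nearer".

```python
from collections import Counter

def classify(obj, groups, similarity, k):
    """Apply steps (b.1)-(b.4) to one object (illustrative sketch).

    Returns (misclassified, confidence, likely_group, likely_confidence),
    or None if no group reaches k peers (an assumption; the claim is silent).
    """
    # (b.1) sort the neighbors, nearest (most similar) first
    neighbors = sorted((o for o in groups if o != obj),
                       key=lambda o: similarity(obj, o), reverse=True)
    own = groups[obj]
    counts = Counter()
    g = e = None
    f = len(neighbors)  # used if O's own group never reaches k peers
    for examined, nb in enumerate(neighbors, start=1):
        counts[groups[nb]] += 1
        # (b.2) the first group to collect k peers becomes G, after E neighbors
        if g is None and counts[groups[nb]] == k:
            g, e = groups[nb], examined
            if g == own:
                return False, e - k, g, e - k  # (b.3) correctly classified
        # (b.4) keep going until k peers from O's own group have been seen
        if g is not None and counts[own] == k:
            f = examined
            break
    if g is None:
        return None
    return True, f - k, g, e - k

def report(groups, similarity, k, n):
    """Steps (c) and (d): two lists, each sorted worst (greatest) first."""
    misclassified, weak = [], []
    for obj in groups:
        result = classify(obj, groups, similarity, k)
        if result is None:
            continue
        bad, conf, likely, likely_conf = result
        if bad:
            misclassified.append((obj, groups[obj], conf, likely, likely_conf))
        elif conf > n:
            weak.append((obj, groups[obj], conf))
    misclassified.sort(key=lambda r: r[2], reverse=True)
    weak.sort(key=lambda r: r[2], reverse=True)
    return misclassified, weak
```

In this sketch a confidence of 0 means the K nearest neighbors all agree, and each additional neighbor that had to be examined raises the number, which matches the claim's remark that "greater" corresponds to "worse".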
Abstract
A method for identifying software objects that have been assigned to a wrong group, in which the similarity between objects is known, such as by evaluating a similarity function, comprises the steps of checking each object to see whether it belongs to its current group with K peers and confidence N, checking whether each object belongs to another group with a lower and therefore better confidence rating, and identifying as misclassified those objects having a lower confidence rating in said another group.
91 Citations
5 Claims
2. A method for use in a programmable computer system for identifying software objects that have been assigned to a wrong group by sorting the misclassified objects according to their confidence ratings, wherein the similarity between objects is determined, such as by evaluating a similarity function, and wherein the input comprises a set of software objects, assigned to various groups, peer parameter K, and confidence parameter N, said method comprising the computer-implemented steps of:
(a) ascertaining the similarity between each pair of objects, such as by computing a similarity function such as Feature Ratio With Linking;
(b) for each object O,
(b.1) sorting O's neighbors, nearest first,
(b.2) examining O's neighbors in order, counting how many of them are assigned to one or another group, until K are found that are assigned to the same group, recording the group name, say G, and the number of neighbors examined, say E,
(b.3) if G is the group to which O is currently assigned, marking O as being correctly classified with confidence and skipping to step (c), and
(b.4) otherwise, continuing examining the neighbors in order until K have been found that are assigned to the same module as O, or until all neighbors have been examined, recording the number of neighbors examined, say F, marking O as being misclassified, with confidence F-K, and as likely belonging to group G with confidence E-K; and
(c) sorting the misclassified objects according to their confidence ratings.
3. A method for use in a programmable computer system for identifying misclassified software objects that have been assigned to a wrong group and sorting misclassified objects according to an object's similarity to its nearest bad neighbor, wherein the similarity between objects is determined, such as by evaluating a similarity function, and wherein the input comprises a set of software objects, assigned to various groups, peer parameter K, and confidence parameter N, said method comprising the computer-implemented steps of:
(a) ascertaining the similarity between each pair of objects, such as by computing a similarity function such as Feature Ratio With Linking;
(b) for each object O,
(b.1) sorting O's neighbors, nearest first,
(b.2) examining O's neighbors in order, counting how many of them are assigned to one or another group, until K are found that are assigned to the same group, recording the group name, say G, and the number of neighbors examined, say E,
(b.3) if G is the group to which O is currently assigned, marking O as being correctly classified with confidence E-K and skipping to step (c), and
(b.4) otherwise, continuing examining the neighbors in order until K have been found that are assigned to the same module as O, or until all neighbors have been examined, recording the number of neighbors examined, say F, marking O as being misclassified, with confidence F-K, and as likely belonging to group G with confidence E-K; and
(c) sorting the misclassified objects by assigning a priority to a misclassified object according to its similarity to its nearest bad neighbor and an output list is sorted by priority.
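Step (c) of this claim ranks misclassified objects by their similarity to a "nearest bad neighbor", a term the claim text does not define. The sketch below assumes it means the most similar neighbor assigned to a different group than the object itself, and assumes that higher similarity means higher priority. Both readings, and the names `nearest_bad_neighbor_similarity` and `prioritize`, are hypothetical.

```python
def nearest_bad_neighbor_similarity(obj, groups, similarity):
    """Similarity of obj to its most similar neighbor in another group.

    Assumption: a "bad neighbor" is any neighbor whose group differs
    from the group obj is currently assigned to.
    """
    bad = [similarity(obj, o) for o in groups
           if o != obj and groups[o] != groups[obj]]
    return max(bad) if bad else 0.0

def prioritize(misclassified, groups, similarity):
    """Sort misclassified objects, highest priority first (assumed order)."""
    return sorted(
        misclassified,
        key=lambda o: nearest_bad_neighbor_similarity(o, groups, similarity),
        reverse=True,
    )
```

Under these assumptions, an object that sits very close to a member of some other group is reported ahead of one whose closest foreign neighbor is only weakly similar.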
5. A method for use in a programmable computer system for identifying software objects that have been assigned to a wrong group by outputting a sorted list reporting for each object the confidence with which it belongs to a module to which it is currently assigned, and wherein the input comprises a set of software objects, assigned to various groups, peer parameter K, and confidence parameter N, said software objects comprising the static declaration units of a program and having non-local identifiers that designate them, a coefficient k controlling how important the invoker-invokee relationship is in computing similarity, relative to the importance of having common features, a coefficient d controlling how sensitive the measure is to distinctive features, a coefficient n controlling how sensitive similarity is to the total weight of the common features, said method comprising the computer-implemented steps of:
(a) determining the similarity between each pair of objects, whereof typical first and second software objects, hereinafter referred to as "A" and "B", being declared to be within said system, coefficients for the similarity function being in this case designated "k", "n", and "d"; and bias multipliers being designated for predetermined features, each of said bias multipliers comprising a feature name and a positive number;
(b) applying a conventional cross-reference extractor to identify all of the software objects declared in said system, to generate a unique name for each non-local identifier, and to locate each occurrence of a non-local identifier;
(c) for each occurrence of a non-local identifier, determining the unique name of the identifier, herein referred to as "Y", and the unique name of the object wherein it occurs, herein designated "X" and assigning to "X" the feature "uses-Y", and assigning to "Y", if it is a software object, the feature "used-by-X" and if one of X and Y already had the feature just assigned to it, not duplicating these feature assignments;
(d) to each feature named in step (c), herein designated "f", assigning a weight Wf = -log(probability(f));
(e) for each bias multiplier specified in the input, recomputing the weight of that feature by multiplying its Shannon information content by the specified multiplier;
(f) comparing the features of objects A and B, and dividing them into three sets, a first set being A∩B, the features that both A and B have, a second set being A-B, the features that A has and B does not, and a third set being B-A, the features that B has and A does not;
(g) computing the sums of the weights of the features in each of said three sets, denoting these F(A∩B), F(A-B), and F(B-A), respectively;
(h) computing the similarity of A and B by a monotonic, matching function, which must also satisfy the constraint that if the set A∩B is empty, and neither object uses the name of the other object, the similarity is 0;
(i) determining the similarity between each pair of objects, such as by computing a similarity function such as Feature Ratio With Linking;
(j) for each object O,
(j.1) sorting O's neighbors, nearest first,
(j.2) examining O's neighbors in order, counting how many of them are assigned to one or another group, until K are found that are assigned to the same group, recording the group name, say G, and the number of neighbors examined, say E,
(j.3) if G is the group to which O is currently assigned, marking O as being correctly classified with confidence E-K and skipping to step (k), and
(j.4) otherwise, continuing examining the neighbors in order until K have been found that are assigned to the same module as O, or until all neighbors have been examined, recording the number of neighbors examined, say F, marking O as being misclassified, with confidence, and as likely belonging to group G with confidence;
(k) sorting the misclassified objects according to their mis-classification confidence, greatest first, and outputting the list, reporting for each object the current group assignment, the mis-classification confidence, the group that the object likely belongs to, and the confidence with which it likely belongs;
(l) sorting the objects that are correctly classified but with confidence greater than N (here "greater" corresponds to "worse"), sorting by confidence, greatest first, and outputting the sorted list, reporting for each object the confidence with which it belongs to the module to which it is currently assigned.
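The feature-extraction and weighting steps, (c) through (g), can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the cross-reference extractor is replaced by a ready-made list of (X, Y) pairs meaning "identifier Y occurs inside object X", probability(f) is taken to be the fraction of objects that carry feature f, and the function names are hypothetical. The similarity function of step (h) is deliberately left out, because the claim states only the constraints it must satisfy and not its formula.

```python
import math
from collections import Counter, defaultdict

def extract_features(occurrences, objects):
    """Step (c): build uses-/used-by- feature sets from (X, Y) occurrences."""
    features = defaultdict(set)  # sets prevent duplicate feature assignments
    for x, y in occurrences:
        features[x].add(f"uses-{y}")
        if y in objects:  # only software objects receive "used-by" features
            features[y].add(f"used-by-{x}")
    return features

def feature_weights(features, objects, bias=None):
    """Steps (d)-(e): Shannon information content, then bias multipliers.

    Assumption: probability(f) = (objects having f) / (number of objects).
    """
    total = len(objects)
    counts = Counter(f for o in objects for f in features.get(o, ()))
    weights = {f: -math.log(c / total) for f, c in counts.items()}
    for name, multiplier in (bias or {}).items():
        if name in weights:
            weights[name] *= multiplier
    return weights

def partition_weights(a, b, features, weights):
    """Steps (f)-(g): summed weights of A∩B, A-B, and B-A."""
    fa, fb = features.get(a, set()), features.get(b, set())
    def weight_sum(feature_set):
        return sum(weights[f] for f in feature_set)
    return weight_sum(fa & fb), weight_sum(fa - fb), weight_sum(fb - fa)
```

With this weighting, a feature shared by nearly every object contributes almost nothing, while a rare feature contributes a large weight, which is the effect of using -log(probability(f)).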
Specification