Software tool for detecting plagiarism in computer source code
Abstract
A method and system for detecting plagiarism of software source code. In one embodiment, a first set of arrays and a second set of arrays are created for a first program source code file and a second program source code file respectively. Each pair of arrays in the first and second sets has entries corresponding to program elements of a distinct program element type such as functional program code, program comments, or program code identifiers. Next, each pair of arrays from the first and second sets is compared to find similar entries, and an intermediate match score is calculated for each pair of arrays based on the similar entries. Further, the resulting intermediate match scores are combined to produce a combined match score, which is then used to provide an indication of copying with respect to the first program source code file and the second program source code file.
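As an illustration of the array-creation step described in the abstract, the following minimal sketch splits a source file into one array per program element type. It is not the patented implementation: the function name `build_arrays`, the `KEYWORDS` set, the restriction to C-style `//` line comments, and the whitespace normalization are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately short keyword list; a real tool would use the
# complete keyword list of the language being examined.
KEYWORDS = {"int", "return", "if", "else", "for", "while", "void"}


def build_arrays(source: str) -> dict[str, list[str]]:
    """Split source text into per-element-type arrays (illustrative only)."""
    code: list[str] = []
    comments: list[str] = []
    identifiers: list[str] = []
    for raw in source.splitlines():
        # Assumption: only `//` line comments are recognized; a `//` inside
        # a string literal would be misread as a comment by this sketch.
        code_part, sep, comment_part = raw.partition("//")
        code_line = " ".join(code_part.split())  # collapse runs of whitespace
        if code_line:
            code.append(code_line)
            for name in re.findall(r"[A-Za-z_]\w*", code_line):
                if name not in KEYWORDS and name not in identifiers:
                    identifiers.append(name)
        if sep and comment_part.strip():
            comments.append(comment_part.strip())
    return {"code": code, "comments": comments, "identifiers": identifiers}
```

Blank lines and comment-only lines contribute nothing to the functional-code array, so each file yields three arrays that can be compared pairwise with the corresponding arrays of another file.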
20 Claims
1. A computer-implemented method comprising:
creating a first array based on a first program source code file including a plurality of program elements, the first array having entries corresponding to lines of functional program code from the first program source code file;
creating a second array based on a second program source code file including a plurality of program elements, the second array having entries corresponding to lines of functional program code from the second program source code file;
comparing the first array with the second array to find a longest sequence of similar entries;
calculating a match score based on a number of lines in the longest sequence; and
providing an indication of copying with respect to the first program source code file and the second program source code file, wherein the indication of copying is defined by the match score.
Dependent claims: 2, 3, 4.
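The comparison recited in claim 1 can be sketched as follows. This is one possible reading, not the claimed implementation: "similar" is taken to mean exact equality of entries, the "longest sequence" is taken to be the longest run of consecutive matching entries, the match score is simply the length of that run, and the names `longest_matching_run`, `indication_of_copying`, and the `threshold` parameter are hypothetical.

```python
def longest_matching_run(a: list[str], b: list[str]) -> int:
    """Length of the longest run of consecutive entries common to a and b."""
    best = 0
    # prev[j] holds the length of the matching run ending at the previous
    # entry of `a` and entry j-1 of `b`.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:  # assumption: "similar" means exactly equal
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def indication_of_copying(a: list[str], b: list[str],
                          threshold: int = 3) -> tuple[int, bool]:
    """Return (match score, flag); the threshold is an arbitrary example."""
    score = longest_matching_run(a, b)
    return score, score >= threshold
```

The dynamic-programming table takes time proportional to the product of the two array lengths and keeps only two rows in memory at a time.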
5. A computer-implemented method comprising:
creating a first set of arrays based on a first program source code file including a plurality of program elements, each of the arrays of the first set of arrays having entries corresponding to program elements of a distinct program element type represented by at least one of functional program code, program comments, and program code identifiers;
creating a second set of arrays for a second program source code file including a plurality of program elements, the second set of arrays having entries of one or more program element types corresponding to program element types of entries in the first set of arrays;
comparing the arrays of the first set with the arrays of the second set to find similar entries;
calculating a plurality of intermediate match scores based on the similar entries, each of the plurality of intermediate match scores being calculated for a pair of corresponding arrays from the first and second sets;
combining the plurality of intermediate match scores to produce a combined match score; and
providing an indication of copying with respect to the first program source code file and the second program source code file, wherein the indication of copying is defined by the combined match score.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15.
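The scoring steps of claim 5 can be illustrated with the sketch below. The claim does not specify how intermediate scores are computed or combined, so both choices here are assumptions: each intermediate score is the share of distinct entries in the smaller array that also occur in the other array, and the combined score is a weighted average. The function names and the `weights` mapping are hypothetical.

```python
def intermediate_score(a: list[str], b: list[str]) -> float:
    """Share of distinct entries of the smaller array also found in the other."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))


def combined_score(first: dict[str, list[str]],
                   second: dict[str, list[str]],
                   weights: dict[str, float]) -> float:
    """Weighted average of per-element-type scores (assumed combination rule)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    weighted = sum(
        w * intermediate_score(first.get(kind, []), second.get(kind, []))
        for kind, w in weights.items()
    )
    return weighted / total
```

Each key of `weights` names one pair of corresponding arrays, for example `"code"`, `"comments"`, and `"identifiers"`; an element type missing from either file scores zero for that pair.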
16. A computer-readable storage medium storing executable instructions to cause a computer system to perform a method comprising:
creating a first set of arrays based on a first program source code file including a plurality of program elements, each of the arrays of the first set of arrays having entries corresponding to program elements of a distinct program element type represented by at least one of functional program code, program comments, and program code identifiers;
creating a second set of arrays for a second program source code file including a plurality of program elements, the second set of arrays having entries of one or more program element types corresponding to program element types of entries in the first set of arrays;
comparing the arrays of the first set with the arrays of the second set to find similar entries;
calculating a plurality of intermediate match scores based on the similar entries, each of the plurality of intermediate match scores being calculated for a pair of corresponding arrays from the first and second sets;
combining the plurality of intermediate match scores to produce a combined match score; and
providing an indication of copying with respect to the first program source code file and the second program source code file, wherein the indication of copying is defined by the combined match score.
Dependent claims: 17, 18, 19, 20.
Specification