SYSTEM AND METHOD FOR PERFORMING CODE PROVENANCE REVIEW IN A SOFTWARE DUE DILIGENCE SYSTEM
Abstract
A system and method are provided for performing code provenance review in a software due diligence system. In particular, performing code provenance review may include sub-dividing source code under review and third-party source code into logical fragments using a language-independent text fracturing algorithm. For example, the fracturing algorithm may include a set of heuristic rules that account for variations in coding style to create logical fragments that are as large as possible without being independently copyrightable. Unique fingerprints may then be generated for the logical fragments using a fingerprint algorithm that features arithmetic computation. As such, potentially related source code may be identified if sub-dividing the source code under review and the third-party source code produces one or more logical fragments that have identical fingerprints.
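The fracture, fingerprint, and compare steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented algorithm: the blank-line fracturing rule, the whitespace normalization, and the polynomial hash (standing in for "a fingerprint algorithm that features arithmetic computation") are all hypothetical choices, as are the function names and constants.

```python
import re

MOD = (1 << 61) - 1  # large prime modulus (assumed, not from the source)
BASE = 257           # polynomial base (assumed, not from the source)


def fracture(text: str) -> list[str]:
    """Split text into fragments at blank lines (a hypothetical heuristic).

    Whitespace inside each fragment is collapsed so that indentation and
    spacing differences do not change the fragment.
    """
    fragments = []
    for block in re.split(r"\n\s*\n", text):
        normalized = " ".join(block.split())
        if normalized:
            fragments.append(normalized)
    return fragments


def fingerprint(fragment: str) -> int:
    """Polynomial hash computed with plain integer arithmetic."""
    value = 0
    for byte in fragment.encode("utf-8"):
        value = (value * BASE + byte) % MOD
    return value


def potential_issues(reviewed: str, third_party: str) -> list[str]:
    """Return reviewed fragments whose fingerprint also occurs in third-party code."""
    known = {fingerprint(f) for f in fracture(third_party)}
    return [f for f in fracture(reviewed) if fingerprint(f) in known]
```

Because the sketch normalizes whitespace before hashing, two copies of a function that differ only in indentation yield the same fingerprint. A matching fingerprint is only a signal of potentially related code; a hash of this kind can in principle collide, so any hit would still need review.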
28 Claims
1. A method for performing code provenance review in a software due diligence system, comprising:
receiving source code subject to code provenance review;
retrieving third-party source code for comparison with the source code subject to code provenance review;
fracturing the source code subject to code provenance review into a first set of logical fragments, wherein a text fracturing algorithm is used to fracture the source code subject to code provenance review into the first set of logical fragments;
fracturing the third-party source code into a second set of logical fragments, wherein the text fracturing algorithm is used to fracture the third-party source code subject into the second set of logical fragments;
generating a first set of fingerprints corresponding to the first set of logical fragments, wherein a fingerprint algorithm is used to generate the first set of fingerprints;
generating a second set of fingerprints corresponding to the logical fragments in the second set of logical fragments, wherein the fingerprint algorithm is used to generate the second set of fingerprints; and
comparing the first set of fingerprints to the second set of fingerprints to determine whether the source code subject to code provenance review contains one or more potential code provenance issues.
Dependent claims: 2-23.
24. A system for performing code provenance review, comprising:
a crawler configured to retrieve third-party source code from one or more third-party repositories; and
a code provenance engine configured to:
receive source code subject to code provenance review for comparison with the third-party source code;
fracture the source code subject to code provenance review into a first set of logical fragments, wherein the code provenance engine uses a text fracturing algorithm to fracture the source code subject to code provenance review into the first set of logical fragments;
fracture the third-party source code into a second set of logical fragments, wherein the code provenance engine uses the text fracturing algorithm to fracture the third-party source code subject into the second set of logical fragments;
generate a first set of fingerprints corresponding to the first set of logical fragments, wherein the code provenance engine uses a fingerprint algorithm to generate the first set of fingerprints;
generate a second set of fingerprints corresponding to the logical fragments in the second set of logical fragments, wherein the code provenance engine uses the fingerprint algorithm to generate the second set of fingerprints; and
compare the first set of fingerprints to the second set of fingerprints to determine whether the source code subject to code provenance review contains one or more potential code provenance issues.
Dependent claims: 25-28.
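One way to picture the two components recited in claim 24, a crawler feeding a code provenance engine, is the sketch below. Everything in it is an assumption made for illustration: the `Crawler` protocol, the `InMemoryCrawler` stand-in (which serves pre-loaded text rather than contacting any repository), the blank-line fracturing rule, the use of SHA-256 as the fingerprint, and the index that maps each fingerprint to the origins it came from. The claim itself does not specify any of these details.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Iterable, Protocol


class Crawler(Protocol):
    def retrieve(self) -> Iterable[tuple[str, str]]:
        """Yield (origin, source_text) pairs from third-party repositories."""
        ...


@dataclass
class InMemoryCrawler:
    """Stand-in crawler that serves pre-loaded files instead of fetching them."""
    files: dict[str, str]

    def retrieve(self) -> Iterable[tuple[str, str]]:
        return self.files.items()


def fracture(text: str) -> list[str]:
    """Hypothetical rule: blank-line-separated blocks, whitespace collapsed."""
    blocks = (" ".join(b.split()) for b in text.split("\n\n"))
    return [b for b in blocks if b]


def fingerprint(fragment: str) -> str:
    """SHA-256 digest used as a stand-in fingerprint algorithm."""
    return hashlib.sha256(fragment.encode("utf-8")).hexdigest()


@dataclass
class CodeProvenanceEngine:
    crawler: Crawler
    # fingerprint -> set of origins in which that fragment was seen
    index: dict[str, set[str]] = field(default_factory=dict)

    def load_third_party(self) -> None:
        """Fracture and fingerprint everything the crawler returns."""
        for origin, text in self.crawler.retrieve():
            for fragment in fracture(text):
                self.index.setdefault(fingerprint(fragment), set()).add(origin)

    def review(self, source: str) -> list[tuple[str, list[str]]]:
        """Return (fragment, sorted matching origins) for each indexed fragment."""
        hits = []
        for fragment in fracture(source):
            origins = self.index.get(fingerprint(fragment))
            if origins:
                hits.append((fragment, sorted(origins)))
        return hits
```

Keeping the origins alongside each fingerprint is a design choice of this sketch: it lets a reviewer see which third-party files a flagged fragment may relate to, rather than only learning that some match exists.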
Specification