Apparatus and method for efficient identification of code similarity
First Claim
1. An apparatus comprising:
- processing circuitry comprising at least one processor configured to execute instructions for:
receiving a first threshold;
receiving a plurality of binary reference samples;
processing each reference sample of the plurality of reference samples via operations including:
producing a reference sample fingerprint for each reference sample; and
initializing a reference library without including duplicate reference sample fingerprints and registering each respective unique identifier via operations including:
scoring the reference sample fingerprint with each previously stored fingerprint in the reference library to produce a first matching score;
if the first matching score meets or exceeds the first threshold for a previously stored fingerprint, determining the reference sample fingerprint to be a duplicate of the previously stored fingerprint, and recording, in a storage device, only a unique identifier associated with the reference sample fingerprint in the reference library and not the reference sample fingerprint, the unique identifier being marked as a duplicate of the previously stored fingerprint; and
otherwise, if the first matching score for each previously stored fingerprint is less than the first threshold, storing, in the storage device, the reference sample fingerprint and a corresponding reference sample unique identifier in the reference library.
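The library-initialization steps recited above can be illustrated with a minimal Python sketch. The claim does not say how fingerprints are produced or scored, so `fingerprint` (a set of byte n-grams) and `score` (Jaccard similarity) are stand-in assumptions, and `ReferenceLibrary` and `build_library` are hypothetical names. Where several stored fingerprints meet the threshold, this sketch marks the new identifier as a duplicate of the best-scoring one, which is also a choice made here for illustration.

```python
from dataclasses import dataclass, field


def fingerprint(sample: bytes, n: int = 4) -> frozenset:
    """Assumed fingerprint: the set of byte n-grams in the sample."""
    if len(sample) < n:
        return frozenset({sample})
    return frozenset(sample[i:i + n] for i in range(len(sample) - n + 1))


def score(a: frozenset, b: frozenset) -> float:
    """Assumed matching score: Jaccard similarity in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


@dataclass
class ReferenceLibrary:
    # uid -> fingerprint, for unique (non-duplicate) samples only
    fingerprints: dict = field(default_factory=dict)
    # uid -> uid of the stored fingerprint it duplicates (no fingerprint kept)
    duplicates: dict = field(default_factory=dict)

    def add(self, uid: str, sample: bytes, threshold: float) -> bool:
        """Register a sample; return True if its fingerprint was stored."""
        fp = fingerprint(sample)
        best_uid, best_score = None, -1.0
        # Score against every previously stored fingerprint.
        for stored_uid, stored_fp in self.fingerprints.items():
            s = score(fp, stored_fp)
            if s > best_score:
                best_uid, best_score = stored_uid, s
        if best_uid is not None and best_score >= threshold:
            # Duplicate: record only the identifier, marked against the match.
            self.duplicates[uid] = best_uid
            return False
        # Every score is below the threshold: store fingerprint and identifier.
        self.fingerprints[uid] = fp
        return True


def build_library(samples: dict, threshold: float) -> ReferenceLibrary:
    """Initialize a library from a mapping of uid -> binary sample."""
    library = ReferenceLibrary()
    for uid, sample in samples.items():
        library.add(uid, sample, threshold)
    return library
```

In this sketch a duplicate costs only one dictionary entry, so the number of fingerprints that later samples must be scored against grows only with the number of distinct samples.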
Abstract
A method for identifying similarity between query samples and stored samples in an efficiently maintained reference library may include receiving a binary query sample and processing the binary query sample via operations including producing a query sample fingerprint from the binary query sample, scoring the query sample fingerprint with each previously stored fingerprint in the reference library to produce a matching score, and for each previously stored fingerprint for which the matching score meets or exceeds a predetermined threshold, reporting a corresponding reference sample unique identifier associated with the previously stored fingerprint and the matching score. Each previously stored fingerprint in the reference library has been determined, prior to storage, as not being duplicative of another fingerprint in the reference library.
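The query flow described in the abstract can be sketched as follows. This is an illustration under assumptions rather than the patented implementation: the fingerprint (byte n-gram set) and matching score (Jaccard similarity) are placeholders, the library is modeled as a plain dictionary of already deduplicated fingerprints, and the names `fingerprint`, `score`, and `query` are hypothetical.

```python
def fingerprint(sample: bytes, n: int = 4) -> frozenset:
    """Assumed fingerprint: the set of byte n-grams in the sample."""
    if len(sample) < n:
        return frozenset({sample})
    return frozenset(sample[i:i + n] for i in range(len(sample) - n + 1))


def score(a: frozenset, b: frozenset) -> float:
    """Assumed matching score: Jaccard similarity in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def query(library: dict, sample: bytes, threshold: float) -> list:
    """Report (uid, score) for every stored fingerprint at or above threshold.

    `library` maps reference sample unique identifiers to fingerprints that
    were already checked for duplicates before being stored.
    """
    query_fp = fingerprint(sample)
    hits = []
    for uid, stored_fp in library.items():
        s = score(query_fp, stored_fp)
        if s >= threshold:
            hits.append((uid, s))
    # Strongest matches first (an ordering chosen for this sketch).
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    library = {
        "ref-1": fingerprint(b"hello world program"),
        "ref-2": fingerprint(b"completely different bytes!"),
    }
    print(query(library, b"hello world program", 0.8))
```

Because the library holds no duplicate fingerprints, each query is scored once per distinct reference sample rather than once per submitted sample.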
20 Claims
1. An apparatus comprising:
- processing circuitry comprising at least one processor configured to execute instructions for:
receiving a first threshold;
receiving a plurality of binary reference samples;
processing each reference sample of the plurality of reference samples via operations including:
producing a reference sample fingerprint for each reference sample; and
initializing a reference library without including duplicate reference sample fingerprints and registering each respective unique identifier via operations including:
scoring the reference sample fingerprint with each previously stored fingerprint in the reference library to produce a first matching score;
if the first matching score meets or exceeds the first threshold for a previously stored fingerprint, determining the reference sample fingerprint to be a duplicate of the previously stored fingerprint, and recording, in a storage device, only a unique identifier associated with the reference sample fingerprint in the reference library and not the reference sample fingerprint, the unique identifier being marked as a duplicate of the previously stored fingerprint; and
otherwise, if the first matching score for each previously stored fingerprint is less than the first threshold, storing, in the storage device, the reference sample fingerprint and a corresponding reference sample unique identifier in the reference library.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. An apparatus comprising:
- processing circuitry comprising at least one processor configured to execute instructions for:
receiving a binary query sample;
processing the binary query sample via operations including:
producing a query sample fingerprint from the binary query sample;
scoring the query sample fingerprint with each previously stored fingerprint in a reference library to produce a matching score, wherein the reference library has been initialized such that each previously stored fingerprint in the reference library has been determined, prior to storage in a storage device, as not being duplicative of another fingerprint in the reference library and duplicative fingerprints have not been stored in the reference library; and
determining whether the query sample fingerprint is a duplicate of any previously stored fingerprint in response to the matching score with a previously stored fingerprint meeting or exceeding a predetermined threshold, and reporting a corresponding reference sample unique identifier associated with the matched previously stored fingerprint and the matching score in response to determining that the query sample fingerprint is a duplicate.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
Specification