Apparatus and method for efficient identification of code similarity
First Claim
1. An apparatus comprising processing circuitry configured to execute instructions for:
receiving a first threshold and a second threshold;
receiving a plurality of binary reference samples;
processing each reference sample of the plurality of reference samples via operations including:
assigning each reference sample a respective unique identifier;
producing a reference sample fingerprint for each reference sample; and
registering each respective unique identifier to reference sample fingerprint pair in a reference library via operations including:
scoring the reference sample fingerprint with each previously stored fingerprint in the reference library to produce a first matching score;
if the first matching score meets or exceeds the first threshold for a previously stored fingerprint, determining the reference sample fingerprint to be a duplicate of the previously stored fingerprint, and recording only a unique identifier associated with the reference sample fingerprint in the reference library, the unique identifier being marked as a duplicate of the previously stored fingerprint; and
otherwise, if the first matching score for each previously stored fingerprint is less than the first threshold, storing a corresponding reference sample unique identifier to reference sample fingerprint pair in the reference library;
receiving a binary query sample;
processing the binary query sample via operations including:
producing a query sample fingerprint from the binary query sample;
scoring the query sample fingerprint with each previously stored fingerprint in the reference library to produce a second matching score;
for each previously stored fingerprint for which the second matching score meets or exceeds the second threshold, reporting a corresponding reference sample unique identifier associated with the previously stored fingerprint and the second matching score.
Abstract
A method for identifying similarity between query samples and stored samples in an efficiently maintained reference library may include receiving a first threshold and a second threshold, receiving a plurality of binary reference samples, and processing each reference sample of the plurality of reference samples. The processing may include operations of assigning each reference sample a respective unique identifier, producing a reference sample fingerprint for each reference sample, and registering each respective unique identifier to reference sample fingerprint pair in a reference library. The registering may include scoring the reference sample fingerprint with each previously stored fingerprint in the reference library to produce a first matching score, if the first matching score meets or exceeds the first threshold for a previously stored fingerprint, determining the reference sample fingerprint to be a duplicate of the previously stored fingerprint and recording only a unique identifier associated with the reference sample fingerprint in the reference library where the unique identifier is marked as a duplicate of the previously stored fingerprint, and otherwise, if the first matching score is less than the first threshold, storing a corresponding reference sample unique identifier to reference sample fingerprint pair in the reference library. The method may further include receiving a binary query sample and processing the binary query sample via operations including producing a query sample fingerprint from the binary query sample, scoring the query sample fingerprint with each previously stored fingerprint in the reference library to produce a second matching score, and for each previously stored fingerprint for which the second matching score meets or exceeds the second threshold, reporting a corresponding reference sample unique identifier associated with the previously stored fingerprint and the second matching score.
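The registration step described above can be illustrated with a minimal sketch. The text here does not say how a fingerprint is produced or how a matching score is computed, so the `fingerprint` function (a set of 4-byte shingles) and the `score` function (Jaccard similarity) are assumptions chosen only for illustration, as is the `ReferenceLibrary` class and its field names. The sketch also stops at the first stored fingerprint that meets the first threshold, which is a simplification.

```python
from dataclasses import dataclass, field
from itertools import count


def fingerprint(sample: bytes, n: int = 4) -> frozenset:
    """Hypothetical fingerprint: the set of n-byte shingles in the sample."""
    if len(sample) < n:
        return frozenset({sample})
    return frozenset(sample[i:i + n] for i in range(len(sample) - n + 1))


def score(a: frozenset, b: frozenset) -> float:
    """Hypothetical matching score: Jaccard similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class ReferenceLibrary:
    """Illustrative library: fingerprints are stored only for non-duplicates."""
    first_threshold: float
    fingerprints: dict = field(default_factory=dict)  # identifier -> stored fingerprint
    duplicates: dict = field(default_factory=dict)    # identifier -> identifier it duplicates
    _ids: count = field(default_factory=lambda: count(1))

    def register(self, sample: bytes) -> int:
        uid = next(self._ids)      # assign a unique identifier
        fp = fingerprint(sample)   # produce the reference sample fingerprint
        for stored_id, stored_fp in self.fingerprints.items():
            if score(fp, stored_fp) >= self.first_threshold:
                # Duplicate: record only the identifier, marked as a duplicate.
                self.duplicates[uid] = stored_id
                return uid
        # No stored fingerprint met the threshold: store the identifier/fingerprint pair.
        self.fingerprints[uid] = fp
        return uid
```

Under these assumptions, registering the same bytes twice stores one fingerprint and one duplicate marker, which is what keeps the library from growing with redundant samples.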
20 Claims
1. An apparatus comprising processing circuitry configured to execute instructions for:
receiving a first threshold and a second threshold;
receiving a plurality of binary reference samples;
processing each reference sample of the plurality of reference samples via operations including:
assigning each reference sample a respective unique identifier;
producing a reference sample fingerprint for each reference sample; and
registering each respective unique identifier to reference sample fingerprint pair in a reference library via operations including:
scoring the reference sample fingerprint with each previously stored fingerprint in the reference library to produce a first matching score;
if the first matching score meets or exceeds the first threshold for a previously stored fingerprint, determining the reference sample fingerprint to be a duplicate of the previously stored fingerprint, and recording only a unique identifier associated with the reference sample fingerprint in the reference library, the unique identifier being marked as a duplicate of the previously stored fingerprint; and
otherwise, if the first matching score for each previously stored fingerprint is less than the first threshold, storing a corresponding reference sample unique identifier to reference sample fingerprint pair in the reference library;
receiving a binary query sample;
processing the binary query sample via operations including:
producing a query sample fingerprint from the binary query sample;
scoring the query sample fingerprint with each previously stored fingerprint in the reference library to produce a second matching score;
for each previously stored fingerprint for which the second matching score meets or exceeds the second threshold, reporting a corresponding reference sample unique identifier associated with the previously stored fingerprint and the second matching score.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method for identifying similarity between query samples and stored samples in an efficiently maintained reference library, the method comprising:
receiving, by processing circuitry comprising at least one storage device and processor, a plurality of binary reference samples;
processing, by the processing circuitry, each reference sample of the plurality of reference samples via operations including:
assigning each reference sample a respective unique identifier;
producing a reference sample fingerprint for each reference sample; and
registering each respective unique identifier to reference sample fingerprint pair in a reference library via operations including:
scoring the reference sample fingerprint with each previously stored fingerprint in the reference library to produce a first matching score;
if the first matching score meets or exceeds a first threshold for a previously stored fingerprint, determining the reference sample fingerprint to be a duplicate of the previously stored fingerprint, and recording only a unique identifier associated with the reference sample fingerprint in the reference library, the unique identifier being marked as a duplicate of the previously stored fingerprint; and
otherwise, if the first matching score is less than the first threshold for each previously stored fingerprint, storing a corresponding reference sample unique identifier to reference sample fingerprint pair in the reference library;
receiving, by the processing circuitry, a binary query sample;
processing, by the processing circuitry, the binary query sample via operations including:
producing a query sample fingerprint from the binary query sample;
scoring the query sample fingerprint with each previously stored fingerprint in the reference library to produce a second matching score; and
for each previously stored fingerprint for which the second matching score meets or exceeds a second threshold, reporting a corresponding reference sample unique identifier associated with the previously stored fingerprint and the second matching score.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
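The query step recited in the method can be sketched as follows. This is only an illustration under stated assumptions: the claims do not define the fingerprint or the scoring function, so `fingerprint` (4-byte shingles), `score` (Jaccard similarity), the `query` function, and the plain dictionary standing in for the reference library are all hypothetical.

```python
def fingerprint(sample: bytes, n: int = 4) -> frozenset:
    """Hypothetical fingerprint: the set of n-byte shingles in the sample."""
    if len(sample) < n:
        return frozenset({sample})
    return frozenset(sample[i:i + n] for i in range(len(sample) - n + 1))


def score(a: frozenset, b: frozenset) -> float:
    """Hypothetical matching score: Jaccard similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def query(sample: bytes, library: dict, second_threshold: float) -> list:
    """Report (identifier, score) for every stored fingerprint meeting the threshold.

    `library` maps a reference sample identifier to its stored fingerprint.
    """
    query_fp = fingerprint(sample)  # produce the query sample fingerprint
    hits = []
    for uid, stored_fp in library.items():
        s = score(query_fp, stored_fp)  # second matching score
        if s >= second_threshold:
            hits.append((uid, s))
    return hits


# Usage: a small library with two stored fingerprints.
library = {
    1: fingerprint(b"ABCDEFGHIJKLMNOP"),
    3: fingerprint(b"zzzzyyyyxxxxwwww"),
}
hits = query(b"ABCDEFGHIJKLMNOQ", library, second_threshold=0.5)
```

Because the second threshold is independent of the first, a query can be tuned to report looser matches than the cutoff used to suppress duplicates at registration time.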
Specification