Apparatus and Method for Efficient Identification of Code Similarity
First Claim
1. An apparatus comprising processing circuitry configured to execute instructions for:
receiving a first threshold;
receiving a plurality of binary reference samples;
processing each reference sample of the plurality of reference samples via operations including:
assigning each reference sample a respective unique identifier;
producing a reference sample fingerprint for each reference sample; and
registering each respective unique identifier to reference sample fingerprint pair in a reference library via operations including:
scoring the reference sample fingerprint with each previously stored fingerprint in the reference library to produce a first matching score;
if the first matching score meets or exceeds the first threshold for a previously stored fingerprint, determining the reference sample fingerprint to be a duplicate of the previously stored fingerprint, and recording only a unique identifier associated with the reference sample fingerprint in the reference library, the unique identifier being marked as a duplicate of the previously stored fingerprint; and
otherwise, if the first matching score for each previously stored fingerprint is less than the first threshold, storing a corresponding reference sample unique identifier to reference sample fingerprint pair in the reference library.
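The registration steps recited above can be illustrated with a minimal sketch. The claim does not specify how a fingerprint is produced or how two fingerprints are scored, so the `fingerprint` function (a set of byte n-grams) and the `score` function (Jaccard similarity) below are hypothetical stand-ins, as are the `ReferenceLibrary` class and its field names. Marking a duplicate against the highest-scoring stored fingerprint is likewise an assumption; the claim only requires that the threshold be met for a previously stored fingerprint.

```python
from dataclasses import dataclass, field
from itertools import count


def fingerprint(sample: bytes, n: int = 4) -> frozenset:
    """Hypothetical fingerprint: the set of byte n-grams in the sample."""
    if len(sample) < n:
        return frozenset({sample})
    return frozenset(sample[i:i + n] for i in range(len(sample) - n + 1))


def score(a: frozenset, b: frozenset) -> float:
    """Hypothetical matching score: Jaccard similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class ReferenceLibrary:
    threshold: float                                  # the "first threshold"
    fingerprints: dict = field(default_factory=dict)  # uid -> stored fingerprint
    duplicates: dict = field(default_factory=dict)    # uid -> uid it duplicates
    _ids: count = field(default_factory=lambda: count(1))

    def register(self, sample: bytes) -> int:
        """Assign an identifier, fingerprint the sample, and register it."""
        uid = next(self._ids)
        fp = fingerprint(sample)

        # Score the new fingerprint against each previously stored fingerprint,
        # remembering the best match (an assumption, see the lead-in).
        best_uid, best_score = None, -1.0
        for stored_uid, stored_fp in self.fingerprints.items():
            s = score(fp, stored_fp)
            if s > best_score:
                best_uid, best_score = stored_uid, s

        if best_uid is not None and best_score >= self.threshold:
            # Duplicate: record only the identifier, marked as a duplicate.
            self.duplicates[uid] = best_uid
        else:
            # Every score is below the threshold: store the identifier/fingerprint pair.
            self.fingerprints[uid] = fp
        return uid


if __name__ == "__main__":
    lib = ReferenceLibrary(threshold=0.8)
    lib.register(b"ABCDEFGHIJKLMNOP")
    lib.register(b"ABCDEFGHIJKLMNOP")   # identical bytes, so a duplicate
    lib.register(b"0123456789zyxwvu")   # shares no n-grams, so stored
    print(sorted(lib.fingerprints), lib.duplicates)
```

Under these assumptions, the library holds one fingerprint per group of near-identical samples, while every sample identifier remains traceable through the duplicate map.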
Abstract
A method for identifying similarity between query samples and stored samples in an efficiently maintained reference library may include receiving a binary query sample and processing the binary query sample via operations including producing a query sample fingerprint from the binary query sample, scoring the query sample fingerprint with each previously stored fingerprint in the reference library to produce a matching score, and for each previously stored fingerprint for which the matching score meets or exceeds a predetermined threshold, reporting a corresponding reference sample unique identifier associated with the previously stored fingerprint and the matching score. Each previously stored fingerprint in the reference library has been determined, prior to storage, as not being duplicative of another fingerprint in the reference library.
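The query flow described in the abstract can be sketched as follows. The abstract does not define the fingerprint or the scoring function, so the n-gram `fingerprint` and Jaccard `score` used here are hypothetical placeholders, and the `query` function, its parameter names, and the plain dictionary used as the reference library are illustrative assumptions. Sorting the reported matches by score is also an addition for readability, not something the abstract requires.

```python
def fingerprint(sample: bytes, n: int = 4) -> frozenset:
    """Hypothetical fingerprint: the set of byte n-grams in the sample."""
    if len(sample) < n:
        return frozenset({sample})
    return frozenset(sample[i:i + n] for i in range(len(sample) - n + 1))


def score(a: frozenset, b: frozenset) -> float:
    """Hypothetical matching score: Jaccard similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def query(sample: bytes, library: dict, threshold: float) -> list:
    """Return (unique identifier, matching score) pairs at or above the threshold.

    `library` maps each reference sample unique identifier to its stored
    fingerprint and is assumed to have been de-duplicated before storage.
    """
    query_fp = fingerprint(sample)
    hits = []
    for uid, stored_fp in library.items():
        s = score(query_fp, stored_fp)
        if s >= threshold:
            hits.append((uid, s))
    # Highest score first (a presentation choice, not part of the abstract).
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    library = {
        1: fingerprint(b"ABCDEFGHIJKLMNOP"),
        2: fingerprint(b"0123456789zyxwvu"),
    }
    # One trailing byte differs from reference sample 1.
    print(query(b"ABCDEFGHIJKLMNOQ", library, threshold=0.5))
```

Because the library is assumed to contain no duplicate fingerprints, each stored fingerprint is scored once per query and each reported identifier corresponds to a distinct reference fingerprint.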
88 Citations
20 Claims
1. An apparatus comprising processing circuitry configured to execute instructions for:
receiving a first threshold;
receiving a plurality of binary reference samples;
processing each reference sample of the plurality of reference samples via operations including:
assigning each reference sample a respective unique identifier;
producing a reference sample fingerprint for each reference sample; and
registering each respective unique identifier to reference sample fingerprint pair in a reference library via operations including:
scoring the reference sample fingerprint with each previously stored fingerprint in the reference library to produce a first matching score;
if the first matching score meets or exceeds the first threshold for a previously stored fingerprint, determining the reference sample fingerprint to be a duplicate of the previously stored fingerprint, and recording only a unique identifier associated with the reference sample fingerprint in the reference library, the unique identifier being marked as a duplicate of the previously stored fingerprint; and
otherwise, if the first matching score for each previously stored fingerprint is less than the first threshold, storing a corresponding reference sample unique identifier to reference sample fingerprint pair in the reference library.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. An apparatus comprising processing circuitry configured to execute instructions for:
receiving a binary query sample;
processing the binary query sample via operations including:
producing a query sample fingerprint from the binary query sample;
scoring the query sample fingerprint with each previously stored fingerprint in a reference library to produce a matching score, wherein each previously stored fingerprint in the reference library has been determined, prior to storage, as not being duplicative of another fingerprint in the reference library; and
for each previously stored fingerprint for which the matching score meets or exceeds a predetermined threshold, reporting a corresponding reference sample unique identifier associated with the previously stored fingerprint and the matching score.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
Specification