System and Method for Entropy-Based Near-Match Analysis
First Claim
1. In a computer forensic investigation system including an examining machine coupled to one or more target machines over a data communications network, a method for identifying one or more files in the one or more target machines that are a near-match to a reference file, the method comprising:
computing or identifying an entropy of the reference file and outputting a first entropy value;
identifying a second entropy value of a target file stored in the one or more target machines;
determining a likeness of content in the target file to content in the reference file based on the first and second entropy values;
identifying a tolerance threshold;
determining a near-match between the target file and the reference file if the likeness of the target file to the reference file is within the tolerance threshold; and
displaying on a display, information on the target file in response to the determining of a near-match.
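The steps recited in claim 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the entropy is byte-level Shannon entropy, and it assumes that "likeness" is the absolute difference between the two entropy values. The claim fixes neither choice. The names `shannon_entropy` and `is_near_match` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0).

    Assumption: the claim does not say which entropy measure is used.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_near_match(reference: bytes, target: bytes, tolerance: float) -> bool:
    """Report a near-match when the entropy values differ by at most `tolerance`.

    Assumption: likeness is modelled as the absolute entropy difference.
    """
    first_entropy = shannon_entropy(reference)   # first entropy value
    second_entropy = shannon_entropy(target)     # second entropy value
    likeness = abs(first_entropy - second_entropy)
    return likeness <= tolerance


if __name__ == "__main__":
    reference = b"hello world"
    target = b"hello world!"
    if is_near_match(reference, target, tolerance=0.5):
        # Stand-in for "displaying information on the target file".
        print("near-match:", target)
```

Two unrelated files can have the same entropy, so a comparison of this kind is best read as a coarse filter that narrows the set of candidate files rather than as proof of similar content.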
Abstract
A system and method for entropy-based near-match analysis identifies target files that are almost, but not exactly, identical to a reference file. A computing processor computes entropies of the reference and target files and determines the likeness of the target files to the reference file based on the computed entropies. The computing processor determines a near-match between a target file and the reference file if the likeness of the two files is within a user-defined tolerance level. According to one embodiment of the invention, the information entropy is a weighted value that takes into account the size of the file.
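The abstract does not say how the entropy is weighted by file size. As one hypothetical reading only, the sketch below multiplies the per-byte entropy by the file length, so that two files with the same byte distribution but very different sizes no longer look alike. The function names, the weighting formula, and the percentage-based tolerance are all assumptions made for illustration.

```python
import math
from collections import Counter


def weighted_entropy(data: bytes) -> float:
    """Hypothetical size-weighted entropy: bits per byte times length in bytes.

    Assumption: the source only says the value "takes into account the size
    of the file"; this particular formula is not taken from it.
    """
    if not data:
        return 0.0
    total = len(data)
    per_byte = -sum(
        (n / total) * math.log2(n / total) for n in Counter(data).values()
    )
    return per_byte * total


def within_tolerance(reference: bytes, target: bytes, tolerance_pct: float) -> bool:
    """Compare weighted values as a percentage difference from the reference."""
    ref = weighted_entropy(reference)
    tgt = weighted_entropy(target)
    if ref == 0.0:
        # Avoid division by zero; only another zero-valued file matches.
        return tgt == 0.0
    return abs(ref - tgt) / ref * 100.0 <= tolerance_pct
```

Under this reading, `b"ab" * 100` and `b"ab" * 200` have the same per-byte entropy but weighted values of 200 and 400, so a 5 percent tolerance would reject the pair, while `b"ab" * 101` would still fall within it.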
25 Claims
13. An examining machine coupled to one or more target machines over a data communications network, the examining machine comprising:
a display device;
a processor coupled to the display device; and
a memory operably coupled to the processor and having program instructions stored therein, the processor being operable to execute the program instructions, the program instructions including:
computing or identifying an entropy of the reference file and outputting a first entropy value;
identifying a second entropy value of a target file stored in the one or more target machines;
determining a likeness of content in the target file to content in the reference file based on the first and second entropy values;
identifying a tolerance threshold;
determining a near-match between the target file and the reference file if the likeness of the target file to the reference file is within the tolerance threshold; and
displaying on the display device information on the target file in response to the determining of a near-match.
25. A computer forensic investigation system for identifying one or more files in one or more target machines that are a near-match to a reference file, the system comprising:
means for computing or identifying an entropy of the reference file and outputting a first entropy value;
means for identifying a second entropy value of a target file stored in the one or more target machines;
means for determining a likeness of content in the target file to content in the reference file based on the first and second entropy values;
means for identifying a tolerance threshold;
means for determining a near-match between the target file and the reference file if the likeness of the target file to the reference file is within the tolerance threshold; and
means for displaying information on the target file in response to the determining of a near-match.
Specification