Automatic analysis of a computer virus structure and means of attachment to its hosts
First Claim
1. A method for automatically deriving verification and removal information for a function-preserving transformation of computer data from a set of untransformed data samples and corresponding transformed data samples, comprising the steps of:
obtaining a set of "sample pairs", each sample pair consisting of a transformed data sample and a corresponding original, untransformed data sample;
locating one or more fragments of each original data sample within a corresponding transformed data sample to obtain a generalized description, applicable to each of the sample pairs, of locations of fragments of each original data sample and locations of new data regions added by the function-preserving transformation that applies to each of the sample pairs;
matching new data regions added by the function-preserving transformation across different samples to obtain a description of portions of the new data regions that are "invariant" across different samples;
locating within other, variable portions of the new data regions any data from an original data sample embedded there;
generating a prescription for verifying with high confidence that any given data sample has resulted from an application of the function-preserving transformation; and
generating a prescription for restoring a data sample that has been transformed by the function-preserving transformation to a form functionally equivalent to that prior to the transformation.
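The fragment-locating step of claim 1 can be illustrated with a short Python sketch. It finds blocks of an original sample that reappear unchanged in the transformed sample and reports the remaining transformed bytes as new data regions. It assumes that the standard library's `difflib.SequenceMatcher` is an acceptable stand-in for whatever matching procedure the patent actually uses. The function name `locate_fragments` and the `min_len` threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

def locate_fragments(original: bytes, transformed: bytes, min_len: int = 4
                     ) -> Tuple[List[Tuple[int, int, int]], List[Tuple[int, int]]]:
    """Locate fragments of `original` inside `transformed` (hypothetical helper).

    Returns (fragments, new_regions):
      fragments:   (orig_offset, trans_offset, length) for each block of the
                   original found unchanged in the transformed sample
      new_regions: (trans_offset, length) for each run of transformed bytes
                   not covered by a fragment, i.e. data added by the transformation
    """
    sm = SequenceMatcher(None, original, transformed, autojunk=False)
    # get_matching_blocks() ends with a zero-length sentinel; min_len drops it
    # together with short, probably coincidental matches.
    fragments = [(m.a, m.b, m.size) for m in sm.get_matching_blocks()
                 if m.size >= min_len]
    new_regions, pos = [], 0
    for _, start, length in fragments:  # blocks are returned in increasing order
        if start > pos:
            new_regions.append((pos, start - pos))
        pos = start + length
    if pos < len(transformed):
        new_regions.append((pos, len(transformed) - pos))
    return fragments, new_regions
```

For example, if the transformation prepends five bytes to a 16-byte original, the sketch reports one fragment `(0, 5, 16)` and one new region `(0, 5)`. `SequenceMatcher` can be slow on large binaries, so a practical tool would likely need a faster matcher.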
Abstract
Information pertaining to the verification of the identity of, and the reversal of, a transformation of computer data is derived automatically from a set of samples. The most important class of such transformations is computer viruses. The process extracts this information for a large, fairly general class of viruses. Samples consisting of host programs infected with the virus, and sample pairs consisting of an infected host and the corresponding original, uninfected host, are obtained. A description of how the virus attaches to the host program, including the locations within the infected host of components of both the original host and the virus, is generated. Viral code is matched across samples to obtain a description of "invariant" regions of the virus. Host bytes embedded within the virus are located. A description of the original host locations permits anti-virus software on a user's machine to restore the bulk of a program that has been infected. Characterization of the correspondence between the variable portions of the virus and the destroyed parts of the host enables the anti-virus software to complete the repair.
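The two-stage repair described in the abstract can be sketched as replaying a repair prescription. The first stage restores the bulk of the host, and the second restores the destroyed bytes that the virus keeps inside its own body. The prescription format used here is an assumption made for illustration and is not the patent's encoding. It is an ordered list of Python slices of the infected file, in which negative offsets count from the end of the file. The names `Prescription` and `repair` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical prescription format: an ordered list of (start, end) slices of
# the infected file using Python slice semantics (negative = counted from the
# end of the file, None = end of file). Concatenating the slices in order
# rebuilds the original host, whatever the host's length.
Prescription = List[Tuple[int, Optional[int]]]

def repair(infected: bytes, prescription: Prescription) -> bytes:
    # Python slices never raise on short inputs, so this should only be run
    # on a sample that has already been verified as infected.
    return b"".join(infected[start:end] for start, end in prescription)
```

Consider a hypothetical 100-byte appending virus that overwrites the host's first three bytes with a jump and keeps the originals at offset 10 of its own body. Its prescription would be `[(-90, -87), (3, -100)]`. The first slice recovers the stashed bytes, and the second recovers the rest of the host.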
23 Claims
5. A method for automatically deriving verification and removal information for a computer virus from a set of infected programs and corresponding uninfected programs, comprising the steps of:
obtaining a set of "sample pairs", each sample pair consisting of a program infected with the computer virus and a corresponding original, uninfected program;
generating a description of how the computer virus attaches to host programs;
matching viral data across different infected samples to obtain a description of "invariant" portions of the computer virus;
locating within other, variable portions of the computer virus any host bytes embedded there;
generating a prescription for verifying with high confidence that any given program is infected with the computer virus; and
generating a prescription for restoring a program that has been determined to have been infected with the computer virus to a state functionally equivalent to the program's original, uninfected state.
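The matching and verification steps of claim 5 can be illustrated under a strong simplifying assumption. The sketch assumes that the viral regions extracted from the infected samples are of equal length and already aligned byte for byte. Positions where every sample agrees form an invariant template. The remaining positions are the variable portions, where embedded host bytes would then be sought. A candidate region is accepted only if it carries every invariant byte. The names `invariant_template` and `matches_template` are hypothetical.

```python
from typing import List, Optional, Sequence

def invariant_template(regions: Sequence[bytes]) -> List[Optional[int]]:
    """Byte-wise template of aligned, equal-length viral regions: the byte
    value where all samples agree, None where they differ (variable)."""
    if not regions or len({len(r) for r in regions}) != 1:
        raise ValueError("need one or more viral regions of equal length")
    return [col[0] if all(b == col[0] for b in col) else None
            for col in zip(*regions)]

def matches_template(region: bytes, template: List[Optional[int]]) -> bool:
    """Verification sketch: every invariant byte must appear at its position."""
    return len(region) == len(template) and all(
        t is None or b == t for b, t in zip(region, template))
```

Real viruses often pad or relocate their code, so alignment would have to be established first. A template built from only a few samples may also treat bytes as invariant that in fact vary, and using more samples lowers that risk.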
11. The method recited in claim 10 wherein the sections are a partition of original and infected hosts into contiguous regions and there are three types of sections, described by ##EQU9## and wherein sections of type H and HI cover an entire original host without any overlap, and sections of type I and HI cover an entire infected host without any overlap.
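Claim 11's three section types can be approximated with `difflib` opcodes. The defining expression (##EQU9##) is not recoverable from this text, so the mapping below is an interpretation of the claim's coverage conditions. H is taken to mark bytes found only in the original host, I to mark bytes found only in the infected host, and HI to mark bytes found in both. The function name `sections` is hypothetical.

```python
from difflib import SequenceMatcher

def sections(original: bytes, infected: bytes):
    """Partition original and infected hosts into H, I and HI sections.

    Each section is (type, orig_range, infected_range); a range is
    (start, end), or None for the file a section does not occupy.
    """
    out = []
    sm = SequenceMatcher(None, original, infected, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":                   # present in both hosts
            out.append(("HI", (i1, i2), (j1, j2)))
        if tag in ("delete", "replace"):     # original bytes that were lost
            out.append(("H", (i1, i2), None))
        if tag in ("insert", "replace"):     # bytes added by the infection
            out.append(("I", None, (j1, j2)))
    return out
```

Consider an original host `ABCDEFGH` whose first three bytes are overwritten with `XYZ` and which then has `VIRUS` appended. For this input, the sketch returns an H section (0, 3), an I section (0, 3), an HI section covering bytes 3 to 8 in both files, and an I section (8, 13). The opcodes tile both inputs contiguously. As a result, the H and HI sections cover the original host without overlap, and the I and HI sections cover the infected host without overlap, as the claim requires.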
12. The method recited in claim 5 wherein the step of generating a description of how the computer virus attaches to the host program includes the step of using markers to identify relative locations in the data, and a marker takes one of the values ##EQU10##, where, in the last case, said description includes the character string serving as the marker.
13. A computing system for automatically deriving verification and removal information for a function-preserving transformation of computer data from a set of untransformed data samples and corresponding transformed data samples, comprising:
data accessing means for obtaining a set of "sample pairs", each sample pair consisting of a transformed data sample and a corresponding original, untransformed data sample;
scanning means operable on the set of "sample pairs" obtained by said data accessing means for locating one or more fragments of each original data sample within a corresponding transformed data sample to obtain a generalized description, applicable to each of the sample pairs, of the locations of the original fragments and locations of new data regions added by the function-preserving transformation that applies to each of the sample pairs;
comparing means operable on the set of "sample pairs" obtained by said data accessing means for matching new data regions added by the function-preserving transformation across different samples to obtain a description of portions of the new data regions that are "invariant" across different samples;
said scanning means locating within other, variable portions of the new data regions any data from an original data sample embedded there; and
output means responsive to said scanning means and said comparing means for generating a prescription for restoring a data sample that has been transformed by the function-preserving transformation to a form functionally equivalent to that prior to the transformation.
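The step in which the scanning means locates original data inside the variable portions of the new data can be sketched as a plain substring search. The sketch assumes that the variable runs have already been identified, for example by comparing the new regions across samples. It simply takes the first occurrence in the original. A real tool would presumably resolve ambiguous matches by checking that the same correspondence holds in every sample pair. `locate_embedded` and its parameters are hypothetical.

```python
from typing import List, Tuple

def locate_embedded(original: bytes, infected: bytes,
                    variable_runs: List[Tuple[int, int]],
                    min_len: int = 2) -> List[Tuple[int, int, int]]:
    """For each variable run (infected_offset, length) of the new data,
    search the original sample for the same bytes.

    Returns (infected_offset, original_offset, length) for each run found.
    """
    found = []
    for off, length in variable_runs:
        if length < min_len:
            continue  # very short runs would match almost anywhere
        chunk = infected[off:off + length]
        pos = original.find(chunk)  # first occurrence only
        if pos != -1:
            found.append((off, pos, length))
    return found
```

The result pairs each variable run with the original location of its bytes. This correspondence is what a restoration prescription needs in order to put destroyed data back.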
16. A computer-implemented method for automatically deriving a general description of an effect of a transformation of original data on any given sample of the data, comprising the steps of:
obtaining a plurality of input data samples consisting of (a) one or more samples of transformed data resulting from application of the transformation to the original data, and (b) zero or more sample pairs, each sample pair consisting of (i) an original data sample and (ii) a transformed data sample resulting from application of the transformation to the same original data sample;
comparing the input data samples with one another to obtain a universal transformation description which describes a relationship between original and transformed data, said universal transformation description being consistent with the input data samples;
outputting the universal transformation description as a universal transformation which is applicable to samples of data included or not included among the input data samples; and
in cases where the transformation is reversible or partly reversible, deriving a general description of a means of reversing or partly reversing an effect of the transformation.
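Claim 16 calls for a universal transformation description that is consistent with every input sample and also applies to unseen samples. This can be illustrated with a deliberately tiny hypothesis space, in which the transformation either prepends or appends a fixed number of new bytes. The sketch uses only sample pairs and tests each hypothesis against all of them. It then derives the reversal from whichever hypothesis fits. The hypothesis space and the names `infer_description` and `reverse` are assumptions for illustration, and they are far narrower than the class of transformations the claim covers.

```python
from typing import Optional, Sequence, Tuple

Pair = Tuple[bytes, bytes]      # (original sample, transformed sample)
Description = Tuple[str, int]   # ("prepend" | "append", number of new bytes)

def infer_description(pairs: Sequence[Pair]) -> Optional[Description]:
    """Return a description consistent with every pair, or None."""
    if not pairs:
        return None
    n = len(pairs[0][1]) - len(pairs[0][0])
    # Both hypotheses require the same positive growth in every pair.
    if n <= 0 or any(len(t) - len(o) != n for o, t in pairs):
        return None
    if all(t[n:] == o for o, t in pairs):
        return ("prepend", n)
    if all(t[:len(o)] == o for o, t in pairs):
        return ("append", n)
    return None

def reverse(description: Description, transformed: bytes) -> bytes:
    """Undo the inferred transformation, including on samples not seen before."""
    kind, n = description
    return transformed[n:] if kind == "prepend" else transformed[:-n]
```

For the pairs `(b"hello", b"helloXXXX")` and `(b"program", b"programYYYY")`, the inferred description is `("append", 4)`. Applying `reverse` with that description to the unseen sample `b"unseenZZZZ"` yields `b"unseen"`.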
Specification