Discovering code and data in a binary executable program
Abstract
A computer software tool automatically identifies code portions and data portions of a binary executable software program in which the code portions include machine instructions of arbitrary length. Software products are typically distributed as binary executable files, each of which is a string of binary values. In general, an executable file has no structure or meaning except as determined by its behavior when a digital computer dynamically executes it, one instruction at a time. The software tool first determines a set of addresses for any known code and data portions. It then disassembles machine instructions, beginning at the starting address of each known code portion, to identify the target addresses of other code portions and other data portions. Other sections of the binary executable software program that could be either code or data are then analyzed to identify additional code and data portions. As new portions are identified, these steps are repeated until no further code or data portions are identifiable. The binary executable software program may include a plurality of executable modules. The entry address of each executable module, and any addresses of code portions and data portions referenced by any debug address, export address, or relocation address, are added to the set of addresses. The binary executable software program is then executed to dynamically identify other executable modules, so that the set of addresses can be further extended.
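The seeding step described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented tool: `Module`, its fields, `seed_addresses`, and `extend_seeds` are hypothetical names; module metadata is assumed to be already parsed out of the file format (no real PE or ELF parsing is shown); and each relocation is modeled as the offset of a 32-bit little-endian slot that holds an absolute address.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical, already-parsed summary of one executable module (no real PE/ELF parsing)."""
    name: str
    image: bytes
    entry: int                                              # absolute entry-point address
    exports: list[int] = field(default_factory=list)        # absolute exported addresses
    debug_symbols: list[int] = field(default_factory=list)  # absolute addresses named by debug info
    relocations: list[int] = field(default_factory=list)    # offsets of 32-bit address slots in image


def seed_addresses(modules):
    """Collect the initial set of known code/data addresses from module metadata."""
    seeds = set()
    for m in modules:
        seeds.add(m.entry)
        seeds.update(m.exports)
        seeds.update(m.debug_symbols)
        for off in m.relocations:
            # Assumption: a relocated slot holds an absolute address, which becomes a candidate target.
            (target,) = struct.unpack_from("<I", m.image, off)
            seeds.add(target)
    return seeds


def extend_seeds(static_modules, runtime_modules):
    """Add modules observed only at run time (e.g. dynamically loaded) and recompute the seeds."""
    names = {m.name for m in static_modules}
    extra = [m for m in runtime_modules if m.name not in names]
    return seed_addresses(static_modules + extra)


main = Module("main", image=bytes(4) + struct.pack("<I", 0x1010),
              entry=0x1000, debug_symbols=[0x1020], relocations=[4])
plugin = Module("plugin", image=b"", entry=0x2000, exports=[0x2040])
print([hex(a) for a in sorted(seed_addresses([main]))])
print([hex(a) for a in sorted(extend_seeds([main], [main, plugin]))])
```

Runtime discovery is modeled here only as a list of modules observed loading during execution; how that list is obtained (for example, by monitoring the loader) is outside this sketch.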
39 Claims
1. A method for automatically identifying code portions and data portions in a binary executable software program, wherein the code portions comprise machine instructions that are of arbitrary length, comprising the steps of:
(a) determining a set of addresses in the binary executable software program that are for any known code portions and for any known data portions;
(b) disassembling machine instructions at a starting address for each known code portion, to identify a set of all possible control flow paths reachable from said starting address, and from the control flow paths that are thus identified, determining a set of target addresses so as to identify other code portions and other data portions;
(c) beginning with bytes of the binary executable software program located at any address that could be a starting point for either a code portion or a data portion, analyzing the bytes to determine if said bytes comprise a code portion; and
(d) reiteratively processing addresses in the binary executable software program that have not yet been identified as being for code portions and for data portions, by repeating steps (b) and (c), to identify other code portions and data portions in the binary executable software program until no further code portions and data portions are identifiable.
Dependent claims: 2-13.
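Steps (a) through (d) can be sketched as a worklist loop in Python. This is a minimal sketch, assuming a hypothetical toy instruction set with 1-byte and 3-byte instructions that stands in for a real variable-length ISA; the opcode values and the names `decode`, `traverse`, `looks_like_code`, and `classify` are illustrative assumptions. In particular, the acceptance test in `looks_like_code` (a linear decode must reach `JMP` or `RET` without invalid bytes, overlap with known code or data, or out-of-range operands) is a simple heuristic chosen for the example; the claim does not say how step (c) decides that bytes are code.

```python
# Hypothetical toy ISA with variable-length instructions (1 or 3 bytes), for illustration only.
NOP, JMP, JZ, CALL, LOAD, RET = 0x01, 0x02, 0x03, 0x04, 0x05, 0x06
OPERAND_OPS = {JMP, JZ, CALL, LOAD}  # these take a 16-bit little-endian address operand


def decode(image, addr):
    """Return (length, opcode, operand) for the instruction at addr, or None if invalid."""
    if addr >= len(image):
        return None
    op = image[addr]
    if op in (NOP, RET):
        return 1, op, None
    if op in OPERAND_OPS and addr + 3 <= len(image):
        return 3, op, image[addr + 1] | (image[addr + 2] << 8)
    return None


def traverse(image, start, code_bytes, data_addrs):
    """Step (b): follow every control-flow path from start, collecting code and data targets."""
    worklist = [start]
    while worklist:
        addr = worklist.pop()
        while addr not in code_bytes:
            insn = decode(image, addr)
            if insn is None:
                break  # invalid or truncated bytes end this path
            length, op, operand = insn
            code_bytes.update(range(addr, addr + length))
            if op in (JMP, JZ, CALL):
                worklist.append(operand)  # control-transfer targets are code
            elif op == LOAD:
                data_addrs.add(operand)  # memory-operand targets are data
            if op in (JMP, RET):
                break  # no fall-through after an unconditional transfer
            addr += length


def looks_like_code(image, addr, code_bytes, data_addrs):
    """Step (c), assumed heuristic: a linear decode from addr must reach JMP or RET
    without invalid bytes, overlap with known code/data, or out-of-image operands."""
    while True:
        insn = decode(image, addr)
        if insn is None:
            return False
        length, op, operand = insn
        if any(b in code_bytes or b in data_addrs for b in range(addr, addr + length)):
            return False
        if operand is not None and operand >= len(image):
            return False
        if op in (JMP, RET):
            return True
        addr += length


def classify(image, known_code, known_data):
    """Steps (a)-(d): seed, traverse, scan for new code, and repeat until nothing changes."""
    code_bytes, data_addrs = set(), set(known_data)  # step (a)
    for start in known_code:
        traverse(image, start, code_bytes, data_addrs)  # step (b)
    while True:  # step (d)
        candidate = next(
            (a for a in range(len(image))
             if a not in code_bytes and a not in data_addrs
             and looks_like_code(image, a, code_bytes, data_addrs)),
            None,
        )
        if candidate is None:
            return code_bytes, data_addrs
        traverse(image, candidate, code_bytes, data_addrs)  # step (b) on newly found code


image = bytes([
    0x04, 0x08, 0x00,  # 0: CALL 8
    0x05, 0x0C, 0x00,  # 3: LOAD 12
    0x06,              # 6: RET
    0xFF,              # 7: padding (invalid)
    0x01, 0x06,        # 8: NOP; RET (called function)
    0xFF, 0xFF,        # 10: padding
    0x2A, 0x2B,        # 12: data referenced by LOAD
    0x01, 0x06,        # 14: NOP; RET (never referenced)
    0xFF,              # 16: padding
])
code, data = classify(image, known_code=[0], known_data=[])
print(sorted(code))  # [0, 1, 2, 3, 4, 5, 6, 8, 9, 14, 15]
print(sorted(data))  # [12]
```

Each pass of the loop traverses one newly accepted candidate, and that traversal always marks at least the candidate's first instruction, so the set of code bytes grows strictly and the loop terminates. In the example, bytes 14-15 are never referenced from the seed code but are recovered by the step (c) scan, while the padding bytes are never accepted as code.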
14. A system for automatically identifying code portions and data portions in a binary executable software program, wherein the code portions comprise machine instructions that are of arbitrary length, comprising:
(a) a memory in which machine instructions and data are storable, said machine instructions including the machine instructions comprising the code portions of the binary executable software program as well as machine instructions comprising a software tool; and
(b) a processor, coupled to the memory, said processor executing the machine instructions comprising the software tool, which cause the processor to:
(i) load the binary executable software program into the memory and determine a set of addresses in the binary executable software program that are for any known code portions and for any known data portions;
(ii) disassemble the machine instructions comprising the binary executable software program at a starting address for each known code portion, to identify a set of all possible control flow paths reachable from said starting address, and from the control flow paths that are thus identified, determine a set of target addresses so as to identify other code portions and other data portions;
(iii) beginning with bytes of the executable software program located at any address in the binary executable software program that could be a starting point for either a code portion or a data portion, analyze the bytes to determine if said bytes comprise a code portion; and
(iv) reiteratively process addresses in the binary executable software program that have not yet been identified as being for code portions and for data portions, by repeating (ii) and (iii) above, to identify other code portions and data portions in the binary executable software program until no further code portions and data portions therein are identifiable.
Dependent claims: 15-26.
27. A computer readable medium having computer-executable instructions, which when executed on a computer, cause the computer to automatically identify code portions and data portions in a binary executable software program, wherein the code portions comprise machine instructions that are of arbitrary length, said computer-executable instructions causing the computer to perform the steps of:
(a) determining a set of addresses in the binary executable software program that are for any known code portions and for any known data portions;
(b) disassembling machine instructions at a starting address for each known code portion, to identify a set of all possible control flow paths reachable from said starting address, and from the control flow paths that are thus identified, determining a set of target addresses so as to identify other code portions and other data portions;
(c) beginning with bytes of the executable software program located at any address in the binary executable software program that could be a starting point for either a code portion or a data portion, analyzing the bytes to determine if said bytes comprise a code portion; and
(d) reiteratively processing addresses in the binary executable software program that have not yet been identified as being for code portions and for data portions, by repeating steps (b) and (c), to identify other code portions and data portions in the binary executable software program until no further code portions and data portions are identifiable.
Dependent claims: 28-39.
Specification