Method and system for detecting and countering malware in a computer
Abstract
An arrangement analyzes a data stream to identify particular token sequences known to be of interest or malware. A preprocessing step organizes the malware tokens into a “graph” in which overlapping token sequences are interconnected with logic splices. The preprocessing is performed only once for a given set of malware targets. The resulting graph can be traversed quickly in runtime operation to identify malware token strings in the data stream.
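As a rough illustration of the two phases the abstract describes (a one-time preprocessing pass, then a fast traversal at run time), the sketch below uses the classic Aho-Corasick automaton as a stand-in. This excerpt does not define how a "logic splice" is implemented, so treating it as a fallback link between overlapping signatures is an assumption, as are the names `build_graph` and `scan`, the byte-sized tokens, and the sample signatures.

```python
from collections import deque

def build_graph(signatures):
    """Preprocess once. `signatures` maps a (hypothetical) name to bytes."""
    # Node layout: children by byte value, a fallback link, names matched here.
    nodes = [{"next": {}, "fail": 0, "hits": []}]
    for name, pattern in signatures.items():
        cur = 0
        for b in pattern:
            if b not in nodes[cur]["next"]:
                nodes.append({"next": {}, "fail": 0, "hits": []})
                nodes[cur]["next"][b] = len(nodes) - 1
            cur = nodes[cur]["next"][b]
        nodes[cur]["hits"].append(name)
    # Breadth-first pass: link each node to the longest proper suffix of its
    # path that is also a prefix of some signature (the assumed "splice").
    queue = deque(nodes[0]["next"].values())
    while queue:
        cur = queue.popleft()
        for b, child in nodes[cur]["next"].items():
            f = nodes[cur]["fail"]
            while f and b not in nodes[f]["next"]:
                f = nodes[f]["fail"]
            nodes[child]["fail"] = nodes[f]["next"].get(b, 0)
            nodes[child]["hits"] += nodes[nodes[child]["fail"]]["hits"]
            queue.append(child)
    return nodes

def scan(nodes, stream):
    """Run time: a single pass over the stream, yielding (end_offset, name)."""
    cur = 0
    for i, b in enumerate(stream):
        while cur and b not in nodes[cur]["next"]:
            cur = nodes[cur]["fail"]  # follow the link instead of rescanning
        cur = nodes[cur]["next"].get(b, 0)
        for name in nodes[cur]["hits"]:
            yield i + 1, name

# Two made-up signatures that overlap on the bytes AD BE.
graph = build_graph({"alpha": b"\xde\xad\xbe", "beta": b"\xad\xbe\xef"})
print(list(scan(graph, b"\x00\xde\xad\xbe\xef\x00")))  # [(4, 'alpha'), (5, 'beta')]
```

The graph is built once per signature set; each scan then touches every byte of the stream only once, which is the property the abstract emphasizes.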
18 Claims
1. A method for identifying the existence of malware in a data stream, said method comprising the steps of:
acquiring a computer database of token strings, each of which is a string of bits or bytes that is characteristic of a string of bits or bytes of a particular malware that may be in the data stream, so that said computer database includes token strings of plural malware entities;
generating, using a hardware processor, a graph from said database of token strings of plural malware entities, in which any token string of an entity of malware which overlaps at least in part a token string of another malware entity is joined thereto by a logic splice; and
performing run-time processing by passing said data stream through at least a portion of said graph while comparing the token string of the data stream with the graph to thereby identify a token string of bits or bytes characteristic of bits or bytes of a malware entity that is in the data stream and, when found, flagging the presence of malware;
wherein generating said graph from said database of token strings of plural malware entities comprises creating a table of preamble entries, each preamble entry being associated with a plurality of pointers, wherein each of the plurality of pointers corresponds to a unique value of a data token, and setting at least one pointer of the plurality of pointers to a node containing a token value that corresponds to the at least one pointer.
Dependent claims: 2, 3, 5, 6, 16, 18
4. A method for identifying the existence of malware in a data stream, said method comprising the steps of:
acquiring a computer database of token strings, each of which is characteristic of particular malware, wherein said computer database includes token strings of plural malware entities;
generating, using a hardware processor, a graph from said database of token strings of plural malware entities, in which any token string of an entity of malware which overlaps at least in part a token string of another malware entity is joined thereto by a logic splice; and
performing run-time processing by passing said data stream through at least a portion of said graph while comparing the token string of the data stream with the graph to thereby identify a token string characteristic of a malware entity and, when found, flagging the presence of malware;
wherein said step of generating a graph comprises the steps of:
generating a preamble table including an entry for every possible preamble of a malware token stream;
for each token string of a given malware, locating the preamble of said token string in said preamble table;
adding the body of said token string of said given malware to an element of a graph; and
selecting from among a plurality of pointers associated with said preamble a pointer storing a location of an element of said graph containing the first token of said body of said token string of said given malware, wherein the selected pointer has an index equal to a value of the first token of said body.
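The graph-generation steps recited in claim 4 can be pictured with a small sketch. It is illustrative only and rests on assumptions the claim does not state: a token is one byte, a preamble is the first two tokens of a signature, and the names `Node`, `build_preamble_table`, and `match_at` are hypothetical. Pointer arrays are allocated lazily to keep the table small, which is a convenience of this sketch rather than anything the claim requires.

```python
PREAMBLE_LEN = 2  # assumption: the claim does not fix a preamble length

class Node:
    """One graph element holding a single body token."""
    def __init__(self, token):
        self.token = token     # token value stored in this element
        self.children = {}     # next token value -> Node
        self.malware = None    # name of the signature that ends here, if any

def build_preamble_table(signatures):
    # One slot for every possible preamble (256**2 slots for 2-byte preambles).
    table = [None] * (256 ** PREAMBLE_LEN)
    for name, tokens in signatures.items():
        preamble, body = tokens[:PREAMBLE_LEN], tokens[PREAMBLE_LEN:]
        if len(preamble) < PREAMBLE_LEN or not body:
            raise ValueError(f"{name}: need a full preamble and a non-empty body")
        slot = int.from_bytes(preamble, "big")   # locate the preamble entry
        if table[slot] is None:
            table[slot] = [None] * 256           # one pointer per token value
        pointers = table[slot]
        first = body[0]
        if pointers[first] is None:
            pointers[first] = Node(first)        # pointer index == first body token
        node = pointers[first]
        for tok in body[1:]:                     # add the rest of the body
            node = node.children.setdefault(tok, Node(tok))
        node.malware = name
    return table

def match_at(table, data, offset):
    """Return the first (shortest) signature matching at `offset`, else None."""
    preamble = data[offset:offset + PREAMBLE_LEN]
    pos = offset + PREAMBLE_LEN
    if len(preamble) < PREAMBLE_LEN or pos >= len(data):
        return None
    pointers = table[int.from_bytes(preamble, "big")]
    node = pointers[data[pos]] if pointers else None
    while node is not None:
        if node.malware is not None:
            return node.malware
        pos += 1
        node = node.children.get(data[pos]) if pos < len(data) else None
    return None

# Made-up signatures sharing the preamble 4D 5A and the first body token 90.
table = build_preamble_table({"alpha": b"\x4d\x5a\x90\x00", "beta": b"\x4d\x5a\x90\x03"})
print(match_at(table, b"\x00\x4d\x5a\x90\x03\xff", 1))  # beta
```

Because the pointer index equals the value of the first body token, finding the start of a body is a direct array lookup rather than a search; signatures that share a preamble and first body token share the same node and diverge further down.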
7. A method for identifying the existence of malware in a data stream, said method comprising the steps of:
acquiring a computer database of token strings, each of which token strings is a string of bits or bytes that is characteristic of a string of bits or bytes of a particular malware that may be in the data stream;
preprocessing to generate a graph from said database of token strings of malware, in which graph token strings of a given malware which correspond to token strings of another malware are joined by a logic splice, wherein at least some malware token strings are joined to other token strings by at least one splice;
performing, using a hardware processor, run-time processing by passing said data stream through at least a portion of said graph while comparing the token string of the data stream with the graph to identify a token string of bits or bytes characteristic of bits or bytes of a malware entity that is in the data stream; and
responsive to the flagging of said given malware entity, taking action against said given malware entity;
wherein generating said graph from said database of token strings of malware comprises creating a table of preamble entries, each preamble entry being associated with a plurality of pointers, wherein each of the plurality of pointers corresponds to a unique value of a data token, and setting at least one pointer of the plurality of pointers to a node containing a token value that corresponds to the at least one pointer.
8. A computer system comprising:
a hardware processor executing instructions for identifying the existence of a string of bits or bytes that is characteristic of bits or bytes of a malware that is in a data stream, the instructions including:
generating a graph from a database of token strings of plural malware entities, in which any token string of an entity of malware which overlaps at least in part a token string of another malware entity is joined thereto by a logic splice; and
performing run-time processing by passing said data stream through at least a portion of said graph while comparing the token string of the data stream with the graph to thereby identify a token string of bits or bytes characteristic of bits or bytes of a malware entity that is in the data stream and, when found, flagging the presence of malware;
wherein generating said graph from said database of token strings of plural malware entities comprises creating a table of preamble entries, each preamble entry being associated with a plurality of pointers, wherein each of the plurality of pointers corresponds to a unique value of a data token, and setting at least one pointer of the plurality of pointers to a node containing a token value that corresponds to the at least one pointer.
Dependent claims: 9, 10, 12, 13, 17
11. A computer system comprising:
a hardware processor executing instructions for identifying the existence of malware in a data stream, the instructions including:
generating a graph from a database of token strings of plural malware entities, in which any token string of an entity of malware which overlaps at least in part a token string of another malware entity is joined thereto by a logic splice; and
performing run-time processing by passing said data stream through at least a portion of said graph while comparing the token string of the data stream with the graph to thereby identify a token string characteristic of a malware entity and, when found, flagging the presence of malware;
wherein said instruction for generating a graph includes instructions for:
generating a preamble table including an entry for every possible preamble of a malware token stream;
for each token string of a given malware, locating the preamble of said token string in said preamble table;
adding the body of said token string of said given malware to an element of a graph;
selecting from among a plurality of pointers associated with said preamble a pointer storing a location of a node of said graph containing the first token of said body of said token string of said given malware, wherein the selected pointer has an index equal to a value of the first token of said body.
14. A method for identifying the existence of a string of bits or bytes that is characteristic of bits or bytes of a particular malware in a data stream, said method comprising the steps of:
generating, using a hardware processor, a graph from token strings of bits or bytes that is characteristic of bits or bytes of plural malware, in which graph any token string of an entity of malware which overlaps at least in part a token string of another malware entity is joined thereto by a logic splice;
performing run-time processing by passing said data stream through at least a portion of said graph while comparing the token string of the data stream with the graph to thereby identify a token string characteristic of a malware entity, and when found, flagging the presence of malware;
wherein generating said graph from said database of token strings comprises creating a table of preamble entries, each preamble entry being associated with a plurality of pointers, wherein each of the plurality of pointers corresponds to a unique value of a data token, and setting at least one pointer of the plurality of pointers to a node containing a token value corresponding to its pointer.
15. A method for identifying the existence of malware in a data stream, said method comprising the steps of:
acquiring a computer database of token strings, each of which token strings is a string of bits or bytes that is characteristic of bits or bytes of a particular malware in the data stream;
preprocessing to generate a graph from said database of token strings of malware, in which graph portions of a token string of a given malware which correspond to portions of a token string of another malware are joined by a logic splice, so that at least some malware token strings are joined to other malware token strings by at least one splice;
performing, using a hardware processor, run-time processing by passing said data stream through a portion of said graph while comparing the token string of the data stream with the graph to thereby (a) identify a token string of bits or bytes characteristic of bits or bytes of the given malware in the data stream and, when found, flagging the presence of said identified given malware, and (b) when a token string characteristic of said other malware is identified, routing said data stream over the associated splice to a further portion of said graph to continue said comparing; and
responsive to the flagging of said given malware, taking action against said given malware;
wherein generating said graph from said database of token strings of malware comprises creating a table of preamble entries, each preamble entry being associated with a plurality of pointers, wherein each of the plurality of pointers corresponds to a value of a data token, and setting at least one pointer of the plurality of pointers to a location of a node containing a token value that corresponds to the at least one pointer.
Specification