Method and system for adaptive rule-based content scanners

Abstract
A method for scanning content, including identifying tokens within an incoming byte stream, the tokens being lexical constructs for a specific language, identifying patterns of tokens, generating a parse tree from the identified patterns of tokens, and identifying the presence of potential exploits within the parse tree, wherein said identifying tokens, identifying patterns of tokens, and identifying the presence of potential exploits are based upon a set of rules for the specific language. A system and a computer readable storage medium are also described and claimed.
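The flow described in the abstract (tokens, then patterns of tokens, then a parse tree, then exploit detection, all driven by language-specific rules) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the toy JavaScript-like token rules, the single `CALL` parser rule, and the `SUSPICIOUS_CALLS` analyzer rule are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tokenizer rules for a toy JavaScript-like language.
TOKEN_RULES = [
    ("IDENT", r'[A-Za-z_][A-Za-z_0-9]*'),
    ("STRING", r'"[^"]*"'),
    ("LPAREN", r'\('),
    ("RPAREN", r'\)'),
    ("SKIP", r'\s+|;'),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_RULES))


@dataclass
class Node:
    kind: str                      # token kind or pattern name
    text: str = ""                 # source text for token nodes
    children: list = field(default_factory=list)


def tokenize(source):
    """Yield token nodes; characters matching no rule are silently skipped."""
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup != "SKIP":
            yield Node(match.lastgroup, match.group())


# Hypothetical parser rule: a pattern name mapped to a sequence of token kinds.
PARSER_RULES = {"CALL": ["IDENT", "LPAREN", "STRING", "RPAREN"]}


def parse(tokens):
    """Group tokens into pattern nodes under a single root node."""
    root = Node("ROOT")
    pending = list(tokens)
    i = 0
    while i < len(pending):
        for name, kinds in PARSER_RULES.items():
            window = pending[i:i + len(kinds)]
            if [tok.kind for tok in window] == kinds:
                root.children.append(Node(name, children=window))
                i += len(kinds)
                break
        else:
            # No parser rule matched here: keep the bare token as a node.
            root.children.append(pending[i])
            i += 1
    return root


# Hypothetical analyzer rule: a CALL pattern whose callee is on this list.
SUSPICIOUS_CALLS = {"eval"}


def find_exploits(tree):
    """Return the callee names of CALL nodes flagged by the analyzer rule."""
    return [
        node.children[0].text
        for node in tree.children
        if node.kind == "CALL" and node.children[0].text in SUSPICIOUS_CALLS
    ]
```

For example, `find_exploits(parse(tokenize('eval("x"); print("y")')))` flags only the `eval` call. Keeping the token, parser, and analyzer rules as plain data is what would let the same three functions serve a different language with a different rule set.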
35 Claims
1. A computer processor-based multi-lingual method for scanning incoming program code, comprising:
receiving, by a computer, an incoming stream of program code;
determining, by the computer, any specific one of a plurality of programming languages in which the incoming stream is written;
instantiating, by the computer, a scanner for the specific programming language, in response to said determining, the scanner comprising parser rules and analyzer rules for the specific programming language, wherein the parser rules define certain patterns in terms of tokens, tokens being lexical constructs for the specific programming language, and wherein the analyzer rules identify certain combinations of tokens and patterns as being indicators of potential exploits, exploits being portions of program code that are malicious;
identifying, by the computer, individual tokens within the incoming stream;
dynamically building, by the computer while said receiving receives the incoming stream, a parse tree whose nodes represent tokens and patterns in accordance with the parser rules;
dynamically detecting, by the computer while said dynamically building builds the parse tree, combinations of nodes in the parse tree which are indicators of potential exploits, based on the analyzer rules; and
indicating, by the computer, the presence of potential exploits within the incoming stream, based on said dynamically detecting.
Dependent claims: 2, 3, 4, 5, 6, 7, 8

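The "determining" and "instantiating" steps of claim 1 can be pictured with a short sketch. The claim does not say how the language is determined, so the marker-counting heuristic below is purely an assumption, as are the names `LANGUAGE_MARKERS`, `RULES`, `detect_language`, and `instantiate_scanner` and the contents of the rule tables.

```python
from dataclasses import dataclass

# Hypothetical substrings whose presence hints at a language (assumption:
# the claim itself does not prescribe any detection heuristic).
LANGUAGE_MARKERS = {
    "javascript": ("function ", "var ", "=>"),
    "vbscript": ("dim ", "sub ", "end sub"),
}

# Hypothetical per-language rule sets: (parser rules, analyzer rules).
RULES = {
    "javascript": (
        {"CALL": ["IDENT", "LPAREN", "RPAREN"]},
        {"dynamic-eval": "eval"},
    ),
    "vbscript": (
        {"CALL": ["IDENT", "LPAREN", "RPAREN"]},
        {"shell-object": "CreateObject"},
    ),
}


@dataclass(frozen=True)
class Scanner:
    language: str
    parser_rules: dict
    analyzer_rules: dict


def detect_language(prefix):
    """Pick the language whose markers occur most often in the stream prefix."""
    lowered = prefix.lower()
    scores = {
        lang: sum(marker in lowered for marker in markers)
        for lang, markers in LANGUAGE_MARKERS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        raise ValueError("no known language detected")
    return best


def instantiate_scanner(language):
    """Build a scanner bound to the rule set of the detected language."""
    parser_rules, analyzer_rules = RULES[language]
    return Scanner(language, parser_rules, analyzer_rules)
```

Under these assumptions, `instantiate_scanner(detect_language("Dim x\nSub Main()\nEnd Sub"))` yields a scanner carrying the VBScript rule set, while a prefix containing `var ` and `function ` selects the JavaScript one.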
9. A computer system for multi-lingual content scanning, comprising:
a non-transitory computer-readable storage medium storing computer-executable program code that is executed by a computer to scan incoming program code;
a receiver, stored on the medium and executed by the computer, for receiving an incoming stream of program code;
a multi-lingual language detector, stored on the medium and executed by the computer, operatively coupled to said receiver for detecting any specific one of a plurality of programming languages in which the incoming stream is written;
a scanner instantiator, stored on the medium and executed by the computer, operatively coupled to said receiver and said multi-lingual language detector for instantiating a scanner for the specific programming language, in response to said determining, the scanner comprising;
a rules accessor for accessing parser rules and analyzer rules for the specific programming language, wherein the parser rules define certain patterns in terms of tokens, tokens being lexical constructs for the specific programming language, and wherein the analyzer rules identify certain combinations of tokens and patterns as being indicators of potential exploits, exploits being portions of program code that are malicious;
a tokenizer, for identifying individual tokens within the incoming;
a parser, for dynamically building while said receiver is receiving the incoming stream, a parse tree whose nodes represent tokens and patterns in accordance with the parser rules accessed by said rules accessor; and
an analyzer, for dynamically detecting, while said parser is dynamically building the parse tree, combinations of nodes in the parse tree which are indicators of potential exploits, based on the analyzer rules; and
a notifier, stored on the medium and executed by the computer, operatively coupled to said scanner instantiator for indicating the presence of potential exploits within the incoming stream, based on results of said analyzer.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21

22. A non-transitory computer-readable storage medium storing program code for causing a computer to perform the steps of:
receiving an incoming stream of program code;
determining any specific one of a plurality of programming languages in which the incoming stream is written;
instantiating a scanner for the specific programming language, in response to said determining, the scanner comprising parser rules and analyzer rules for the specific programming language, wherein the parser rules define certain patterns in terms of tokens, tokens being lexical constructs for the specific programming language, and wherein the analyzer rules identify certain combinations of tokens and patterns as being indicators of corresponding exploits, exploits being portions of program code that are malicious;
identifying individual tokens within the incoming stream;
dynamically building, while said receiving receives the incoming stream, a parse tree whose nodes represent tokens and patterns in accordance with the parser rules;
dynamically detecting, while said dynamically building builds the parse tree, combinations of nodes in the parse tree which are indicator or potential exploits, based on the analyzer rules; and
indicating the presence of potential exploits within the incoming stream, based on said dynamically detecting.

23. A computer processor-based multi-lingual method for scanning content incoming program code, comprising:
for each of a plurality of programming languages, expressing exploits in terms of patterns of tokens and rules, wherein exploits are portions of program code that are malicious, wherein tokens are lexical constructs of a specific programming language, and wherein rules designate certain patterns of tokens as forming programmatical constructs;
receiving, by a computer, an incoming stream of program code;
determining, by the computer, any specific one of the plurality of programming languages in which the incoming stream is written;
dynamically building, while said receiving receives the incoming stream, a parse tree whose nodes represent tokens and rules vis-à-vis the specific programming language;
dynamically detecting, while said dynamically building builds the parse tree, patterns of nodes in the parse tree which are indicators of potential exploits, based on said expressing vis-à-vis the specific programming language; and
indicating, by the computer, the presence of potential exploits within the incoming stream, based on said dynamically detecting.
Dependent claims: 24, 25, 26, 27, 28

29. A computer system for multi-lingual content scanning, comprising:
a non-transitory computer-readable storage medium storing computer-executable program code that is executed by a computer to scan incoming program code;
an accessor, stored on the medium and executed by the computer, for accessing descriptions of exploits in terms of patterns of tokens and rules, wherein exploits are portions of program code that are malicious, wherein tokens are lexical constructs of any one of a plurality of programming languages, and wherein rules designate certain patterns of tokens as forming programmatical constructs;
a receiver, stored on the medium and executed by the computer, for receiving an incoming stream of program code;
a multi-lingual language detector, stored on the medium and executed by the computer, operatively coupled with said receiver for detecting any specific one of the plurality of programming languages in which the incoming stream is written;
a parser, stored on the medium and executed by the computer, operatively coupled with said accessor, with said receiver and with said language detector for dynamically building, while said receiver is receiving the incoming stream, a parse tree whose nodes represent tokens and rules vis-à-vis the specific programming language;
an analyzer, stored on the medium and executed by the computer, operatively coupled with said parser, with said accessor and with said language detector, for dynamically detecting, while said parser is dynamically building the parse tree, patterns of nodes in the parse tree which are indicators of potential exploits, based on the descriptions of exploits vis-à-vis the specific programming language; and
a notifier, stored on the medium and executed by the computer, operatively coupled with said analyzer, for indicating the presence of potential exploits within the incoming stream, based on results of said analyzer.
Dependent claims: 30, 31, 32, 33, 34

35. A non-transitory computer-readable storage medium storing program code for causing a computer to perform the steps of:
for each of a plurality of programming languages, expressing exploits in terms of corresponding patterns of tokens and rules, wherein exploits are portions of program code that are malicious, wherein tokens are lexical constructs of a specific programming language, and wherein rules designate certain patterns of tokens as forming programmatical constructs;
receiving an incoming stream of program code;
determining any specific one of the plurality of programming languages in which the incoming stream is written;
dynamically building, while said receiving receives the incoming stream, a parse tree whose nodes correspond to tokens and rules vis-à-vis the specific programming language;
dynamically detecting, while said dynamically building builds the parse tree, patterns of nodes in the parse tree which are indicators of potential exploits, based on said expressing vis-à-vis the specific programming language; and
indicating, by the computer, the presence of potential exploits within the incoming stream, based on said dynamically detecting.
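The "dynamically detecting ... while said dynamically building builds the parse tree" language recurs across the independent claims. One way to picture it is a scanner that consumes the stream chunk by chunk and checks for exploit patterns each time a node is added, rather than after the whole input has arrived. The sketch below rests on loud assumptions: the class name `StreamingScanner`, the `EXPLOIT_PATTERNS` table, and the toy tokenizer are hypothetical, and a flat list of nodes under one implicit root stands in for a real parse tree.

```python
import re

# Hypothetical per-language exploit descriptions, each a sequence of tokens.
EXPLOIT_PATTERNS = {
    "javascript": {"dynamic-eval": ("eval", "(")},
    "vbscript": {"shell-object": ("CreateObject", "(")},
}

# Toy tokenizer: identifiers, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\S")


class StreamingScanner:
    """Detects exploit patterns incrementally as chunks of the stream arrive."""

    def __init__(self, language):
        self.patterns = EXPLOIT_PATTERNS[language]
        self.buffer = ""   # text received but not yet turned into nodes
        self.nodes = []    # children of an implicit root (flat stand-in tree)
        self.alerts = []   # every alert raised so far

    def _check_tail(self):
        """Return names of patterns that end at the most recently added node."""
        return [
            name
            for name, pattern in self.patterns.items()
            if tuple(self.nodes[-len(pattern):]) == pattern
        ]

    def feed(self, chunk, final=False):
        """Consume one chunk and return the alerts it triggered."""
        self.buffer += chunk
        matches = list(TOKEN_RE.finditer(self.buffer))
        if not final and matches:
            # The last token may continue in the next chunk, so hold it back.
            matches = matches[:-1]
        new_alerts = []
        consumed = 0
        for match in matches:
            self.nodes.append(match.group())
            consumed = match.end()
            new_alerts.extend(self._check_tail())
        self.buffer = self.buffer[consumed:]
        self.alerts.extend(new_alerts)
        return new_alerts
```

Feeding `"ev"`, then `"al(pay"`, then `"load)"` with `final=True` raises the `dynamic-eval` alert during the second call, before the rest of the stream has been received. Holding back the last token of each chunk is the design choice that keeps an identifier split across two chunks from being seen as two separate tokens.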
Specification