Detecting malicious HTTP redirections using user browsing activity trees
First Claim
1. A method for detecting malicious HTTP redirections in a network, comprising:
obtaining, from the network and based on a single client IP address, one or more HTTP flows triggered by a client device visiting a website, wherein the one or more HTTP flows comprises a first sequence of HTTP request/response pairs, the first sequence of HTTP request/response pairs including a first sequence of universal resource locators (URLs);
constructing a per-user tree using the one or more HTTP flows, the per-user tree including nodes corresponding to URLs, including the first sequence of URLs, wherein the per-user tree includes an edge from a parent node to a child node if a request for a URL corresponding to the child node is triggered from the URL corresponding to the parent node, wherein each edge of the per-user tree is annotated by:
1) a URL type assigned to the URL corresponding to the child node and
2) a time that elapses between HTTP requests in the parent node and child node, wherein the per-user tree includes multiple paths, the multiple paths corresponding to both benign requests and malicious paths;
extracting, from the first sequence and using a first pre-determined algorithm, a second sequence of URLs comprising an upstream URL and a downstream URL adjacent to each other in the second sequence, wherein the downstream URL is extracted from a child HTTP request that is subsequent to a parent HTTP request comprising the upstream URL, wherein extracting the second sequence of URLs comprises:
selecting, from the first sequence, the parent HTTP request and the child HTTP request that are generated by the client device;
selecting, from the first sequence, a parent HTTP response received by the client device, wherein the parent HTTP response is generated by a server device identified by the upstream URL;
detecting that the parent HTTP response comprises the downstream URL, wherein the child HTTP request is generated by the client device based on the parent HTTP response; and
including, in response to the detecting, the upstream URL and the downstream URL in the second sequence of URLs;
updating the per-user tree to include paths corresponding to the extracted second sequence of URLs;
analyzing, by a processor of a computer system using a second pre-determined algorithm, the second sequence of URLs to generate a statistical feature of URLs based at least on the upstream URL and the downstream URL, the statistical feature being stored in a statistical feature vector; and
classifying, based on the statistical feature of URLs, the first sequence of HTTP request/response pairs as comprising at least one malicious HTTP redirection triggered by visiting the website, wherein classifying includes updating the per-user tree to reflect that the path on the per-user tree corresponding to the at least one malicious HTTP redirection is malicious.
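The tree-construction and extraction steps recited above can be illustrated with a short Python sketch. This is only one possible reading under stated assumptions, not the patented implementation: the `HttpPair` record, the `url_type` labels, and the rule "link a newly seen URL to the most recent earlier response whose body contains it" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class HttpPair:
    """One request/response pair observed for a single client IP (hypothetical record)."""
    url: str
    timestamp: float     # time of the request, in seconds
    url_type: str        # assumed labels such as "html", "javascript", "image"
    response_body: str   # response content, searched for the next URL


@dataclass
class Node:
    url: str
    # child URL -> (URL type of the child, seconds between parent and child requests)
    edges: Dict[str, Tuple[str, float]] = field(default_factory=dict)
    malicious: bool = False


def build_tree(pairs: List[HttpPair]) -> Tuple[Dict[str, Node], Dict[str, str]]:
    """Link each newly seen URL to the most recent earlier response that contains it."""
    nodes: Dict[str, Node] = {}
    parent_of: Dict[str, str] = {}
    seen: List[HttpPair] = []
    for pair in sorted(pairs, key=lambda p: p.timestamp):
        if pair.url not in nodes:
            nodes[pair.url] = Node(pair.url)
            for earlier in reversed(seen):
                # Assumed trigger test: the parent response mentions the child URL.
                if pair.url in earlier.response_body:
                    elapsed = pair.timestamp - earlier.timestamp
                    nodes[earlier.url].edges[pair.url] = (pair.url_type, elapsed)
                    parent_of[pair.url] = earlier.url
                    break
        seen.append(pair)
    return nodes, parent_of


def url_sequence(parent_of: Dict[str, str], leaf: str) -> List[str]:
    """Walk from a URL back to the root; return the upstream-to-downstream sequence."""
    path = [leaf]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path[::-1]


def mark_malicious(nodes: Dict[str, Node], path: List[str]) -> None:
    """Flag every node on a path once that path has been classified as malicious."""
    for url in path:
        nodes[url].malicious = True
```

Here each adjacent pair in the list returned by `url_sequence` plays the role of an upstream URL and a downstream URL, and each entry in `Node.edges` carries the two edge annotations named in the claim (URL type and elapsed time).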
Abstract
A method for detecting malicious HTTP redirections. The method includes obtaining, based on a single client IP address, HTTP flows triggered by visiting a website, extracting a sequence of URLs where a downstream URL is extracted from a child HTTP request that is triggered by a parent HTTP request containing an immediate upstream URL, analyzing the URL sequence to generate a statistical feature, and classifying, based on the statistical feature, the HTTP flows as containing at least one malicious HTTP redirection triggered by visiting the website.
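As a rough illustration of the analyzing and classifying steps summarized above, the following Python sketch turns a URL sequence into a small feature vector and applies a threshold rule. The specific features (sequence length, distinct hosts, share of cross-host hops, mean delay between requests) and the thresholds are assumptions chosen for the example; this section does not name the features or the classifier actually used.

```python
from statistics import mean
from typing import List, Sequence
from urllib.parse import urlsplit


def feature_vector(urls: Sequence[str], delays: Sequence[float]) -> List[float]:
    """Build an illustrative statistical feature vector for one URL sequence.

    urls   -- upstream-to-downstream URL sequence
    delays -- seconds elapsed between consecutive requests (one per hop)
    """
    hosts = [urlsplit(u).hostname or "" for u in urls]
    hops = list(zip(hosts, hosts[1:]))
    cross_host = sum(1 for up, down in hops if up != down)
    return [
        float(len(urls)),                         # sequence length
        float(len(set(hosts))),                   # number of distinct hosts
        cross_host / len(hops) if hops else 0.0,  # share of hops that change host
        mean(delays) if delays else 0.0,          # mean delay between requests
    ]


def looks_malicious(features: Sequence[float],
                    min_length: float = 3.0,
                    min_cross_ratio: float = 0.75) -> bool:
    """Stand-in threshold rule (hypothetical); a trained classifier would go here."""
    length, _, cross_ratio, _ = features
    return length >= min_length and cross_ratio >= min_cross_ratio
```

A fixed rule keeps the example short. In practice the same vector could be passed to any trained classifier without changing `feature_vector`.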
20 Claims
1. A method for detecting malicious HTTP redirections in a network (recited in full under First Claim above).
Dependent claims: 2, 3, 4, 5, 6, 19
7. A system for detecting malicious HTTP redirections in a network, comprising:
a computer processor;
a flow parser configured to obtain, from the network and based on a single client IP address, one or more HTTP flows triggered by a client device visiting a website, wherein the one or more HTTP flows comprises a first sequence of HTTP request/response pairs, the first sequence of HTTP request/response pairs including a first sequence of universal resource locators (URLs);
a per-user tree constructor executing on the computer processor and configured to construct a per-user tree using the one or more HTTP flows, the per-user tree including nodes corresponding to URLs, including the first sequence of URLs, wherein the per-user tree includes an edge from a parent node to a child node if a request for a URL corresponding to the child node is triggered from the URL corresponding to the parent node, the per-user tree constructor also configured to update the per-user tree based upon extracted sequences of URLs, wherein each edge of the per-user tree is annotated by:
1) a URL type assigned to the URL corresponding to the child node and
2) a time that elapses between HTTP requests in the parent node and child node, wherein the per-user tree includes multiple paths, the multiple paths corresponding to both benign requests and malicious paths;
a universal resource locator (URL) sequence extractor executing on the computer processor and configured to extract, from the first sequence and using a first pre-determined algorithm, a second sequence of URLs comprising an upstream URL and a downstream URL adjacent to each other in the second sequence, wherein the downstream URL is extracted from a child HTTP request that is subsequent to a parent HTTP request comprising the upstream URL, wherein extracting the second sequence of URLs comprises:
selecting, from the first sequence, the parent HTTP request and the child HTTP request that are generated by the client device;
selecting, from the first sequence, a parent HTTP response received by the client device, wherein the parent HTTP response is generated by a server device identified by the upstream URL;
detecting that the parent HTTP response comprises the downstream URL, wherein the child HTTP request is generated by the client device based on the parent HTTP response; and
including, in response to the detecting, the upstream URL and the downstream URL in the second sequence of URLs;
a feature extractor executing on the computer processor and configured to analyze, using a second pre-determined algorithm, the second sequence of URLs to generate a statistical feature of URLs based at least on the upstream URL and the downstream URL, the statistical feature being stored in a statistical feature vector;
a classifier executing on the computer processor and configured to classify, based on the statistical feature of URLs, the first sequence of HTTP request/response pairs as comprising at least one malicious HTTP redirection triggered by visiting the website, wherein classifying includes updating the per-user tree to reflect that the path on the per-user tree corresponding to the at least one malicious HTTP redirection is malicious; and
a repository configured to store the first sequence of HTTP request/response pairs, the statistical feature vector, and the second sequence of URLs.
Dependent claims: 8, 9, 10, 11, 12, 20
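The flow parser and repository elements of the system claim can be pictured with a brief Python sketch. It is a sketch under assumptions only: `FlowRecord`, `group_by_client`, and the `Repository` fields are hypothetical names, and the claim does not prescribe any particular data layout.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, NamedTuple


class FlowRecord(NamedTuple):
    """Minimal view of one observed HTTP request (hypothetical fields)."""
    client_ip: str
    timestamp: float
    url: str


def group_by_client(records: Iterable[FlowRecord]) -> Dict[str, List[FlowRecord]]:
    """Flow-parser stand-in: split mixed traffic into one time-ordered list per client IP."""
    per_client: Dict[str, List[FlowRecord]] = defaultdict(list)
    for record in sorted(records, key=lambda r: r.timestamp):
        per_client[record.client_ip].append(record)
    return dict(per_client)


@dataclass
class Repository:
    """Holds the three items the claim says are stored, keyed by client IP."""
    first_sequences: Dict[str, List[FlowRecord]] = field(default_factory=dict)
    url_sequences: Dict[str, List[str]] = field(default_factory=dict)
    feature_vectors: Dict[str, List[float]] = field(default_factory=dict)
```

Keying everything by client IP mirrors the "single client IP address" limitation, so that a tree constructor, sequence extractor, feature extractor, and classifier could each read and write one client's data at a time.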
13. A non-transitory computer readable medium embodying instructions for detecting malicious HTTP redirections in a network, the instructions when executed by a processor comprising functionality for:
obtaining, from the network and based on a single client IP address, one or more HTTP flows triggered by a client device visiting a website, wherein the one or more HTTP flows comprises a first sequence of HTTP request/response pairs, the first sequence of HTTP request/response pairs including a first sequence of universal resource locators (URLs);
constructing a per-user tree using the one or more HTTP flows, the per-user tree including nodes corresponding to URLs, including the first sequence of URLs, wherein the per-user tree includes an edge from a parent node to a child node if a request for a URL corresponding to the child node is triggered from the URL corresponding to the parent node, wherein each edge of the per-user tree is annotated by:
1) a URL type assigned to the URL corresponding to the child node and
2) a time that elapses between HTTP requests in the parent node and child node, wherein the per-user tree includes multiple paths, the multiple paths corresponding to both benign requests and malicious paths;
extracting, from the first sequence and using a first pre-determined algorithm, a second sequence of URLs comprising an upstream URL and a downstream URL adjacent to each other in the second sequence, wherein the downstream URL is extracted from a child HTTP request that is subsequent to a parent HTTP request comprising the upstream URL, wherein extracting the second sequence of URLs comprises:
selecting, from the first sequence, the parent HTTP request and the child HTTP request that are generated by the client device;
selecting, from the first sequence, a parent HTTP response received by the client device, wherein the parent HTTP response is generated by a server device identified by the upstream URL;
detecting that the parent HTTP response comprises the downstream URL, wherein the child HTTP request is generated by the client device based on the parent HTTP response; and
including, in response to the detecting, the upstream URL and the downstream URL in the second sequence of URLs;
updating the per-user tree to include paths corresponding to the extracted second sequence of URLs;
analyzing, using a second pre-determined algorithm, the second sequence to generate a statistical feature of URLs based at least on the upstream URL and the downstream URL, the statistical feature being stored in a statistical feature vector; and
classifying, based on the statistical feature of URLs, the first sequence of HTTP request/response pairs as comprising at least one malicious HTTP redirection triggered by visiting the website, wherein classifying includes updating the per-user tree to reflect that the path on the per-user tree corresponding to the at least one malicious HTTP redirection is malicious.
Dependent claims: 14, 15, 16, 17, 18
Specification