Detecting malicious HTTP redirections using user browsing activity trees

US 9,531,736 B1
Filed: 12/24/2012
Issued: 12/27/2016
Est. Priority Date: 12/24/2012
Status: Active Grant

Abstract

A method for detecting malicious HTTP redirections. The method includes obtaining, based on a single client IP address, HTTP flows triggered by visiting a website, extracting a sequence of URLs where a downstream URL is extracted from a child HTTP request that is triggered by a parent HTTP request containing an immediate upstream URL, analyzing the URL sequence to generate a statistical feature, and classifying, based on the statistical feature, the HTTP flows as containing at least one malicious HTTP redirection triggered by visiting the website.
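The extraction step summarized above can be pictured with a minimal sketch. It assumes each HTTP request/response pair for one client is already available as text. The `Exchange` type, its field names, and the substring test are illustrative assumptions, not the algorithm disclosed in the patent. A later request is treated as the child of an earlier one when the earlier response contains the later request's URL.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Exchange:
    """One HTTP request/response pair observed for a single client (hypothetical type)."""
    url: str            # URL requested by the client
    response_text: str  # response headers plus body, as text (assumed available)


def extract_redirect_pairs(exchanges: List[Exchange]) -> List[Tuple[str, str]]:
    """Return (upstream URL, downstream URL) pairs in request order.

    A request is linked to the most recent earlier exchange whose response
    text contains the request's URL. Requests with no such parent are skipped.
    """
    pairs: List[Tuple[str, str]] = []
    for i, child in enumerate(exchanges):
        # Search backwards so the most recent matching response wins.
        for parent in reversed(exchanges[:i]):
            if child.url in parent.response_text:
                pairs.append((parent.url, child.url))
                break
    return pairs


exchanges = [
    Exchange("http://a.example/", "HTTP/1.1 302 Found\nLocation: http://b.example/r"),
    Exchange("http://b.example/r", "<script src='http://c.example/x.js'></script>"),
    Exchange("http://c.example/x.js", ""),
    Exchange("http://z.example/", ""),  # unrelated request, no parent found
]
pairs = extract_redirect_pairs(exchanges)
```

A plain substring match is the simplest possible test. A real deployment would likely need to parse `Location` headers, HTML, and scripts, which is outside the scope of this sketch.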

20 Claims

1. A method for detecting malicious HTTP redirections in a network, comprising:
- obtaining, from the network and based on a single client IP address, one or more HTTP flows triggered by a client device visiting a website, wherein the one or more HTTP flows comprises a first sequence of HTTP request/response pairs, the first sequence of HTTP request/response pairs including a first sequence of universal resource locators (URLs);
  
  constructing a per-user tree using the one or more HTTP flows, the per-user tree including nodes corresponding to URLs, including the first sequence of URLs, wherein the per-user tree includes an edge from a parent node to a child node if a request for a URL corresponding to the child node is triggered from the URL corresponding to the parent node, wherein each edge of the per-user tree is annotated by:
  
  1) a URL type assigned to the URL corresponding to the child node and
  
  2) a time that elapses between HTTP requests in the parent node and child node, wherein the per-user tree includes multiple paths, the multiple paths corresponding to both benign requests and malicious paths;
  
  extracting, from the first sequence and using a first pre-determined algorithm, a second sequence of URLs comprising an upstream URL and a downstream URL adjacent to each other in the second sequence, wherein the downstream URL is extracted from a child HTTP request that is subsequent to a parent HTTP request comprising the upstream URL, wherein extracting the second sequence of URLs comprises:
  
  selecting, from the first sequence, the parent HTTP request and the child HTTP request that are generated by the client device;
  
  selecting, from the first sequence, a parent HTTP response received by the client device, wherein the parent HTTP response is generated by a server device identified by the upstream URL;
  
  detecting that the parent HTTP response comprises the downstream URL, wherein the child HTTP request is generated by the client device based on the parent HTTP response; and
  
  including, in response to the detecting, the upstream URL and the downstream URL in the second sequence of URLs;
  
  updating the per-user tree to include paths corresponding to the extracted second sequence of URLs;
  
  analyzing, by a processor of a computer system using a second pre-determined algorithm, the second sequence of URLs to generate a statistical feature of URLs based at least on the upstream URL and the downstream URL, the statistical feature being stored in a statistical feature vector; and
  
  classifying, based on the statistical feature of URLs, the first sequence of HTTP request/response pairs as comprising at least one malicious HTTP redirection triggered by visiting the website, wherein classifying includes updating the per-user tree to reflect that the path on the per-user tree corresponding to the at least one malicious HTTP redirection is malicious.
- Dependent claims (2, 3, 4, 5, 6, 19):
  - 2. The method of claim 1, further comprising:
    - obtaining a training HTTP flow set corresponding to training visits to websites;
      
      extracting, from the training HTTP flow set and using at least the first pre-determined algorithm, a plurality of training sequences of URLs;
      
      selecting, from the plurality of training sequences of URLs, a first portion identified as malicious sequences and a second portion identified as benign sequences;
      
      analyzing, using the second pre-determined algorithm, the first portion and the second portion to generate a first plurality of statistical features associated with the malicious sequences and a second plurality of statistical features associated with the benign sequences, respectively; and
      
      determining, using a supervised machine learning algorithm, at least one parameter of a classifier based on the first plurality of statistical features and the second plurality of statistical features, wherein the first sequence is classified by using the classifier to analyze the statistical feature of the second sequence.
  - 3. The method of claim 1, wherein any consecutive HTTP flows, if present in the one or more HTTP flows, are separated by less than a pre-determined silence period, and wherein all of the one or more HTTP flows are separated from any other HTTP flow having the single client address by at least the pre-determined silence period.
  - 4. The method of claim 1, wherein the classifying comprises:
    - determining that the second sequence is malicious based on the statistical feature;
      
      identifying, in response to the determining, the at least one malicious HTTP redirection based on the second sequence.
  - 5. The method of claim 1, further comprising:
    - generating, in response to at least the classifying, an alert of a malicious HTTP redirection attack.
  - 6. The method of claim 5, further comprising:
    - analyzing the first sequence and the second sequence to identify the at least one malicious HTTP redirection, and including, in the alert, information regarding the malicious HTTP redirection.
  - 19. The method of claim 1, wherein the statistical feature of URLs comprises at least one selected from a group consisting of a first tally of different domains in the second sequence of URLs, a second tally of HTTP redirections in the second sequence of URLs, a third tally of different domain HTTP redirections in the second sequence of URLs, a fourth tally of consecutive HTTP redirections in the second sequence of URLs, a fifth tally of consecutive different domain HTTP redirections in the second sequence of URLs, a sixth tally of consecutive short inter-URL durations in the second sequence of URLs, a length of the second sequence of URLs, and a statistical parameter of inter-URL duration distribution of the second sequence of URLs.
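The tallies listed in claim 19 can be illustrated with a short sketch. It rests on assumptions that the claim does not fix: a "domain" is approximated by the URL hostname, "consecutive" tallies are read as the longest run, a "short" duration is anything under a hypothetical `short_gap` threshold, and the mean is used as the statistical parameter of the duration distribution. The `Hop` type and all names are illustrative.

```python
from statistics import mean
from typing import Dict, List, NamedTuple
from urllib.parse import urlsplit


class Hop(NamedTuple):
    """One URL of the extracted sequence (hypothetical record layout)."""
    url: str
    is_redirect: bool  # True if this URL was reached through an HTTP redirection
    t: float           # request time in seconds


def longest_run(flags: List[bool]) -> int:
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def feature_vector(seq: List[Hop], short_gap: float = 1.0) -> Dict[str, float]:
    """Compute one possible reading of the claim 19 features."""
    hosts = [urlsplit(h.url).hostname for h in seq]
    gaps = [b.t - a.t for a, b in zip(seq, seq[1:])]
    redirect = [h.is_redirect for h in seq]
    # A redirection is cross-domain when the hostname differs from the previous hop.
    cross = [h.is_redirect and i > 0 and hosts[i] != hosts[i - 1]
             for i, h in enumerate(seq)]
    return {
        "length": len(seq),
        "domains": len(set(hosts)),
        "redirects": sum(redirect),
        "cross_domain_redirects": sum(cross),
        "consecutive_redirects": longest_run(redirect),
        "consecutive_cross_domain_redirects": longest_run(cross),
        "consecutive_short_gaps": longest_run([g < short_gap for g in gaps]),
        "mean_gap": mean(gaps) if gaps else 0.0,
    }


seq = [
    Hop("http://a.example/", False, 0.0),
    Hop("http://b.example/r", True, 0.25),
    Hop("http://c.example/x", True, 0.5),
    Hop("http://c.example/y", False, 5.5),
]
features = feature_vector(seq)
```

The resulting dictionary plays the role of the statistical feature vector. A classifier would consume its values in a fixed key order.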

7. A system for detecting malicious HTTP redirections in a network, comprising:
- a computer processor;
  
  a flow parser configured to obtain, from the network and based on a single client IP address, one or more HTTP flows triggered by a client device visiting a website, wherein the one or more HTTP flows comprises a first sequence of HTTP request/response pairs, the first sequence of HTTP request/response pairs including a first sequence of universal resource locators (URLs);
  
  a per-user tree constructor executing on the computer processor and configured to construct a per-user tree using the one or more HTTP flows, the per-user tree including nodes corresponding to URLs, including the first sequence of URLs, wherein the per-user tree includes an edge from a parent node to a child node if a request for a URL corresponding to the child node is triggered from the URL corresponding to the parent node, the per-user tree constructor also configured to update the per-user tree based upon extracted sequences of URLs, wherein each edge of the per-user tree is annotated by:
  
  1) a URL type assigned to the URL corresponding to the child node and
  
  2) a time that elapses between HTTP requests in the parent node and child node, wherein the per-user tree includes multiple paths, the multiple paths corresponding to both benign requests and malicious paths;
  
  a universal resource locator (URL) sequence extractor executing on the computer processor and configured to extract, from the first sequence and using a first pre-determined algorithm, a second sequence of URLs comprising an upstream URL and a downstream URL adjacent to each other in the second sequence, wherein the downstream URL is extracted from a child HTTP request that is subsequent to a parent HTTP request comprising the upstream URL, wherein extracting the second sequence of URLs comprises:
  
  selecting, from the first sequence, the parent HTTP request and the child HTTP request that are generated by the client device;
  
  selecting, from the first sequence, a parent HTTP response received by the client device, wherein the parent HTTP response is generated by a server device identified by the upstream URL;
  
  detecting that the parent HTTP response comprises the downstream URL, wherein the child HTTP request is generated by the client device based on the parent HTTP response; and
  
  including, in response to the detecting, the upstream URL and the downstream URL in the second sequence of URLs;
  
  a feature extractor executing on the computer processor and configured to analyze, using a second pre-determined algorithm, the second sequence of URLs to generate a statistical feature of URLs based at least on the upstream URL and the downstream URL, the statistical feature being stored in a statistical feature vector;
  
  a classifier executing on the computer processor and configured to classify, based on the statistical feature of URLs, the first sequence of HTTP request/response pairs as comprising at least one malicious HTTP redirection triggered by visiting the website, wherein classifying includes updating the per-user tree to reflect that the path on the per-user tree corresponding to the at least one malicious HTTP redirection is malicious; and
  
  a repository configured to store the first sequence of HTTP request/response pairs, the statistical feature vector, and the second sequence of URLs.
- Dependent claims (8, 9, 10, 11, 12, 20):
  - 8. The system of claim 7, wherein the flow parser is further configured to obtain a training HTTP flow set corresponding to training visits to websites;
    - wherein the URL sequence extractor is further configured to extract, from the training HTTP flow set and using at least the first pre-determined algorithm, a plurality of training sequences of URLs;
      
      wherein the plurality of training sequences of URLs comprises a first portion identified as malicious sequences and a second portion identified as benign sequences;
      
      wherein the feature extractor is further configured to analyze, using the second pre-determined algorithm, the first portion and the second portion to generate a first plurality of statistical features associated with the malicious sequences and a second plurality of statistical features associated with the benign sequences, respectively;
      
      wherein the system further comprises a supervised machine learning module executing on the computer processor and configured to determine, using a supervised machine learning algorithm, at least one parameter of the classifier based on the first plurality of statistical features and the second plurality of statistical features, and wherein the first sequence is classified by using the classifier in response to the determining.
  - 9. The system of claim 7, wherein any consecutive HTTP flows, if present in the one or more HTTP flows, are separated by less than a pre-determined silence period, and wherein all of the one or more HTTP flows are separated from any other HTTP flow having the single client address by at least the pre-determined silence period.
  - 10. The system of claim 7, wherein the classifying comprises:
    - determining that the second sequence is malicious based on the statistical feature;
      
      identifying, in response to the determining, the at least one malicious HTTP redirection based on the second sequence.
  - 11. The system of claim 7, the classifier further configured to:
    - generate, in response to at least the classifying, an alert of a malicious HTTP redirection attack.
  - 12. The system of claim 11, the classifier further configured to:
    - analyze the first sequence and the second sequence to identify the at least one malicious HTTP redirection, and include, in the alert, information regarding the malicious HTTP redirection.
  - 20. The system of claim 7, wherein the statistical feature of URLs comprises at least one selected from a group consisting of a first tally of different domains in the second sequence of URLs, a second tally of HTTP redirections in the second sequence of URLs, a third tally of different domain HTTP redirections in the second sequence of URLs, a fourth tally of consecutive HTTP redirections in the second sequence of URLs, a fifth tally of consecutive different domain HTTP redirections in the second sequence of URLs, a sixth tally of consecutive short inter-URL durations in the second sequence of URLs, a length of the second sequence of URLs, and a statistical parameter of inter-URL duration distribution of the second sequence of URLs.
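The per-user tree recited in claims 1, 7, and 13 can be sketched as a small data structure. This is an illustration under assumptions, not the patented implementation: nodes are keyed by URL, each node carries the annotations of the edge from its parent (URL type and elapsed time), and a malicious path is recorded by flagging every edge from the root to the offending URL. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    url: str
    request_time: float
    parent: Optional["Node"] = field(default=None, repr=False)
    # The next three fields annotate the edge from the parent to this node.
    url_type: Optional[str] = None   # e.g. "redirect", "image", "script" (assumed labels)
    elapsed: Optional[float] = None  # seconds between parent and child requests
    malicious: bool = False
    children: List["Node"] = field(default_factory=list, repr=False)


class UserTree:
    """Browsing activity tree for one client."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add(self, url: str, request_time: float,
            parent_url: Optional[str] = None,
            url_type: Optional[str] = None) -> Node:
        """Insert a URL, linking it under its parent when the parent is known."""
        node = Node(url, request_time)
        parent = self.nodes.get(parent_url) if parent_url else None
        if parent is not None:
            node.parent = parent
            node.url_type = url_type
            node.elapsed = request_time - parent.request_time
            parent.children.append(node)
        self.nodes[url] = node
        return node

    def path_to(self, url: str) -> List[Node]:
        """Return the nodes from the root down to the given URL."""
        path: List[Node] = []
        node = self.nodes.get(url)
        while node is not None:
            path.append(node)
            node = node.parent
        return path[::-1]

    def mark_malicious(self, url: str) -> None:
        """Flag every edge on the path from the root to the given URL."""
        for node in self.path_to(url):
            if node.parent is not None:
                node.malicious = True


tree = UserTree()
tree.add("http://a.example/", 0.0)
tree.add("http://a.example/logo.png", 0.25, "http://a.example/", "image")
tree.add("http://b.example/r", 0.5, "http://a.example/", "redirect")
tree.mark_malicious("http://b.example/r")
```

Storing the flag on edges rather than nodes lets a benign path and a malicious path share the same root without the root itself being marked.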

13. A non-transitory computer readable medium embodying instructions for detecting malicious HTTP redirections in a network, the instructions when executed by a processor comprising functionality for:
- obtaining, from the network and based on a single client IP address, one or more HTTP flows triggered by a client device visiting a website, wherein the one or more HTTP flows comprises a first sequence of HTTP request/response pairs, the first sequence of HTTP request/response pairs including a first sequence of universal resource locators (URLs);
  
  constructing a per-user tree using the one or more HTTP flows, the per-user tree including nodes corresponding to URLs, including the first sequence of URLs, wherein the per-user tree includes an edge from a parent node to a child node if a request for a URL corresponding to the child node is triggered from the URL corresponding to the parent node, wherein each edge of the per-user tree is annotated by:
  
  1) a URL type assigned to the URL corresponding to the child node and
  
  2) a time that elapses between HTTP requests in the parent node and child node, wherein the per-user tree includes multiple paths, the multiple paths corresponding to both benign requests and malicious paths;
  
  extracting, from the first sequence and using a first pre-determined algorithm, a second sequence of URLs comprising an upstream URL and a downstream URL adjacent to each other in the second sequence, wherein the downstream URL is extracted from a child HTTP request that is subsequent to a parent HTTP request comprising the upstream URL, wherein extracting the second sequence of URLs comprises:
  
  selecting, from the first sequence, the parent HTTP request and the child HTTP request that are generated by the client device;
  
  selecting, from the first sequence, a parent HTTP response received by the client device, wherein the parent HTTP response is generated by a server device identified by the upstream URL;
  
  detecting that the parent HTTP response comprises the downstream URL, wherein the child HTTP request is generated by the client device based on the parent HTTP response; and
  
  including, in response to the detecting, the upstream URL and the downstream URL in the second sequence of URLs;
  
  updating the per-user tree to include paths corresponding to the extracted second sequence of URLs;
  
  analyzing, using a second pre-determined algorithm, the second sequence to generate a statistical feature of URLs based at least on the upstream URL and the downstream URL, the statistical feature being stored in a statistical feature vector; and
  
  classifying, based on the statistical feature of URLs, the first sequence of HTTP request/response pairs as comprising at least one malicious HTTP redirection triggered by visiting the website, wherein classifying includes updating the per-user tree to reflect that the path on the per-user tree corresponding to the at least one malicious HTTP redirection is malicious.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The non-transitory computer readable medium of claim 13, the instructions when executed by the processor further comprising functionality for:
    - obtaining a training HTTP flow set corresponding to training visits to websites;
      
      extracting, from the training HTTP flow set and using at least the first pre-determined algorithm, a plurality of training sequences of URLs;
      
      selecting, from the plurality of training sequences of URLs, a first portion identified as malicious sequences and a second portion identified as benign sequences;
      
      analyzing, using the second pre-determined algorithm, the first portion and the second portion to generate a first plurality of statistical features associated with the malicious sequences and a second plurality of statistical features associated with the benign sequences, respectively; and
      
      determining, using a supervised machine learning algorithm, at least one parameter of a classifier based on the first plurality of statistical features and the second plurality of statistical features, wherein the first sequence is classified by using the classifier to analyze the statistical feature of the second sequence.
  - 15. The non-transitory computer readable medium of claim 13, wherein any consecutive HTTP flows, if present in the one or more HTTP flows, are separated by less than a pre-determined silence period, and wherein all of the one or more HTTP flows are separated from any other HTTP flow having the single client address by at least the pre-determined silence period.
  - 16. The non-transitory computer readable medium of claim 13, wherein the classifying comprises:
    - determining that the second sequence is malicious based on the statistical feature;
      
      identifying, in response to the determining, the at least one malicious HTTP redirection based on the second sequence.
  - 17. The non-transitory computer readable medium of claim 13, the instructions when executed by the processor further comprising functionality for:
    - generating, in response to at least the classifying, an alert of a malicious HTTP redirection attack.
  - 18. The non-transitory computer readable medium of claim 17, the instructions when executed by the processor further comprising functionality for:
    - analyzing the first sequence and the second sequence to identify the at least one malicious HTTP redirection, and including, in the alert, information regarding the malicious HTTP redirection.
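The silence-period condition in claims 3, 9, and 15 amounts to grouping each client's flows into bursts. The sketch below is one way to read it, assuming each flow is reduced to a client IP and a start time in seconds. The `Flow` alias, the function name, and the choice of start time as the comparison point are assumptions. Flows closer together than the silence period share a group, and a gap of at least the silence period starts a new one.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Flow = Tuple[str, float]  # (client IP, flow start time in seconds), assumed layout


def group_flows(flows: List[Flow], silence: float) -> Dict[str, List[List[float]]]:
    """Group flow start times per client IP into bursts split by the silence period."""
    sessions: Dict[str, List[List[float]]] = defaultdict(list)
    # Sorting orders flows by client IP first, then by start time.
    for ip, start in sorted(flows):
        bursts = sessions[ip]
        if bursts and start - bursts[-1][-1] < silence:
            bursts[-1].append(start)  # gap shorter than the silence period
        else:
            bursts.append([start])    # first flow, or gap of at least the silence period
    return dict(sessions)


flows = [
    ("10.0.0.1", 0.0),
    ("10.0.0.1", 2.0),
    ("10.0.0.2", 1.0),
    ("10.0.0.1", 40.0),
]
grouped = group_flows(flows, silence=10.0)
```

Each inner list corresponds to one candidate website visit, which would then feed the URL sequence extraction.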

Current Assignee: The Boeing Co.
Original Assignee: Narus, Inc. (Gen Digital Inc.)
Inventors: Torres, Ruben; Mekky, Hesham; Zhang, Zhi-Li; Saha, Sabyasachi; Nucci, Antonio
Primary Examiner(s): Tran, Tri
Application Number: US13/726,475
Time in Patent Office: 1,464 Days
Field of Search
US Class Current: 1/1
CPC Class Codes

G06F 21/50   Monitoring users, programs ...

G06F 21/552   involving long-term monitor...

G06F 2221/2119   Authenticating web pages, e...

H04L 63/14   for detecting or protecting...

H04L 63/1408   by monitoring network traff...

H04L 63/1416   Event detection, e.g. attac...

H04L 63/1441   Countermeasures against mal...

H04L 63/168   above the transport layer

H04L 67/01   Protocols

H04L 67/02   based on web technology, e....

H04L 67/10   in which an application is ...
