Methods and apparatus to identify malicious activity in a network

US 9,503,465 B2
Filed: 11/14/2013
Issued: 11/22/2016
Est. Priority Date: 11/14/2013
Status: Active Grant


Abstract

Methods, apparatus, systems and articles of manufacture are disclosed to learn malicious activity. An example method includes assigning weights of a distance function to respective statistical features; iteratively calculating, with a processor, the distance function to adjust the weights (1) to cause a reduction in a first distance calculated according to the distance function for a first pair of entities in a reference group associated with malicious activity and (2) to cause an increase in a second distance calculated according to the distance function for a first one of the entities included in the reference group and a second entity not included in the reference group; and determining whether a first statistical feature is indicative of malicious activity based on a respective adjusted weight of the first statistical feature determined after calculating the distance function for a number of iterations.
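
The weight adjustment described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the weighted squared-difference distance, the averaged objective, the clamp at zero, and the names `distance` and `adjust_weights` are all assumptions, since the listing does not state the form of the distance function.

```python
# Illustrative sketch only. The distance form, objective, and names are
# assumptions; the listing does not specify them.

def distance(w, x, y):
    """Weighted squared difference between two feature vectors."""
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y))


def adjust_weights(suspects, unclassified, iterations=50, step=0.01):
    """Take gradient steps that shrink suspect-to-suspect distances and
    grow suspect-to-unclassified distances."""
    n = len(suspects[0])
    pairs = [(a, b) for i, a in enumerate(suspects) for b in suspects[i + 1:]]
    cross = [(a, u) for a in suspects for u in unclassified]
    w = [1.0] * n
    for _ in range(iterations):
        # Gradient of (mean within-group distance - mean cross-group distance)
        # with respect to each weight. For this linear objective the gradient
        # does not depend on w; it is recomputed to mirror the iterative form.
        grad = [
            sum((a[k] - b[k]) ** 2 for a, b in pairs) / len(pairs)
            - sum((a[k] - u[k]) ** 2 for a, u in cross) / len(cross)
            for k in range(n)
        ]
        # Step against the gradient; clamp so weights stay non-negative.
        w = [max(0.0, wk - step * gk) for wk, gk in zip(w, grad)]
    return w
```

Under these assumptions, a feature on which suspect devices agree with each other but differ from unclassified devices ends up with a larger weight than a feature on which the suspect devices disagree.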

28 Citations

18 Claims

1. A method comprising:
- generating, with a processor, a set of statistical features based on communications between a plurality of network devices including a set of suspect devices classified as being associated with malicious activity and a set of unclassified devices;
  
  iteratively adjusting, with the processor and for a first number of iterations, a set of weights of a distance function representing differences between vectors of statistical features for different devices, the weights corresponding to the statistical features, the set of weights to be adjusted at each iteration based on a calculated gradient and step size to (1) reduce a first distance calculated between a first suspect device of the set of suspect devices and a second suspect device of the set of suspect devices and (2) increase a second distance calculated between the first suspect device and a first unclassified device of the set of unclassified devices; and
  
  in response to determining a first statistical feature of the set of statistical features is indicative of malicious activity based on a corresponding first weight, sending information identifying the first statistical feature of the set of statistical features to a network monitor that is to determine whether any of the unclassified devices are associated with malicious activity.
- - 2. The method according to claim 1, wherein generating the set of statistical features based on the communications includes:
    - parsing network log records into fields based on communication information in the network logs;
      
      determining categories of the fields based on the communication information in the respective fields; and
      
      generating the statistical features from the network log records based on the categories of the fields.
  - 3. The method according to claim 2, wherein generating the set of statistical features based on the communications further includes:
    - generating a first tier of statistical features from a first set of the fields identified as storing counter information and a second set of the fields identified as storing identity information;
      
      generating a second tier of statistical features from the first tier of statistical features and a third set of fields identified as storing communication type information; and
      
      generating a third tier of statistical features from the first tier of statistical features and the second tier of statistical features by generating ratios of respective pairs of statistical features from the first and second tiers of statistical features.
  - 4. The method according to claim 1, wherein the first statistical feature is determined to be indicative of malicious activity if, after adjusting the weights, the corresponding first weight has a value greater than a second weight corresponding to a second statistical feature.
  - 5. The method according to claim 1, further including identifying a second unclassified device of the set of unclassified devices as being associated with malicious activity based on the first statistical feature.
  - 6. The method according to claim 5, wherein identifying the second unclassified device as being associated with malicious activity further includes:
    - comparing a first respective value of the first statistical feature generated for the second unclassified device to a second respective value of the first statistical feature generated for a third suspect device of the set of suspect devices; and
      
      determining the second unclassified device is malicious if the first respective value of the first statistical feature generated for the second unclassified device is within a threshold value of the second respective value of the first statistical feature generated for a third suspect device.
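
The tiered feature generation recited in claims 2 and 3 might look something like the sketch below. The record layout (`src`, `dst`, `bytes`, `type`), the feature names, and the function `tiered_features` are hypothetical; the claims name only the field categories (counter, identity, communication type), not concrete fields. For simplicity the sketch forms ratios over all pairs of tier-one and tier-two features.

```python
from collections import defaultdict


def tiered_features(records):
    """Build per-device features in three tiers from parsed log records.

    Assumed (hypothetical) record fields:
      src   - device the record is attributed to
      dst   - identity field (who the device communicated with)
      bytes - counter field
      type  - communication type field
    """
    totals = defaultdict(float)                        # tier 1: counter totals
    peers = defaultdict(set)                           # tier 1: distinct identities
    by_type = defaultdict(lambda: defaultdict(float))  # tier 2: totals per type

    for rec in records:
        dev = rec["src"]
        totals[dev] += rec["bytes"]
        peers[dev].add(rec["dst"])
        by_type[dev][rec["type"]] += rec["bytes"]

    features = {}
    for dev in totals:
        base = {
            "total_bytes": totals[dev],
            "distinct_peers": float(len(peers[dev])),
        }
        for comm_type, amount in by_type[dev].items():
            base["bytes_" + comm_type] = amount

        # Tier 3: ratios of pairs of tier-1 and tier-2 features.
        names = sorted(base)
        ratios = {}
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if base[b] != 0:
                    ratios[a + "/" + b] = base[a] / base[b]
        features[dev] = {**base, **ratios}
    return features
```

Each device's resulting dictionary can then be flattened into the feature vector that the distance function compares.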

7. An apparatus comprising:
- a memory to store machine readable instructions; and
  
  a processor to execute the instructions to perform operations including:
  
  generating a set of statistical features based on communications between a plurality of network devices including a set of suspect devices classified as being associated with malicious activity and a set of unclassified devices;
  
  iteratively adjusting, for a first number of iterations, a set of weights of a distance function representing differences between vectors of statistical features for different devices, the weights corresponding to the statistical features, the set of weights to be adjusted at each iteration based on a calculated gradient and step size to (1) reduce a first distance calculated between a first suspect device of the set of suspect devices and a second suspect device of the set of suspect devices and (2) increase a second distance calculated between the first suspect device and a first unclassified device of the set of unclassified devices; and
  
  in response to determining a first statistical feature of the set of statistical features is indicative of malicious activity based on a corresponding first weight, sending information identifying a first statistical feature of the set of statistical features to a network monitor that is to determine whether any of the unclassified devices are associated with malicious activity.
- - 8. The apparatus according to claim 7, wherein generating the set of statistical features based on the communications includes:
    - parsing network log records into fields based on communication information in the network logs;
      
      determining categories of the fields based on the communication information in the respective fields; and
      
      generating the statistical features from the network log records based on the categories of the fields.
  - 9. The apparatus according to claim 8, wherein generating the set of statistical features based on the communications further includes:
    - generating a first tier of statistical features from a first set of the fields identified as storing counter information and a second set of the fields identified as storing identity information;
      
      generating a second tier of statistical features from the first tier of statistical features and a third set of fields identified as storing communication type information; and
      
      generating a third tier of statistical features from the first tier of statistical features and the second tier of statistical features by generating ratios of respective pairs of statistical features from the first and second tiers of statistical features.
  - 10. The apparatus according to claim 7, wherein the first statistical feature is determined to be indicative of malicious activity if, after adjusting the weights, the respective adjusted corresponding first weight has a value greater than a second respective adjusted weight corresponding to a second statistical feature.
  - 11. The apparatus according to claim 7, wherein the operations further include identifying a second unclassified device of the set of unclassified devices as being associated with malicious activity based on the first statistical feature.
  - 12. The apparatus according to claim 7, wherein the operations identifying the second unclassified device as being associated with malicious activity further includes:
    - comparing a first respective value of the first statistical feature generated for the second unclassified device to a second respective value of the first statistical feature generated for the third suspect device of the set of suspect devices; and
      
      determining the second unclassified device is malicious if the first respective value of the first statistical feature generated for the second unclassified device is within a threshold value of the second respective value of the first statistical feature generated for a third suspect device.
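
The threshold comparison in claims 6 and 12 could be sketched as below. The function name `flag_unclassified`, the dictionary layout, and the choice to flag a device when it is close to any suspect device are assumptions made for illustration; the claims require only a comparison against a suspect device's value for the selected feature.

```python
def flag_unclassified(feature, suspects, unclassified, threshold):
    """Return ids of unclassified devices whose value for `feature` lies
    within `threshold` of at least one suspect device's value.

    `suspects` and `unclassified` map a device id to a dict of feature
    values (a hypothetical layout chosen for this sketch).
    """
    flagged = []
    for dev, feats in unclassified.items():
        close = any(
            abs(feats[feature] - s_feats[feature]) <= threshold
            for s_feats in suspects.values()
        )
        if close:
            flagged.append(dev)
    return flagged
```

Comparing against any suspect device is the most permissive reading; a stricter variant could require closeness to several suspect devices or to their mean.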

13. A tangible machine readable storage medium including instructions which, when executed, cause a machine to perform operations comprising:
- generating a set of statistical features based on communications between a plurality of network devices including a set of suspect devices classified as being associated with malicious activity and a set of unclassified devices;
  
  iteratively adjusting, for a first number of iterations, a set of weights of a distance function representing differences between vectors of statistical features for different devices, the weights corresponding to the statistical features, the set of weights to be adjusted at each iteration based on a calculated gradient and step size to (1) reduce a first distance calculated between a first suspect device of the set of suspect devices and a second suspect device of the set of suspect devices and (2) increase a second distance calculated between the first suspect device and a first unclassified device of the set of unclassified devices; and
  
  in response to determining a first statistical feature of the set of statistical features is indicative of malicious activity based on a corresponding first weight, sending information identifying a first statistical feature of a set of statistical features to a network monitor that is to determine whether any of the unclassified devices are associated with malicious activity.
- - 14. The storage medium according to claim 13, wherein generating the set of statistical features based on the communications includes:
    - parsing network log records into fields based on communication information in the network logs;
      
      determining categories of the fields based on the communication information in the respective fields; and
      
      generating the statistical features from the network log records based on the categories of the fields.
  - 15. The storage medium according to claim 14, wherein generating the set of statistical features based on the communications further includes:
    - generating a first tier of statistical features from a first set of the fields identified as storing counter information and a second set of the fields identified as storing identity information;
      
      generating a second tier of statistical features from the first tier of statistical features and a third set of fields identified as storing communication type information; and
      
      generating a third tier of statistical features from the first tier of statistical features and the second tier of statistical features by generating ratios of respective pairs of statistical features from the first and second tiers of statistical features.
  - 16. The storage medium according to claim 13, wherein the first statistical feature is determined to be indicative of malicious activity if, after adjusting the weights, the corresponding first weight has a value greater than a second weight corresponding to a second statistical feature.
  - 17. The storage medium according to claim 13, wherein the operations further include identifying a second unclassified device of the set of unclassified devices as being associated with malicious activity based on the first statistical feature.
  - 18. The storage medium according to claim 17, wherein identifying the second unclassified device as being associated with malicious activity further includes:
    - comparing a first respective value of the first statistical feature generated for the second unclassified device to a second respective value of the first statistical feature generated for a third suspect device of the set of suspect devices; and
      
      determining the second unclassified device is malicious if the first respective value of the first statistical feature generated for the second unclassified device is within a threshold value of the second respective value of the first statistical feature generated for a third suspect device.
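
Claims 4, 10, and 16 treat a feature as indicative when its adjusted weight exceeds that of another feature, and the independent claims then send the feature's identity to a network monitor. A minimal sketch of that selection and hand-off follows. The names `indicative_features` and `report`, the `top_k` cutoff, and the message format are hypothetical, and `send` stands in for whatever transport reaches the monitor.

```python
def indicative_features(names, weights, top_k=1):
    """Rank features by adjusted weight and return the top_k names.

    A returned feature has a weight at least as large as every feature
    that is not returned, which is one way to read "greater than a
    second weight" (an assumption of this sketch).
    """
    ranked = sorted(zip(names, weights), key=lambda nw: nw[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]


def report(names, weights, send, top_k=1):
    """Pass each selected feature's identity to `send`, a caller-supplied
    callable standing in for delivery to the network monitor."""
    for name in indicative_features(names, weights, top_k):
        send({"feature": name})
```

For example, calling `report(names, weights, outbox.append)` with a plain list `outbox` collects the messages locally, which is convenient for testing before wiring in a real transport.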

Current Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Coskun, Baris
Primary Examiner(s): POWERS, WILLIAM S
Application Number: US14/080,532
Publication Number: US 20150135320A1
Time in Patent Office: 1,104 Days
Field of Search: 726/24
US Class Current: 1/1
CPC Class Codes:
G06F 16/951   Indexing; Web crawling tech...
H04L 43/04   Processing captured monitor...
H04L 63/0254   Stateful filtering
H04L 63/1416   Event detection, e.g. attac...
H04L 63/1425   Traffic logging, e.g. anoma...
