Coflow identification method and system, and server using method

US 10,567,299 B2
Filed: 09/11/2018
Issued: 02/18/2020
Est. Priority Date: 03/11/2016
Status: Active Grant

Abstract

A coflow identification method includes: obtaining a weighted matrix by means of learning according to historical data in the network, where the weighted matrix is used to minimize a feature distance between data streams belonging to a same coflow and maximize a feature distance between data streams belonging to different coflows; computing a feature distance between any two data streams in the network according to metrics in the data stream layer data feature, the application layer data stream feature distance, the terminal aspect data feature distance, and the weighted matrix; and dividing the data streams in the network into several cluster sets by using a clustering algorithm and according to the feature distance between the any two data streams in the network, where each of the several cluster sets is a coflow.
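
The final step of the abstract, dividing data streams into cluster sets according to pairwise feature distance, can be illustrated with a minimal sketch. The source only says "a clustering algorithm", so the threshold-linked grouping below (connected components over pairs closer than a cutoff) is a stand-in chosen for illustration; the function name `cluster_by_threshold`, the `threshold` parameter, and the distance values are all hypothetical.

```python
from itertools import combinations


def cluster_by_threshold(n_flows, dist, threshold):
    """Group flow indices 0..n_flows-1 into clusters.

    Two flows are linked when dist(i, j) < threshold; each cluster is a
    connected component of that link graph. The threshold rule is an
    assumption, not the clustering algorithm of the patent.
    """
    parent = list(range(n_flows))

    def find(x):
        # Follow parent pointers to the representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n_flows), 2):
        if dist(i, j) < threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n_flows):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical precomputed pairwise feature distances for four flows.
PAIR_DIST = {
    (0, 1): 0.2, (0, 2): 3.0, (0, 3): 2.8,
    (1, 2): 3.1, (1, 3): 2.9, (2, 3): 0.3,
}

coflows = cluster_by_threshold(4, lambda i, j: PAIR_DIST[(i, j)], threshold=1.0)
print(coflows)  # each inner list is one candidate coflow
```

With these made-up distances, flows 0 and 1 end up in one group and flows 2 and 3 in another, since only those pairs fall under the cutoff.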

9 Claims

1. A coflow identification method for identifying a coflow in a data transmission process in a network, wherein the method comprises:
- obtaining, by a server, header information of data streams in data transmission in the network, wherein the header information is header information of packets of the data streams comprising source IP addresses of the data streams, source ports of the data streams, destination IP addresses of the data streams, destination ports of the data streams, sending time points of the data streams, and transmission protocols used by the data streams;
  
  obtaining a data stream aspect data feature, an application aspect data stream feature, and a terminal aspect data feature according to the header information of the data streams, wherein the data stream aspect data feature comprises at least one of a sending time interval metric, a packet length average metric, a packet length variance metric, a packet arrival time interval average metric, a packet arrival time interval variance metric, or a transmission protocol distance metric, wherein the transmission protocol distance metric indicates whether packet transmission protocols are the same;
  
  the application aspect data stream feature comprises an application aspect data stream feature distance, wherein the application aspect data stream feature distance is used to indicate a degree of aggregation between destination addresses or destination ports in the data transmission or a degree of overlapping between data transmit end IP address sets; and
  
  the terminal aspect data feature comprises a terminal aspect data feature distance, wherein the terminal aspect data feature distance is used to indicate whether the data streams belong to a same terminal cluster;
  
  determining a weighted matrix based on historical data in the network, wherein the weighted matrix is used to minimize a feature distance between data streams belonging to a same coflow and maximize a feature distance between data streams belonging to different coflows, and the feature distance is a weighted distance of at least two of the application aspect data stream feature distance, the terminal aspect data feature distance, or the metrics in the data stream aspect data feature;
  
  obtaining a multi-dimensional feature distance vector of the data streams between any two data streams in the network, wherein the multi-dimensional feature distance vector comprises at least three dimensions, the at least three dimensions comprise the application aspect data stream feature distance, the terminal aspect data feature distance, and at least one of the sending time interval metric, the packet length average metric, the packet length variance metric, the packet arrival time interval average metric, the packet arrival time interval variance metric, or the transmission protocol distance metric, and each metric or each feature distance forms a dimension of the multi-dimensional feature distance vector;
  
  computing the feature distance between the any two data streams in the network according to the multi-dimensional feature distance vector and the weighted matrix, wherein the feature distance between the any two data streams in the network is computed according to the multi-dimensional feature distance vector and the weighted matrix by using the following computation formula;
  
  $$d(i, j) = \lVert f_i - f_j \rVert_A = \sqrt{D(i, j)^{T} A \, D(i, j)},$$
  
  wherein both $d(i, j)$ and $\lVert f_i - f_j \rVert_A$ represent a feature distance between any two data streams in the network, $D(i, j)$ is a multi-dimensional feature distance vector, $D(i, j)^{T}$ is a transposed matrix of the multi-dimensional feature distance vector, and $A$ is a weighted matrix;
  
  and dividing the data streams in the network into several cluster sets by using a clustering algorithm and according to the feature distance between the any two data streams in the network, wherein a feature distance between any data stream in each aggregation flow and any other data stream in the same aggregation flow is less than a feature distance between the data stream and any data stream in a different aggregation flow, and each of the several cluster sets is a coflow, wherein an aggregation flow comprises data streams that have same destination addresses and same destination.
- - 2. The coflow identification method according to claim 1, wherein the sending time interval metric is an absolute value of a difference between sending time points of two data streams;
    - the packet length average metric is an absolute value of a difference between packet length averages of two data streams;
      
      the packet length variance metric is an absolute value of a difference between packet length variances of two data streams;
      
      the packet arrival time interval average metric is an absolute value of a difference between packet arrival time interval averages of two data streams;
      
      the packet arrival time interval variance metric is an absolute value of a difference between packet transmission arrival time interval variances of two data streams; and
      
      when the packet transmission protocols are the same, the transmission protocol distance metric is a non-zero constant, or when the packet transmission protocols are different, the transmission protocol distance metric is zero.
  - 3. The coflow identification method according to claim 1, wherein the obtaining a weighted matrix by means of learning according to historical data in the network comprises:
    - obtaining a multi-dimensional feature distance vector according to the historical data in the network, wherein the multi-dimensional feature distance vector comprises at least three dimensions, the at least three dimensions comprise the application aspect data stream feature distance, the terminal aspect data feature distance, and at least one of the sending time interval metric, the packet length average metric, the packet length variance metric, the packet arrival time interval average metric, the packet arrival time interval variance metric, or the transmission protocol distance metric, and each metric or each feature distance forms a dimension of the multi-dimensional feature distance vector; and
      
      determining a weighted matrix of the multi-dimensional feature distance vector based on the historical data in the network, so as to allocate different weights according to different importance that feature distances of different dimensions play in coflow identification, to minimize a feature distance between data streams belonging to a same coflow and maximize a feature distance between data streams belonging to different coflows.
  - 4. The coflow identification method according to claim 3, wherein the determining a weighted matrix of the multi-dimensional feature distance vector comprises:
    - dividing historical data streams in the network into two data stream pair sets according to whether the historical data streams belong to a same coflow, wherein the two data stream pair sets respectively correspond to a coflow data set and a non-coflow data set; and
      
      finding a positive semi-definite matrix A that minimizes a computation result of a target function
  - 5. The coflow identification method according to claim 1, wherein the obtaining an application aspect data stream feature comprises:
    - clustering the data streams according to the source IP addresses and finding all aggregation flows in the network;
      
      finding a source IP address set of the aggregation flows;
      
      for data streams belonging to a same aggregation flow, directly assigning a value to the application aspect feature distance; and
      
      for data streams not belonging to a same aggregation flow, computing a Jaccard similarity and computing the application aspect feature distance.
  - 6. The coflow identification method according to claim 1, wherein the obtaining a terminal aspect data feature according to the header information of the data streams comprises:
    - periodically obtaining traffic attribute information of the network, wherein the traffic attribute information comprises at least two of a terminal traffic mode, data traffic of a terminal within a period of time, or a quantity of data streams of a terminal within a period of time;
      
      constructing a weighted traffic matrix according to the obtained traffic attribute information of the network, to distinguish different importance and weights of the terminal traffic mode, the data traffic of a terminal within a period of time, and the quantity of data streams of a terminal within a period of time during computation of a terminal cluster, the data traffic of a terminal within a period of time, or the quantity of data streams of a terminal within a period of time;
      
      obtaining information about the terminal cluster in the network according to the weighted traffic matrix by using a spectral clustering algorithm; and
      
      determining, according to whether the data streams belong to a same terminal cluster, a terminal aspect data feature distance between any two active data streams in a terminal cluster aspect in the current network.
  - 7. The coflow identification method according to claim 6, wherein the constructing a weighted traffic matrix comprises:
    - periodically obtaining data stream information of the network within a period of time T from a data stream information collection and screening module, and computing a weighted traffic matrix within the period of time, wherein a computation formula is as follows;
      
      $$M(i, j) = V(i, j) \times N(i, j),$$
      
      wherein $M \in \mathbb{R}^{n \times n}$ represents traffic modes of n terminals in the network, n is an integer greater than 1, V(i, j) represents traffic of any terminal pair (i, j) within the period of time, N(i, j) represents a quantity of data streams of the any terminal pair (i, j) formed by an $i^{\text{th}}$ terminal and a $j^{\text{th}}$ terminal within the period of time, and i and j are not equal and are integers greater than 1.
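
The pairwise metrics of claim 2 and the weighted traffic matrix of claim 7 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the dictionary keys (`start`, `lengths`, `gaps`, `proto`), the function names, the use of population variance, and the constant `same_protocol_value` are all hypothetical choices. The protocol metric follows the wording of claim 2 as written (non-zero when the protocols match, zero otherwise).

```python
from statistics import mean, pvariance


def flow_metrics(a, b, same_protocol_value=1.0):
    """Return the six pairwise metrics of claim 2 for flows a and b.

    Each flow is a dict with hypothetical keys: 'start' (sending time
    point), 'lengths' (packet lengths), 'gaps' (packet arrival time
    intervals) and 'proto' (transmission protocol name).
    """
    return [
        abs(a["start"] - b["start"]),
        abs(mean(a["lengths"]) - mean(b["lengths"])),
        # Population variance is an assumption; the claim only says "variance".
        abs(pvariance(a["lengths"]) - pvariance(b["lengths"])),
        abs(mean(a["gaps"]) - mean(b["gaps"])),
        abs(pvariance(a["gaps"]) - pvariance(b["gaps"])),
        # Claim 2 wording: non-zero constant if protocols match, else zero.
        same_protocol_value if a["proto"] == b["proto"] else 0.0,
    ]


def weighted_traffic_matrix(volume, count):
    """Claim 7: M(i, j) = V(i, j) * N(i, j), computed entry by entry."""
    n = len(volume)
    return [[volume[i][j] * count[i][j] for j in range(n)] for i in range(n)]


# Made-up example flows and terminal-pair statistics.
flow_a = {"start": 0.0, "lengths": [100, 300], "gaps": [1.0, 3.0], "proto": "tcp"}
flow_b = {"start": 0.5, "lengths": [200, 200], "gaps": [2.0, 2.0], "proto": "tcp"}
metrics = flow_metrics(flow_a, flow_b)

V = [[0, 10], [4, 0]]  # traffic volume per terminal pair
N = [[0, 2], [3, 0]]   # number of data streams per terminal pair
M = weighted_traffic_matrix(V, N)
print(metrics, M)
```

Per claim 6, a matrix such as `M` would then feed a spectral clustering step to find terminal clusters; that step is not shown here.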

8. A server for identifying a coflow in a data transmission process in a network, comprising:
- a processor;
  
  a memory containing computer instructions for execution by the processor, wherein the instructions prompt the processor to be configured to include an information obtaining module, a feature extraction module, a weight learning module, a feature distance computation module, and a coflow clustering module, wherein the information obtaining module is configured to obtain header information of data streams in data transmission in a network and historical data in the network, wherein the header information is header information of packets of the data streams comprising source IP addresses of the data streams, source ports of the data streams, destination IP addresses of the data streams, destination ports of the data streams, sending time points of the data streams, and transmission protocols used by the data streams;
  
  the feature extraction module extracts a data stream aspect data feature, an application aspect data stream feature, and a terminal aspect data feature from the header information of the data streams, wherein the data stream aspect data feature comprises at least one of a sending time interval metric, a packet length average metric, a packet length variance metric, a packet arrival time interval average metric, a packet arrival time interval variance metric, or a transmission protocol distance metric;
  
  the application aspect data stream feature comprises an application aspect data stream feature distance, wherein the transmission protocol distance metric indicates whether packet transmission protocols are the same, the application aspect data stream feature distance is used to indicate a degree of aggregation between destination addresses or destination ports in the data transmission or a degree of overlapping between data transmit end IP address sets, wherein the terminal aspect data feature comprises a terminal aspect data feature distance, wherein the terminal aspect data feature distance is used to indicate whether the data streams belong to a same terminal cluster, wherein a terminal cluster comprises at least two terminals having a common attribute of terminal traffic mode;
  
  the weight learning module is configured to determine a weighted matrix based on the historical data in the network, wherein the weighted matrix is used to minimize a feature distance between data streams belonging to a same coflow and maximize a feature distance between data streams belonging to different coflows, and the feature distance is a weighted distance of the data stream aspect data feature, the application aspect data stream feature, and the terminal aspect data feature;
  
  the feature distance computation module is configured to obtain a multi-dimensional feature distance vector of the data streams between any two data streams in the network, wherein the multi-dimensional feature distance vector comprises at least three dimensions, the at least three dimensions comprise the application aspect data stream feature distance, the terminal aspect data feature distance, and at least one of the sending time interval metric, the packet length average metric, the packet length variance metric, the packet arrival time interval average metric, the packet arrival time interval variance metric, or the transmission protocol distance metric, and each metric or each feature distance forms a dimension of the multi-dimensional feature distance vector; and
  
  compute the feature distance between the any two data streams in the network according to the multi-dimensional feature distance vector and the weighted matrix, wherein the feature distance between the any two data streams in the network is computed according to the multi-dimensional feature distance vector and the weighted matrix by using the following computation formula;
  
  $$d(i, j) = \lVert f_i - f_j \rVert_A = \sqrt{D(i, j)^{T} A \, D(i, j)},$$
  
  wherein both $d(i, j)$ and $\lVert f_i - f_j \rVert_A$ represent a feature distance between any two data streams in the network, $D(i, j)$ is a multi-dimensional feature distance vector, $D(i, j)^{T}$ is a transposed matrix of the multi-dimensional feature distance vector, and $A$ is a weighted matrix; and
  
  the coflow clustering module is configured to divide the data streams in the network into several cluster sets by using a clustering algorithm and according to the feature distance between the any two data streams in the network, wherein a feature distance between any data stream in each aggregation flow and any other data stream in the same aggregation flow is less than a feature distance between the data stream and any data stream in a different aggregation flow, and each of the several cluster sets is a coflow, wherein an aggregation flow comprises data streams that have same destination addresses and same destination.
- - 9. The server according to claim 8, wherein the weight learning module is specifically configured to:
    - obtain a multi-dimensional feature distance vector according to the historical data in the network, and obtain a weighted matrix of the multi-dimensional feature distance vector by means of learning according to the historical data in the network, so as to allocate different weights by using a learning mechanism and according to different importance that feature distances of different dimensions play in coflow identification, to minimize a feature distance between data streams belonging to a same coflow and maximize a feature distance between data streams belonging to different coflows, wherein the multi-dimensional feature distance vector comprises at least three dimensions, the at least three dimensions correspondingly comprise the application aspect data stream feature distance, the terminal aspect data feature distance, and at least one of the sending time interval metric, the packet length average metric, the packet length variance metric, the packet arrival time interval average metric, the packet arrival time interval variance metric, or the transmission protocol distance metric, and each metric or each feature distance forms a dimension of the multi-dimensional feature distance vector.
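
The feature distance computed by the feature distance computation module, the square root of D(i, j)^T A D(i, j), can be evaluated as in the sketch below. The function name `weighted_distance` and the example values for the vector and the weighted matrix are hypothetical; in the claims, A is obtained from historical data by the weight learning module, which is not reproduced here.

```python
import math


def weighted_distance(d_vec, a_mat):
    """Return sqrt(D^T A D) for a distance vector D and weight matrix A.

    d_vec is a list of k per-dimension distances; a_mat is a k-by-k list
    of lists, assumed positive semi-definite so the quadratic form is
    non-negative.
    """
    k = len(d_vec)
    quad = sum(
        d_vec[r] * a_mat[r][c] * d_vec[c]
        for r in range(k)
        for c in range(k)
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(quad, 0.0))


# Hypothetical three-dimensional vector: application aspect distance,
# terminal aspect distance, and one data stream aspect metric.
D = [1.0, 0.0, 2.0]
A = [
    [3.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.25],
]
print(weighted_distance(D, A))
```

With an identity matrix for A the expression reduces to the ordinary Euclidean norm of D; a learned A rescales and mixes the dimensions so that flows of the same coflow end up closer together.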

Current Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Original Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Inventors: Chen, Zhitang; Geng, Yanhui; Zhang, Hong; Chen, Kai
Primary Examiner: Jangbahadur, Lakeram
Application Number: US16/127,649
Publication Number: US 20180375781A1
Time in Patent Office: 525 Days
CPC Class Codes

G06F 16/00   Information retrieval; Data...

G06F 16/906   Clustering; Classification

G06N 20/00   Machine learning

G06N 7/01   Probabilistic graphical mod...

H04L 47/2441   relying on flow classificat...

H04L 47/41   by acting on aggregated flo...

H04L 47/803   Application aware

H04L 69/22   Parsing or analysis of headers

H04L 9/40   Network security protocols
