Coflow identification method and system, and server using method

  • US 10,567,299 B2
  • Filed: 09/11/2018
  • Issued: 02/18/2020
  • Est. Priority Date: 03/11/2016
  • Status: Active Grant
First Claim

1. A coflow identification method for identifying a coflow in a data transmission process in a network, wherein the method comprises:

  • obtaining, by a server, header information of data streams in data transmission in the network, wherein the header information is header information of packets of the data streams comprising source IP addresses of the data streams, source ports of the data streams, destination IP addresses of the data streams, destination ports of the data streams, sending time points of the data streams, and transmission protocols used by the data streams;

    obtaining a data stream aspect data feature, an application aspect data stream feature, and a terminal aspect data feature according to the header information of the data streams, wherein the data stream aspect data feature comprises at least one of a sending time interval metric, a packet length average metric, a packet length variance metric, a packet arrival time interval average metric, a packet arrival time interval variance metric, or a transmission protocol distance metric, wherein the transmission protocol distance metric indicates whether packet transmission protocols are the same;

    the application aspect data stream feature comprises an application aspect data stream feature distance, wherein the application aspect data stream feature distance is used to indicate a degree of aggregation between destination addresses or destination ports in the data transmission or a degree of overlapping between data transmit end IP address sets; and

    the terminal aspect data feature comprises a terminal aspect data feature distance, wherein the terminal aspect data feature distance is used to indicate whether the data streams belong to a same terminal cluster;

    determining a weighted matrix based on historical data in the network, wherein the weighted matrix is used to minimize a feature distance between data streams belonging to a same coflow and maximize a feature distance between data streams belonging to different coflows, and the feature distance is a weighted distance of at least two of the application aspect data stream feature distance, the terminal aspect data feature distance, or the metrics in the data stream aspect data feature;

    obtaining a multi-dimensional feature distance vector of the data streams between any two data streams in the network, wherein the multi-dimensional feature distance vector comprises at least three dimensions, the at least three dimensions comprise the application aspect data stream feature distance, the terminal aspect data feature distance, and at least one of the sending time interval metric, the packet length average metric, the packet length variance metric, the packet arrival time interval average metric, the packet arrival time interval variance metric, or the transmission protocol distance metric, and each metric or each feature distance forms a dimension of the multi-dimensional feature distance vector;

    computing the feature distance between the any two data streams in the network according to the multi-dimensional feature distance vector and the weighted matrix, wherein the feature distance between the any two data streams in the network is computed according to the multi-dimensional feature distance vector and the weighted matrix by using the following computation formula:

    $d(i, j) = \|f_i - f_j\|_A = \sqrt{D(i, j)^T A\, D(i, j)}$, wherein both $d(i, j)$ and $\|f_i - f_j\|_A$ represent a feature distance between any two data streams in the network, $D(i, j)$ is a multi-dimensional feature distance vector, $D(i, j)^T$ is a transposed matrix of the multi-dimensional feature distance vector, and $A$ is a weighted matrix;

    and dividing the data streams in the network into several cluster sets by using a clustering algorithm and according to the feature distance between the any two data streams in the network, wherein a feature distance between any data stream in each aggregation flow and any other data stream in the same aggregation flow is less than a feature distance between the data stream and any data stream in a different aggregation flow, and each of the several cluster sets is a coflow, wherein an aggregation flow comprises data streams that have same destination addresses and same destination.
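
The computation recited in the claim can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation. The `Flow` record, the `terminal_cluster` label, the stand-in used for the application aspect distance, the hand-picked diagonal matrix `A` (the claim derives the weighted matrix from historical data), and the single-linkage threshold grouping (the claim does not name a clustering algorithm) are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Header fields of one data stream (hypothetical record layout)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    start_time: float      # sending time point, in seconds
    protocol: str
    terminal_cluster: int  # assumed precomputed terminal-cluster label


def distance_vector(a, b):
    """Build D(i, j): one dimension per metric or feature distance."""
    # Assumption: a crude 0/1 stand-in for the application aspect distance.
    app = 0.0 if (a.dst_ip, a.dst_port) == (b.dst_ip, b.dst_port) else 1.0
    # Terminal aspect: 0 if both streams belong to the same terminal cluster.
    term = 0.0 if a.terminal_cluster == b.terminal_cluster else 1.0
    # Data stream aspect: sending time interval and protocol distance.
    dt = abs(a.start_time - b.start_time)
    proto = 0.0 if a.protocol == b.protocol else 1.0
    return [app, term, dt, proto]


def feature_distance(D, A):
    """Compute d(i, j) = sqrt(D^T A D) for a weighted matrix A (list of rows)."""
    n = len(D)
    q = sum(D[r] * A[r][c] * D[c] for r in range(n) for c in range(n))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative round-off


def cluster(flows, A, threshold):
    """Group flows whose pairwise distance is below threshold (single linkage)."""
    parent = list(range(len(flows)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(flows)):
        for j in range(i + 1, len(flows)):
            d = feature_distance(distance_vector(flows[i], flows[j]), A)
            if d < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(flows)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Example data (invented for illustration only).
flows = [
    Flow("10.0.0.1", 5001, "10.0.1.1", 80, 0.00, "TCP", 0),
    Flow("10.0.0.2", 5002, "10.0.1.1", 80, 0.05, "TCP", 0),
    Flow("10.0.0.3", 5003, "10.0.2.1", 443, 9.00, "UDP", 1),
]
# Assumed diagonal weighted matrix; the time dimension is down-weighted.
A = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
coflows = cluster(flows, A, threshold=1.0)
```

With these example values the first two streams, which share a destination, protocol, and terminal cluster and start 50 ms apart, fall into one cluster set, while the third stream forms its own. A diagonal `A` reduces the formula to a weighted Euclidean distance; a full symmetric positive semi-definite matrix would also capture correlations between dimensions.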
