Coflow identification method and system, and server using method
First Claim
1. A coflow identification method for identifying a coflow in a data transmission process in a network, wherein the method comprises:
obtaining, by a server, header information of data streams in data transmission in the network, wherein the header information is header information of packets of the data streams comprising source IP addresses of the data streams, source ports of the data streams, destination IP addresses of the data streams, destination ports of the data streams, sending time points of the data streams, and transmission protocols used by the data streams;
obtaining a data stream aspect data feature, an application aspect data stream feature, and a terminal aspect data feature according to the header information of the data streams, wherein the data stream aspect data feature comprises at least one of a sending time interval metric, a packet length average metric, a packet length variance metric, a packet arrival time interval average metric, a packet arrival time interval variance metric, or a transmission protocol distance metric, wherein the transmission protocol distance metric indicates whether packet transmission protocols are the same;
the application aspect data stream feature comprises an application aspect data stream feature distance, wherein the application aspect data stream feature distance is used to indicate a degree of aggregation between destination addresses or destination ports in the data transmission or a degree of overlapping between data transmit end IP address sets; and
the terminal aspect data feature comprises a terminal aspect data feature distance, wherein the terminal aspect data feature distance is used to indicate whether the data streams belong to a same terminal cluster;
determining a weighted matrix based on historical data in the network, wherein the weighted matrix is used to minimize a feature distance between data streams belonging to a same coflow and maximize a feature distance between data streams belonging to different coflows, and the feature distance is a weighted distance of at least two of the application aspect data stream feature distance, the terminal aspect data feature distance, or the metrics in the data stream aspect data feature;
obtaining a multi-dimensional feature distance vector of the data streams between any two data streams in the network, wherein the multi-dimensional feature distance vector comprises at least three dimensions, the at least three dimensions comprise the application aspect data stream feature distance, the terminal aspect data feature distance, and at least one of the sending time interval metric, the packet length average metric, the packet length variance metric, the packet arrival time interval average metric, the packet arrival time interval variance metric, or the transmission protocol distance metric, and each metric or each feature distance forms a dimension of the multi-dimensional feature distance vector;
computing the feature distance between the any two data streams in the network according to the multi-dimensional feature distance vector and the weighted matrix, wherein the feature distance between the any two data streams in the network is computed according to the multi-dimensional feature distance vector and the weighted matrix by using the following computation formula;
$d(i, j) = \lVert f_i - f_j \rVert_A = \sqrt{D(i, j)^{T} A \, D(i, j)}$, wherein both $d(i, j)$ and $\lVert f_i - f_j \rVert_A$ represent a feature distance between any two data streams in the network, $D(i, j)$ is a multi-dimensional feature distance vector, $D(i, j)^{T}$ is a transposed matrix of the multi-dimensional feature distance vector, and $A$ is a weighted matrix;
and dividing the data streams in the network into several cluster sets by using a clustering algorithm and according to the feature distance between the any two data streams in the network, wherein a feature distance between any data stream in each aggregation flow and any other data stream in the same aggregation flow is less than a feature distance between the data stream and any data stream in a different aggregation flow, and each of the several cluster sets is a coflow, wherein an aggregation flow comprises data streams that have same destination addresses and same destination.
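As a non-limiting sketch of the computation formula recited in claim 1, the following Python function evaluates the square root of D(i, j)^T A D(i, j) for one pair of data streams. The function name `feature_distance`, the example vector `D_ij`, and the example matrix `A` are hypothetical and chosen only for illustration; the claim does not specify concrete values or an implementation. The sketch assumes that the weighted matrix is positive semidefinite, so the quantity under the square root is non-negative.

```python
import math


def feature_distance(D, A):
    """Return sqrt(D^T A D) for a feature distance vector D and weighted matrix A."""
    n = len(D)
    if len(A) != n or any(len(row) != n for row in A):
        raise ValueError("A must be an n x n matrix with n == len(D)")
    # Quadratic form D^T A D written out as a double sum.
    quad = sum(D[r] * A[r][c] * D[c] for r in range(n) for c in range(n))
    # Assumption: A is positive semidefinite; clamp tiny negative rounding error.
    return math.sqrt(max(quad, 0.0))


# Hypothetical three-dimensional vector for streams i and j:
# [application aspect distance, terminal aspect distance, protocol distance]
D_ij = [0.2, 1.0, 0.0]

# Hypothetical diagonal weighted matrix; a learned matrix may also have
# non-zero off-diagonal entries.
A = [
    [4.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 9.0],
]

d_ij = feature_distance(D_ij, A)
```

With the identity matrix as A, the function reduces to the ordinary Euclidean norm of D(i, j), which is one way to check an implementation before a learned matrix is substituted.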
Abstract
A coflow identification method includes: obtaining a weighted matrix by means of learning according to historical data in the network, where the weighted matrix is used to minimize a feature distance between data streams belonging to a same coflow and maximize a feature distance between data streams belonging to different coflows; computing a feature distance between any two data streams in the network according to metrics in the data stream layer data feature, the application layer data stream feature distance, the terminal aspect data feature distance, and the weighted matrix; and dividing the data streams in the network into several cluster sets by using a clustering algorithm and according to the feature distance between the any two data streams in the network, where each of the several cluster sets is a coflow.
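The final step of the abstract, dividing the data streams into cluster sets according to pairwise feature distance, can be sketched as follows. The abstract does not name a particular clustering algorithm, so this example substitutes a simple single-linkage grouping with a distance threshold; the function name `cluster_flows`, the threshold `eps`, and the sample distance values are assumptions made for illustration only.

```python
def cluster_flows(dist, eps):
    """Group stream indices connected by pairwise distances <= eps (single linkage)."""
    n = len(dist)
    parent = list(range(n))  # union-find forest, one node per data stream

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical symmetric matrix of feature distances for four data streams.
dist = [
    [0.0, 0.1, 2.0, 2.0],
    [0.1, 0.0, 2.0, 2.0],
    [2.0, 2.0, 0.0, 0.2],
    [2.0, 2.0, 0.2, 0.0],
]

coflows = cluster_flows(dist, eps=0.5)
print(coflows)  # [[0, 1], [2, 3]]
```

In this sketch each returned group plays the role of one cluster set, that is, one candidate coflow. A density-based or hierarchical algorithm could be used in the same position, provided it accepts a precomputed distance matrix.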
9 Claims
8. A server for identifying a coflow in a data transmission process in a network, comprising:
a processor;
a memory containing computer instructions for execution by the processor, wherein the instructions prompt the processor to be configured to include an information obtaining module, a feature extraction module, a weight learning module, a feature distance computation module, and a coflow clustering module, wherein
the information obtaining module is configured to obtain header information of data streams in data transmission in a network and historical data in the network, wherein the header information is header information of packets of the data streams comprising source IP addresses of the data streams, source ports of the data streams, destination IP addresses of the data streams, destination ports of the data streams, sending time points of the data streams, and transmission protocols used by the data streams;
the feature extraction module extracts a data stream aspect data feature, an application aspect data stream feature, and a terminal aspect data feature from the header information of the data streams, wherein the data stream aspect data feature comprises at least one of a sending time interval metric, a packet length average metric, a packet length variance metric, a packet arrival time interval average metric, a packet arrival time interval variance metric, or a transmission protocol distance metric;
the application aspect data stream feature comprises an application aspect data stream feature distance, wherein the transmission protocol distance metric indicates whether packet transmission protocols are the same, the application aspect data stream feature distance is used to indicate a degree of aggregation between destination addresses or destination ports in the data transmission or a degree of overlapping between data transmit end IP address sets, wherein the terminal aspect data feature comprises a terminal aspect data feature distance, wherein the terminal aspect data feature distance is used to indicate whether the data streams belong to a same terminal cluster, wherein a terminal cluster comprises at least two terminals having a common attribute of terminal traffic mode;
the weight learning module is configured to determine a weighted matrix based on the historical data in the network, wherein the weighted matrix is used to minimize a feature distance between data streams belonging to a same coflow and maximize a feature distance between data streams belonging to different coflows, and the feature distance is a weighted distance of the data stream aspect data feature, the application aspect data stream feature, and the terminal aspect data feature;
the feature distance computation module is configured to obtain a multi-dimensional feature distance vector of the data streams between any two data streams in the network, wherein the multi-dimensional feature distance vector comprises at least three dimensions, the at least three dimensions comprise the application aspect data stream feature distance, the terminal aspect data feature distance, and at least one of the sending time interval metric, the packet length average metric, the packet length variance metric, the packet arrival time interval average metric, the packet arrival time interval variance metric, or the transmission protocol distance metric, and each metric or each feature distance forms a dimension of the multi-dimensional feature distance vector; and
compute the feature distance between the any two data streams in the network according to the multi-dimensional feature distance vector and the weighted matrix, wherein the feature distance between the any two data streams in the network is computed according to the multi-dimensional feature distance vector and the weighted matrix by using the following computation formula;
$d(i, j) = \lVert f_i - f_j \rVert_A = \sqrt{D(i, j)^{T} A \, D(i, j)}$, wherein both $d(i, j)$ and $\lVert f_i - f_j \rVert_A$ represent a feature distance between any two data streams in the network, $D(i, j)$ is a multi-dimensional feature distance vector, $D(i, j)^{T}$ is a transposed matrix of the multi-dimensional feature distance vector, and $A$ is a weighted matrix; and
the coflow clustering module is configured to divide the data streams in the network into several cluster sets by using a clustering algorithm and according to the feature distance between the any two data streams in the network, wherein a feature distance between any data stream in each aggregation flow and any other data stream in the same aggregation flow is less than a feature distance between the data stream and any data stream in a different aggregation flow, and each of the several cluster sets is a coflow, wherein an aggregation flow comprises data streams that have same destination addresses and same destination.
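By way of illustration only, the data stream aspect metrics handled by the feature extraction module might be derived from per-packet records as sketched below. The record layout (arrival time in seconds, packet length in bytes), the helper names `flow_stats` and `stream_metrics`, and the choice of an absolute difference as the pairwise form of each metric are assumptions that the claim does not specify. The protocol distance follows the claim in indicating only whether the two transmission protocols are the same; encoding it as 0.0 or 1.0 is likewise an assumption.

```python
from statistics import mean, pvariance


def flow_stats(packets):
    """Summarize one data stream from (arrival_time, length) records (two or more)."""
    times = sorted(t for t, _ in packets)
    lengths = [size for _, size in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]  # packet arrival time intervals
    return {
        "start": times[0],             # sending time point of the stream
        "len_mean": mean(lengths),     # packet length average
        "len_var": pvariance(lengths), # packet length variance
        "gap_mean": mean(gaps),        # arrival time interval average
        "gap_var": pvariance(gaps),    # arrival time interval variance
    }


def stream_metrics(stats_i, proto_i, stats_j, proto_j):
    """Pairwise data stream aspect metrics (assumed form: absolute differences)."""
    keys = ["start", "len_mean", "len_var", "gap_mean", "gap_var"]
    metrics = [abs(stats_i[k] - stats_j[k]) for k in keys]
    # Transmission protocol distance: 0.0 if the protocols match, else 1.0.
    metrics.append(0.0 if proto_i == proto_j else 1.0)
    return metrics


# Hypothetical packet records for two data streams.
flow_a = [(0.0, 100), (1.0, 300)]
flow_b = [(0.5, 200), (1.5, 200), (3.5, 200)]

m = stream_metrics(flow_stats(flow_a), "tcp", flow_stats(flow_b), "udp")
```

Each entry of `m` could serve as one dimension of the multi-dimensional feature distance vector, alongside the application aspect and terminal aspect feature distances, which this sketch does not compute.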
Specification