Systems and methods of data traffic generation via density estimation using SVD
First Claim
1. A machine implemented method comprising:
utilizing a central processing unit of a server to execute a program of instructions stored in a memory to perform method steps for constructing a density based synthetic data set from a real data set, the program of instructions comprising;
(a) instructions for accepting the real data set at the server;
(b) instructions for partitioning the real data set into clusters;
(c) instructions for, after partitioning the real data set into clusters, creating density estimates separately for each cluster;
wherein said instructions for creating density estimates separately for each cluster comprise;
instructions for performing an SVD transform to find eigenvector summaries;
instructions for determining a grid, wherein a grid resolution along each eigenvector is proportional to a length of an eigenvector in a corresponding direction;
instructions for determining a density estimate at each grid point utilizing kernel density estimation; and
instructions for sampling grid points proportional to density to construct a final set of data points;
wherein the final set of data points comprise the density based synthetic data set and further wherein the density based synthetic data set simulates the distribution of the real data set for data mining computations and simulations; and
(d) instructions for transmitting the density based synthetic data set from the server to a client device.
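The per-cluster steps recited in element (c) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `synthesize_cluster`, the parameters `base_resolution` and `min_points`, the Gaussian kernel, and the Scott-style bandwidth are all hypothetical choices, and the "length of an eigenvector" is read here as the singular value associated with each direction.

```python
import numpy as np

def synthesize_cluster(points, n_samples, base_resolution=10, min_points=3, seed=0):
    """Sketch: SVD -> grid -> kernel density estimate -> density-proportional sampling."""
    rng = np.random.default_rng(seed)
    X = np.asarray(points, dtype=float)
    n = X.shape[0]
    mean = X.mean(axis=0)
    centered = X - mean

    # SVD of the centered cluster: rows of vt are the principal directions,
    # and the singular values s measure the spread along each direction.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = len(s)
    proj = centered @ vt.T  # cluster coordinates in the transformed space

    # Assumption: the number of grid points along each direction is
    # proportional to its singular value (with a small floor).
    top = s.max() if s.max() > 0 else 1.0
    counts = np.maximum(min_points, np.ceil(base_resolution * s / top).astype(int))
    axes = [np.linspace(proj[:, j].min(), proj[:, j].max(), int(counts[j]))
            for j in range(k)]
    # Note: the grid size grows exponentially with the number of directions.
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, k)

    # Gaussian kernel density estimate at every grid point, with a
    # Scott-style per-axis bandwidth (an assumption, not from the claims).
    h = proj.std(axis=0) * n ** (-1.0 / (k + 4))
    h = np.where(h > 0, h, 1.0)
    z = (grid[:, None, :] - proj[None, :, :]) / h
    density = np.exp(-0.5 * (z ** 2).sum(axis=2)).sum(axis=1)

    # Sample grid points with probability proportional to estimated density,
    # then map them back to the original coordinate system.
    probs = density / density.sum()
    idx = rng.choice(len(grid), size=n_samples, p=probs)
    return grid[idx] @ vt + mean
```

Running the function once per cluster and concatenating the results would give a synthetic data set in the sense of the claim; the partitioning, acceptance, and transmission steps are outside this sketch.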
Abstract
Systems and methods for providing density-based traffic generation. Data are clustered to create partitions, and transforms of clustered data are constructed in a transformed space. Data points are generated via employing grid discretization in the transformed space, and density estimates of the generated data points are employed to generate synthetic pseudo-points.
9 Claims
4. An apparatus comprising:
at least one central processing unit; and
at least one storage device tangibly embodying instructions that when executed by the at least one central processing unit enable the apparatus to;
receive a real data set from a client device;
generate density estimates of the real data set;
construct a final set of synthetic data points based on the density estimates; and
send the final set of synthetic data points to the client device;
wherein the final set of synthetic data points comprises a density based synthetic data set that simulates the distribution of the real data set for data mining computations and simulations;
wherein the instructions that when executed by the at least one central processing unit further enable the apparatus to;
create clusters of data of the real data set utilizing a k-means approach;
thereafter perform an SVD transform to find eigenvector summaries;
thereafter determine a grid, wherein a grid resolution along each eigenvector is proportional to a length of an eigenvector in a corresponding direction; and
determine a density estimate at each grid point using kernel density estimation;
wherein;
the kernel density estimation constructs density estimates from a discrete set of data points via summing up contributions of different data points;
axis directions having a least variance in a data space are removed; and
grid points are sampled and noise is added in order to construct the final set of synthetic data points.
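Three elements specific to this claim (k-means clustering, removal of the least-variance axis directions, and noise added to sampled grid points) can be sketched as below. The helper names `kmeans`, `drop_least_variance_axes`, and `add_noise` are hypothetical, the clustering is plain Lloyd's iteration, and the uniform within-cell jitter is an assumption, since the claim does not say what kind of noise is added.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute labels so they match the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers

def drop_least_variance_axes(X, keep):
    """Project onto the `keep` highest-variance SVD directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:keep]  # singular values come sorted in descending order
    return (X - mean) @ basis.T, basis, mean

def add_noise(grid_samples, spacing, seed=0):
    """Assumption: jitter each sampled grid point uniformly within one grid cell."""
    rng = np.random.default_rng(seed)
    jitter = rng.uniform(-0.5, 0.5, size=grid_samples.shape)
    return grid_samples + jitter * spacing
```

Jittering within a cell keeps the synthetic points from all landing exactly on grid coordinates; other noise models would serve the same purpose.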
7. A system comprising:
a client device; and
a server having at least one central processing unit, the server being in operative connection with the client device;
wherein the server is configured to;
(a) receive a real data set from a client device;
(b) generate density estimates of the real data set and construct a final set of synthetic data points based on the density estimates, wherein to generate density estimates of the real data set and construct a final set of synthetic data points further comprises;
creating clusters of data of the real data set utilizing a k-means approach;
thereafter performing an SVD transform to find eigenvector summaries;
thereafter determining a grid, wherein a grid resolution along each eigenvector is proportional to a length of an eigenvector in a corresponding direction; and
determining a density estimate at each grid point using kernel density estimation;
wherein;
the kernel density estimation constructs density estimates from a discrete set of data points via summing up contributions of different data points;
axis directions having a least variance in a data space are removed; and
grid points are sampled and noise is added in order to construct the final set of synthetic data points; and
(c) send the final set of synthetic data points to the client device;
wherein the final set of synthetic data points comprises a density based synthetic data set that simulates the distribution of the real data set for data mining computations and simulations.
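The phrase "summing up contributions of different data points" matches the standard form of a kernel density estimate. As an illustration only, with the Gaussian product kernel and the per-axis bandwidths h_j chosen here as assumptions (the claims do not name a kernel), the estimate at a grid point x from N data points x_1, ..., x_N in d retained dimensions can be written as:

```latex
\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} K_h(x - x_i),
\qquad
K_h(u) = \prod_{j=1}^{d} \frac{1}{\sqrt{2\pi}\, h_j}
         \exp\!\left( -\frac{u_j^{2}}{2 h_j^{2}} \right)
```

Each data point contributes one kernel "bump" centered on itself, and the estimate at a grid point is the average of those contributions.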
Specification