Method for automated detection of data glitches in large data sets
Abstract
Embodiments of the invention allow the efficient detection of glitches in a set of data. In one embodiment, a set of datapoints is received. The datapoints are transformed into a transformed space, and the transformed space is segmented into a plurality of regions. For each datapoint, a time series is generated representing the trajectory of the transformed datapoint through regions of the segmented transformed space. Data corresponding to transformed datapoints whose trajectories exhibit an unusual pattern are transmitted.
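The steps in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the text above does not say how the transformed space is segmented or what counts as an "unusual pattern". Here the segmentation is assumed to use distance bands around a center combined with the dominant standardized attribute, and a trajectory is assumed to be unusual when it contains a region-to-region transition that is rare across all datapoints. The names region_of, trajectories, unusual, layer_edges and min_share are hypothetical.

```python
from collections import Counter
from math import sqrt


def region_of(vec, center, scale, layer_edges):
    """Map one attribute vector to a (layer, axis, sign) region label.

    ASSUMPTION: layer is the distance band from the center, axis is the
    index of the largest standardized attribute, sign is that attribute's sign.
    """
    z = [(v - c) / s for v, c, s in zip(vec, center, scale)]
    dist = sqrt(sum(x * x for x in z))
    layer = sum(dist > edge for edge in layer_edges)
    axis = max(range(len(z)), key=lambda i: abs(z[i]))
    sign = 1 if z[axis] >= 0 else -1
    return (layer, axis, sign)


def trajectories(series_by_id, center, scale, layer_edges):
    """Build one time series of region labels per datapoint."""
    return {key: [region_of(v, center, scale, layer_edges) for v in vectors]
            for key, vectors in series_by_id.items()}


def unusual(trajs, min_share=0.1):
    """Return ids whose trajectory contains a rare region-to-region transition.

    ASSUMPTION: "unusual" means a transition making up less than min_share
    of all transitions observed across every trajectory.
    """
    counts = Counter(step for tr in trajs.values() for step in zip(tr, tr[1:]))
    total = sum(counts.values())
    return sorted(key for key, tr in trajs.items()
                  if any(counts[step] / total < min_share
                         for step in zip(tr, tr[1:])))


# Made-up example: five steady datapoints and one that jumps away for one step.
data = {f"normal{i}": [(0.5, 0.1)] * 4 for i in range(5)}
data["glitch"] = [(0.5, 0.1), (0.5, 0.1), (0.2, 5.0), (0.5, 0.1)]
trajs = trajectories(data, center=(0.0, 0.0), scale=(1.0, 1.0),
                     layer_edges=(1.0, 2.0))
flagged = unusual(trajs)
print(flagged)
```

In this made-up example only the datapoint that briefly leaves the innermost band is reported. The center, scale and band edges are passed in explicitly; how they would be estimated from real data is outside what the abstract states.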
24 Claims
1. A method for detecting data glitches in a set of datapoints comprising:
receiving a set of datapoints, each of the datapoints having n attributes, where n is an integer greater than zero;
transforming the datapoints into a datasphere representation;
segmenting the datasphere into a plurality of regions;
determining a plurality of time series representing the trajectories of the transformed datapoints through regions of the segmented datasphere; and
transmitting data in connection with transformed datapoints whose trajectories exhibit an unusual pattern.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10

11. A computer-readable medium having stored thereon instructions that, when executed, cause a processor to:
receive a set of datapoints, each of the datapoints having n attributes, where n is an integer greater than zero;
transform the datapoints into a datasphere representation;
segment the datasphere into a plurality of regions;
determine a plurality of time series representing the trajectories of the transformed datapoints through regions of the segmented datasphere; and
transmit data in connection with transformed datapoints whose trajectories exhibit an unusual pattern.

12. A method for detecting data glitches in a set of datapoints comprising:
receiving a set of datapoints;
transforming the datapoints into transformed space;
segmenting the transformed space into a plurality of regions;
determining a plurality of time series representing the trajectories of the transformed datapoints through regions of the segmented transformed space;
transmitting data in connection with transformed datapoints whose trajectories exhibit an unusual pattern.
Dependent claims: 13, 14, 15, 16, 17

18. An apparatus for determining a network traffic anomaly, the apparatus comprising:
(a) a processor;
(b) a port coupled to the processor; and
(c) a memory, coupled to said processor, said memory storing instructions adapted to be executed by said processor, the instructions including;
receiving a set of datapoints, each of the datapoints having n attributes, where n is an integer greater than zero;
transforming the datapoints into a datasphere representation;
segmenting the datasphere into a plurality of regions;
determining a plurality of time series representing the trajectories of the transformed datapoints through regions of the segmented datasphere; and
transmitting data in connection with transformed datapoints whose trajectories exhibit an unusual pattern.
Dependent claims: 19, 20, 21

22. A method for detecting glitches in a set of data, the set of data including the values of each of a plurality of measurement vectors at a plurality of time values, the value of a measurement vector at a given time value including the values of a plurality of attributes associated with the measurement vector at the time value, the method comprising:
receiving the set of data;
transforming each measurement vector of the plurality of measurement vectors at each value of time into a datasphere representation;
segmenting the datasphere into a plurality of regions;
determining a time series representing the trajectory of each of the plurality of transformed measurement vectors through regions of the segmented datasphere; and
transmitting data in connection with transformed measurement vectors whose trajectories exhibit an unusual pattern.

23. A method for detecting data glitches comprising:
representing a datapoint as a first time-dependent vector;
generating a second time-dependent vector based on the differences between the first time-dependent vector and a center vector at each of a plurality of time values;
comparing the second time-dependent vector against a predetermined segment map at each of the plurality of time values;
determining a sector-region occupied by the second time-dependent vector based on the comparison at each of the plurality of time values; and
flagging the datapoint if the determined sector-regions match a predetermined pattern.
Dependent claims: 24
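
The steps of claim 23 can be illustrated with a small two-attribute example. This is a hedged sketch, not the claimed method itself: the claim does not define the segment map or the pattern, so the code assumes a polar map (angular sectors crossed with radial bands) and assumes the predetermined pattern is a contiguous sequence of radial-band indices, for instance inner, outer, inner for a one-step spike. The names sector_region, occupied, matches, n_sectors and radii are hypothetical.

```python
from math import atan2, hypot, pi


def sector_region(diff, n_sectors=4, radii=(1.0,)):
    """Look up the (sector, region) cell for one difference vector.

    ASSUMPTION: the segment map is polar, with n_sectors equal angular
    sectors and radial bands separated at the values in radii.
    """
    dx, dy = diff
    angle = atan2(dy, dx) % (2 * pi)
    # The final modulo guards against float rounding at exactly 2*pi.
    sector = int(angle / (2 * pi / n_sectors)) % n_sectors
    region = sum(hypot(dx, dy) > r for r in radii)
    return (sector, region)


def occupied(first_vector, center_vector):
    """Second vector = first vector minus center at each time value,
    then one segment-map lookup per time value."""
    return [sector_region((x - cx, y - cy))
            for (x, y), (cx, cy) in zip(first_vector, center_vector)]


def matches(cells, pattern):
    """True if the radial-band sequence contains pattern contiguously.

    ASSUMPTION: the predetermined pattern is expressed over band indices only.
    """
    regions = [region for _, region in cells]
    n = len(pattern)
    return any(regions[i:i + n] == list(pattern)
               for i in range(len(regions) - n + 1))


# Made-up datapoint observed at four time values, with a constant center.
first = [(1.2, 1.1), (1.3, 1.0), (4.0, 1.0), (1.2, 1.1)]
center = [(1.0, 1.0)] * 4
cells = occupied(first, center)
flag = matches(cells, pattern=(0, 1, 0))
print(cells, flag)
```

The center is passed as one vector per time value so that a time-varying center would work unchanged. A different pattern, or one that also constrains the sector, would only need a different matches function.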
Specification