Variable sampling rates for website visitation analysis
First Claim
1. A method for sampling a data set comprising a plurality of data items, comprising:
establishing a target number of sample items to be extracted from a plurality of data items corresponding to a time period;
establishing a tolerance factor;
establishing a target range using the target number and the tolerance factor;
establishing a first sampling rate for sampling the plurality of data items;
applying, by a computer, the first sampling rate to the plurality of data items corresponding to the time period to obtain a first sample set comprising a first plurality of sample items, wherein the first sample set is a subset of the plurality of data items;
determining, by the computer, a number of sample items contained in the first sample set; and
responsive to the number of sample items contained in the first sample set falling outside the target range:
establishing a second sampling rate that is different from the first sampling rate; and
applying, by the computer, the second sampling rate to the plurality of data items to obtain a second sample set comprising a second plurality of sample items.
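The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not say how the tolerance factor defines the target range or how the second sampling rate is chosen, so the sketch assumes a symmetric fractional tolerance and a second rate of target / (number of data items). The function names are hypothetical.

```python
import random


def target_range(target, tolerance):
    """Assumed rule: the range is target +/- (tolerance * target)."""
    return target * (1 - tolerance), target * (1 + tolerance)


def sample(items, rate, rng):
    """Keep each data item independently with probability `rate`."""
    return [item for item in items if rng.random() < rate]


def sample_with_resampling(items, target, tolerance, first_rate, seed=0):
    """Sample once; resample at a second rate if the count is out of range.

    Returns (sample_set, rate_used).
    """
    rng = random.Random(seed)
    low, high = target_range(target, tolerance)

    # Apply the first sampling rate and count the resulting sample items.
    first_set = sample(items, first_rate, rng)
    if low <= len(first_set) <= high:
        return first_set, first_rate

    # Assumed rule for the second rate: aim for `target` items on average.
    # (If this happens to equal first_rate, a real system would need
    # another rule to satisfy "different from the first sampling rate".)
    if not items:
        return first_set, first_rate
    second_rate = min(1.0, target / len(items))
    second_set = sample(items, second_rate, rng)
    return second_set, second_rate
```

For example, with 100,000 data items, a target of 1,000, a tolerance of 0.1, and a first rate of 0.05, the first sample set holds roughly 5,000 items, which falls outside the assumed range of 900 to 1,100, so the sketch resamples at a rate of 0.01.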
Abstract
A data set containing website traffic data or other data is sampled according to a variable sample rate. A target number of samples per time period is established, and a baseline sample rate is determined. Data items in the data set are sampled according to the baseline sample rate, to obtain a sample set. For time periods where the size of the resulting sample set exceeds the target number of samples, a new sample rate is established and the data items for the time period are resampled. Appropriate sampling capability can thus be provided for website traffic in normal time periods, while maintaining capability for handling spikes and other variations in website traffic as may take place in response to certain periodic or non-periodic events.
67 Citations
43 Claims
1. A method for sampling a data set comprising a plurality of data items, comprising:
establishing a target number of sample items to be extracted from a plurality of data items corresponding to a time period;
establishing a tolerance factor;
establishing a target range using the target number and the tolerance factor;
establishing a first sampling rate for sampling the plurality of data items;
applying, by a computer, the first sampling rate to the plurality of data items corresponding to the time period to obtain a first sample set comprising a first plurality of sample items, wherein the first sample set is a subset of the plurality of data items;
determining, by the computer, a number of sample items contained in the first sample set; and
responsive to the number of sample items contained in the first sample set falling outside the target range:
establishing a second sampling rate that is different from the first sampling rate; and
applying, by the computer, the second sampling rate to the plurality of data items to obtain a second sample set comprising a second plurality of sample items.
Dependent claims: 2-14.
15. A method for sampling a data set comprising a plurality of data items, each data item associated with a time period, comprising:
establishing a target number of sample items to be extracted from a plurality of data items corresponding to one or more time periods;
establishing a tolerance factor;
establishing a target range using the target number and the tolerance factor;
establishing a first sampling rate for sampling the plurality of data items;
for each of the one or more time periods:
applying, by a computer, the first sampling rate to the plurality of data items corresponding to the time period to obtain a first sample set comprising a first plurality of sample items for a given time period of the one or more time periods, wherein the first sample set is a subset of the plurality of data items corresponding to the given time period; and
determining, by the computer, a number of sample items contained in the first sample set;
responsive to the number of sample items contained in the first sample set for the given time period falling outside the target range:
establishing a second sampling rate for the given time period, wherein the second sampling rate is different from the first sampling rate; and
applying, by the computer, the second sampling rate to the plurality of data items corresponding to the given time period to obtain a second sample set for the given time period using the second sampling rate, wherein the second sample set comprises a second plurality of sample items.
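The per-time-period variant in claim 15 can be sketched the same way. This is again an illustration under assumptions rather than the claimed implementation: data items are modeled as hypothetical (period, payload) pairs, the target range is assumed to be symmetric around the target, and the second rate for an out-of-range period is assumed to be target / (items in that period).

```python
import random
from collections import defaultdict


def sample_by_period(items, target, tolerance, first_rate, seed=0):
    """Sample each time period separately, resampling out-of-range periods.

    `items` is an iterable of (period, payload) pairs.
    Returns {period: (rate_used, sample_set)}.
    """
    rng = random.Random(seed)
    # Assumed rule: the target range is target +/- (tolerance * target).
    low, high = target * (1 - tolerance), target * (1 + tolerance)

    # Group the data items by their associated time period.
    by_period = defaultdict(list)
    for period, payload in items:
        by_period[period].append(payload)

    result = {}
    for period, group in by_period.items():
        rate = first_rate
        picked = [x for x in group if rng.random() < rate]
        if not (low <= len(picked) <= high):
            # Assumed rule: pick a per-period rate that yields about
            # `target` items, then resample only this period.
            rate = min(1.0, target / len(group))
            picked = [x for x in group if rng.random() < rate]
        result[period] = (rate, picked)
    return result
```

Recording the rate used for each period alongside its sample set is a design choice of this sketch: it would let later analysis scale each period's counts by the reciprocal of its rate, so that a traffic spike sampled at a lower rate can still be compared with normal periods.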
16. A computer program product for sampling a data set comprising a plurality of data items, comprising:
a non-transitory computer-readable storage medium; and
computer program code, encoded on the non-transitory computer-readable storage medium, for:
establishing a target number of sample items to be extracted from a plurality of data items corresponding to a time period;
establishing a tolerance factor;
establishing a target range using the target number and the tolerance factor;
establishing a first sampling rate for sampling the plurality of data items;
applying the first sampling rate to the plurality of data items corresponding to the time period to obtain a first sample set comprising a first plurality of sample items, wherein the first sample set is a subset of the plurality of data items;
determining a number of sample items contained in the first sample set; and
responsive to the number of sample items contained in the first sample set falling outside the target range:
establishing a second sampling rate that is different from the first sampling rate; and
applying the second sampling rate to the plurality of data items to obtain a second sample set comprising a second plurality of sample items.
Dependent claims: 17-29.
30. A computer system for sampling a data set comprising a plurality of data items, comprising:
a processor configured to execute instructions embodied in a storage device and comprising a log processing module configured for:
establishing a target number of sample items to be extracted from a plurality of data items corresponding to a time period;
establishing a tolerance factor;
establishing a target range using the target number and the tolerance factor;
establishing a first sampling rate for sampling the plurality of data items;
applying the first sampling rate to the plurality of data items corresponding to the time period to obtain a first sample set comprising a first plurality of sample items, wherein the first sample set is a subset of the plurality of data items;
determining a number of sample items contained in the first sample set; and
responsive to the number of sample items contained in the first sample set falling outside the target range:
establishing a second sampling rate that is different from the first sampling rate; and
applying the second sampling rate to the plurality of data items to obtain a second sample set comprising a second plurality of sample items; and
wherein the storage device is configured for storing at least one of the first and second sample sets.
Dependent claims: 31-43.
Specification