Systems, devices, and/or methods for managing data
Abstract
Certain exemplary embodiments can provide a method, which can comprise automatically storing and computing a sketch of a dataset that supports an automatically determined estimator of properties of a dataset. The dataset can be related to any population. For example, the dataset can comprise data flows through a network node (e.g., a router), sales data, and/or marketing data, etc. The estimator can be based upon a sketch of the dataset.
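As a rough illustration of a sketch that supports a size estimator, the following Python example builds a bottom-k sketch with exponentially distributed ranks and estimates the number of distinct items from it. It is a minimal sketch under stated assumptions, not the patented method: the function names, the SHA-256 based rank assignment, the seed string, and the choice of the (k - 1) / p estimator (a standard k-minimum-values style estimator) are all illustrative choices that do not come from the source.

```python
import hashlib
import heapq
import math


def exp_rank(item, seed="demo-seed"):
    """Assign a deterministic rank that behaves like an Exp(1) draw (hypothetical helper)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    x = int.from_bytes(digest[:8], "big") >> 12  # keep 52 hash bits
    u = (x + 0.5) / 2.0 ** 52                    # uniform value strictly inside (0, 1)
    return -math.log1p(-u)                       # inverse CDF of the exponential distribution


def bottom_k_sketch(items, k):
    """Return the k smallest ranks over the distinct items, in increasing order."""
    ranks = {exp_rank(x) for x in items}         # duplicates hash to the same rank
    return heapq.nsmallest(k, ranks)


def estimate_size(sketch, k):
    """Estimate the number of distinct items summarized by a bottom-k sketch."""
    if len(sketch) < k:
        return float(len(sketch))                # the sketch holds every distinct item
    tau = sketch[k - 1]                          # k-th smallest rank
    p = -math.expm1(-tau)                        # probability that an Exp(1) rank is below tau
    return (k - 1) / p


if __name__ == "__main__":
    flows = [f"flow-{i}" for i in range(10000)]
    sketch = bottom_k_sketch(flows, 256)
    print(round(estimate_size(sketch, 256)))
```

The estimate is computed from only k stored ranks, so its accuracy depends on k rather than on the size of the dataset.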
20 Claims
1. A method, comprising:
sampling a first dataset and a second dataset;
obtaining an all-distance bottom-k sketch of the first dataset and the second dataset;
deriving a k-min sketch from the all-distance bottom-k sketch by utilizing a processor;
obtaining estimators of the all-distance bottom-k sketch from corresponding estimators of the k-min sketch, wherein the estimators of the all-distance bottom-k sketch have exponentially distributed ranks;
computing a rank for a new item received for the first dataset and the second dataset by utilizing the processor, wherein the new item has a distance value;
updating the all-distance bottom-k sketch if the new item has a minimum rank among all items in the all-distance bottom-k sketch having a distance value smaller than the distance value of the new item;
storing the new item and all items in the all-distance bottom-k sketch in order of increasing distances in order of increasing ranks; and
rendering an estimator indicative of a size of a planned dataset to be stored on a memory device, wherein the planned dataset is a union of the first dataset and the second dataset.
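The updating and storing steps of claim 1 can be pictured with a small Python example. This is only a sketch under assumptions: the class name, method names, and the O(n) bookkeeping are hypothetical, and the acceptance test used here is the common bottom-k generalization (the new rank must beat the k-th smallest rank among closer items), which matches the minimum-rank wording of the claim only when k = 1. Ranks are passed in by the caller, so any rank assignment can be plugged in.

```python
import bisect
import heapq


class AllDistanceBottomK:
    """Toy all-distance bottom-k sketch (hypothetical class, illustration only)."""

    def __init__(self, k):
        self.k = k
        self.entries = []  # (distance, rank, item), kept in order of increasing distance

    def _qualifies(self, closer_ranks, rank):
        # Accept if fewer than k closer items exist, or the rank beats the k-th smallest.
        if len(closer_ranks) < self.k:
            return True
        return rank < heapq.nsmallest(self.k, closer_ranks)[-1]

    def update(self, item, distance, rank):
        """Offer a new item; return True if the sketch stored it."""
        closer = [r for d, r, _ in self.entries if d < distance]
        if not self._qualifies(closer, rank):
            return False
        bisect.insort(self.entries, (distance, rank, item))
        self._prune()
        return True

    def _prune(self):
        # A newly stored close item can disqualify entries that are farther away.
        kept = []
        for d, r, it in self.entries:
            closer = [r2 for d2, r2, _ in kept if d2 < d]
            if self._qualifies(closer, r):
                kept.append((d, r, it))
        self.entries = kept


if __name__ == "__main__":
    ads = AllDistanceBottomK(k=1)
    for item, distance, rank in [("a", 1.0, 0.5), ("b", 2.0, 0.7), ("c", 3.0, 0.2)]:
        print(item, ads.update(item, distance, rank))
    print(ads.entries)
```

With k = 1, item "b" is rejected because the closer item "a" has a smaller rank, while "c" is stored because its rank is smaller than that of every closer entry.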
11. A system, comprising:
a processor configured to:
sample a first dataset and a second dataset;
obtain an all-distance bottom-k sketch of the first dataset and the second dataset;
derive a k-min sketch from the all-distance bottom-k sketch by utilizing a processor;
obtain estimators of the all-distance bottom-k sketch from corresponding estimators of the k-min sketch, wherein the estimators of the all-distance bottom-k sketch have exponentially distributed ranks;
compute a rank for a new item received for the first dataset and the second dataset by utilizing the processor, wherein the new item has a distance value;
update the all-distance bottom-k sketch if the new item has a minimum rank among all items in the all-distance bottom-k sketch having a distance value smaller than the distance value of the new item;
store the new item and all items in the all-distance bottom-k sketch in order of increasing distances in order of increasing ranks; and
render an estimator indicative of a size of a planned dataset to be stored on a memory device, wherein the planned dataset is a union of the first dataset and the second dataset.
18. A non-transitory computer-readable medium comprising instructions, which, when loaded and executed by an electronic processor, causes the electronic processor to perform activities comprising:
sampling a first dataset and a second dataset;
obtaining an all-distance bottom-k sketch of the first dataset and the second dataset;
deriving a k-min sketch from the all-distance bottom-k sketch by utilizing a processor;
obtaining estimators of the all-distance bottom-k sketch from corresponding estimators of the k-min sketch, wherein the estimators of the all-distance bottom-k sketch have exponentially distributed ranks;
computing a rank for a new item received for the first dataset and the second dataset by utilizing the processor, wherein the new item has a distance value;
updating the all-distance bottom-k sketch if the new item has a minimum rank among all items in the all-distance bottom-k sketch having a distance value smaller than the distance value of the new item;
storing the new item and all items in the all-distance bottom-k sketch in order of increasing distances in order of increasing ranks; and
rendering an estimator indicative of a size of a planned dataset to be stored on a memory device, wherein the planned dataset is a union of the first dataset and the second dataset.
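Each independent claim ends by rendering an estimator for the size of the union of the first dataset and the second dataset. One way to picture that step, assuming both datasets are sketched with the same rank assignment, is to merge two plain bottom-k sketches and estimate the union size from the merged sketch alone. Everything named below (rank, sketch, merge, estimate_union_size, the seed string, and the (k - 1) / p estimator) is a hypothetical illustration and not taken from the claims.

```python
import hashlib
import heapq
import math


def rank(item, seed="shared-seed"):
    """Shared rank assignment: the same item gets the same rank in both datasets."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    x = int.from_bytes(digest[:8], "big") >> 12  # keep 52 hash bits
    return -math.log1p(-(x + 0.5) / 2.0 ** 52)   # Exp(1)-like rank


def sketch(items, k):
    """Bottom-k sketch: the k smallest ranks over the distinct items."""
    return heapq.nsmallest(k, {rank(x) for x in items})


def merge(sketch_a, sketch_b, k):
    """The bottom-k sketch of a union is the k smallest ranks of the two sketches combined."""
    return heapq.nsmallest(k, set(sketch_a) | set(sketch_b))


def estimate_union_size(sketch_a, sketch_b, k):
    """Estimate the number of distinct items in the union without revisiting the data."""
    merged = merge(sketch_a, sketch_b, k)
    if len(merged) < k:
        return float(len(merged))                # both sketches were exhaustive
    return (k - 1) / -math.expm1(-merged[-1])


if __name__ == "__main__":
    first = [f"item-{i}" for i in range(0, 6000)]
    second = [f"item-{i}" for i in range(4000, 10000)]
    k = 256
    print(round(estimate_union_size(sketch(first, k), sketch(second, k), k)))
```

Because the merge needs only the two sketches, the size of the planned union can be estimated before the combined dataset is ever materialized.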
Specification