Systems, devices, and/or methods for managing data

  • US 8,166,047 B1
  • Filed: 08/06/2008
  • Issued: 04/24/2012
  • Est. Priority Date: 08/06/2008
  • Status: Active Grant
First Claim

1. A method, comprising:

    sampling a first dataset and a second dataset;

    obtaining an all-distance bottom-k sketch of the first dataset and the second dataset;

    deriving a k-min sketch from the all-distance bottom-k sketch by utilizing a processor;

    obtaining estimators of the all-distance bottom-k sketch from corresponding estimators of the k-min sketch, wherein the estimators of the all-distance bottom-k sketch have exponentially distributed ranks;

    computing a rank for a new item received for the first dataset and the second dataset by utilizing the processor, wherein the new item has a distance value;

    updating the all-distance bottom-k sketch if the new item has a minimum rank among all items in the all-distance bottom-k sketch having a distance value smaller than the distance value of the new item;

    storing the new item and all items in the all-distance bottom-k sketch in order of increasing distances in order of increasing ranks; and

    rendering an estimator indicative of a size of a planned dataset to be stored on a memory device, wherein the planned dataset is a union of the first dataset and the second dataset.
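
As an illustration only, and not the patented implementation, the steps of the claim can be sketched in Python under several assumptions. All function names (rank, bottom_k, merge, estimate_size, ads_insert) are hypothetical. Ranks are derived from a SHA-256 hash so that an item receives the same exponentially distributed rank in both datasets. The size estimator (k - 1) / (k-th smallest rank) is a commonly used bottom-k estimator and is assumed here rather than taken from the claim. The all-distance update assumes items arrive in order of non-decreasing distance, and k = 1 corresponds to the "minimum rank" wording of the claim. Deriving a k-min sketch from the all-distance bottom-k sketch is not shown.

```python
import hashlib
import heapq
import math


def rank(item):
    """Hash-derived Exp(1) rank: the same item gets the same rank in every dataset."""
    digest = hashlib.sha256(repr(item).encode()).digest()
    n = int.from_bytes(digest[:8], "big")
    u = ((n >> 12) + 0.5) / 2**52  # uniform, strictly inside (0, 1)
    return -math.log(u)


def bottom_k(items, k):
    """Bottom-k sketch: the k smallest distinct ranks, in increasing order."""
    return heapq.nsmallest(k, {rank(x) for x in items})


def merge(sketch_a, sketch_b, k):
    """Sketch of the union: the k smallest ranks found in either sketch."""
    return heapq.nsmallest(k, set(sketch_a) | set(sketch_b))


def estimate_size(sketch, k):
    """Assumed estimator (needs k >= 2): (k - 1) / k-th smallest rank."""
    if len(sketch) < k:
        return float(len(sketch))  # fewer than k distinct items: count is exact
    return (k - 1) / sketch[k - 1]


def ads_insert(ads, item, distance, k=1):
    """Add (distance, rank, item) to the all-distance sketch if the new rank is
    among the k smallest ranks of stored items with strictly smaller distance.
    Assumes calls are made in order of non-decreasing distance, so the list
    stays sorted by increasing distance."""
    r = rank(item)
    closer = heapq.nsmallest(k, (rk for d, rk, _ in ads if d < distance))
    if len(closer) < k or r < closer[-1]:
        ads.append((distance, r, item))
        return True
    return False


# Usage: estimate the size of the union of two overlapping datasets.
k = 256
first = bottom_k(range(0, 6000), k)
second = bottom_k(range(4000, 10000), k)
print(estimate_size(merge(first, second, k), k))  # true union size is 10000
```

Because the ranks depend only on the item, merging the two sketches gives exactly the sketch that would have been built from the union directly, which is what lets the size of the planned dataset be estimated before it is stored.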
