Approximate order statistics of real numbers in generic data
Abstract
A method, system, and processor-readable storage medium are directed towards calculating approximate order statistics on a collection of real numbers. In one embodiment, the collection of real numbers is processed to create a digest comprising a hierarchy of buckets. Each bucket is assigned a real number N having P digits of precision and ordinality O. The hierarchy is defined by grouping buckets into levels, where each level contains all buckets of a given ordinality. Each individual bucket in the hierarchy defines a range of numbers: all numbers that, after being truncated to that bucket's P digits of precision, are equal to that bucket's N. Each bucket additionally maintains a count of how many numbers have fallen within that bucket's range. Approximate order statistics may then be calculated by traversing the hierarchy and performing an operation on some or all of the ranges and counts associated with each bucket.
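The bucket definition in the abstract can be illustrated with a short Python sketch. It reads "truncated to that bucket's P digits of precision" as truncation toward zero at the 10^O place, so at a fixed level O every bucket key N is a multiple of 10^O. That reading, and the helper names `bucket_key`, `in_bucket` and `ancestors`, are our assumptions rather than anything the abstract specifies.

```python
from decimal import Decimal, ROUND_DOWN


def bucket_key(value: Decimal, level: int) -> Decimal:
    """Hypothetical helper: truncate value toward zero at the 10**level place.

    This is our interpretation of "truncated to P digits of precision" for a
    bucket whose ordinality is `level`.
    """
    return value.quantize(Decimal(1).scaleb(level), rounding=ROUND_DOWN)


def in_bucket(value: Decimal, key: Decimal, level: int) -> bool:
    """A value falls in bucket (key, level) iff truncating it at that level yields key."""
    return bucket_key(value, level) == key


def ancestors(value: Decimal, finest: int, coarsest: int) -> list[tuple[int, Decimal]]:
    """Keys of the buckets containing value, from level `finest` up to level `coarsest`."""
    return [(level, bucket_key(value, level)) for level in range(finest, coarsest + 1)]


if __name__ == "__main__":
    for level, key in ancestors(Decimal("123.45"), -2, 2):
        print(level, key)
```

For 123.45 the keys from level -2 up to level 2 are 123.45, 123.4, 123, 120 and 100, which shows how each coarser level groups the buckets beneath it. Truncating toward zero keeps the rule symmetric for negative inputs (for example, -7.9 falls in the level-0 bucket -7).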
22 Claims
1. A computer-implemented method for calculating approximate order statistics from a collection of floating point numbers from a digest in a network, comprising:
receiving machine data, wherein the machine data includes a floating point number;
extracting, using one or more processors, the floating point number from the machine data;
determining, using the one or more processors, an ordinality of the floating point number, wherein the ordinality of each floating point number is determined by converting the floating point number to a mantissa and an exponent and subtracting a number of significant digits in the mantissa to the right of the decimal point, including significant zeros, from the exponent;
identifying, using the one or more processors and based on the determined ordinality, a level from amongst a plurality of levels in the digest, the digest being stored in a non-transitory memory and including a plurality of buckets positioned along the plurality of levels, wherein each bucket of the plurality of buckets is: defined by the ordinality of the level along which it is positioned, further defined by a range limited by one or more extrema, and associated with a count that reflects a quantity of floating point numbers;
identifying, using the one or more processors, a bucket positioned at the identified level and being defined by a range that is inclusive of the floating point number;
incrementing, using the one or more processors, the count of the identified bucket, wherein the identified bucket, for which the count was incremented, has a plurality of child buckets in the digest, wherein the digest is configured to be used to generate a response to a query based on the incremented count of the bucket;
identifying, using the one or more processors, a set of buckets based on a query value in the query, wherein the set of buckets includes the identified bucket; and
estimating, using the one or more processors, an order statistic for the query value based on a summation of counts associated with the identified set of buckets.
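The ordinality rule recited in claim 1 (and repeated in claims 8, 14 and 21) can be checked with a worked example: 123.45 is 1.2345 × 10^2, four mantissa digits lie to the right of the decimal point, so its ordinality is 2 - 4 = -2, the place value of its last digit. The Python sketch below applies the rule to numeric tokens pulled out of a line of machine data. The regular expression `NUMBER_RE` and the helpers `extract_numbers` and `ordinality` are hypothetical, and we assume every written digit, including a trailing zero such as the one in 12.50, counts as significant.

```python
import re
from decimal import Decimal

# Hypothetical pattern for plain or scientific-notation numbers in a log line.
NUMBER_RE = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def extract_numbers(machine_data: str) -> list[str]:
    """Return numeric tokens exactly as written, so trailing zeros survive."""
    return NUMBER_RE.findall(machine_data)


def ordinality(token: str) -> int:
    """Exponent minus the count of mantissa digits right of the decimal point."""
    d = Decimal(token)
    if not d.is_finite():
        raise ValueError(f"not a finite number: {token!r}")
    digits = d.as_tuple().digits
    exponent = d.adjusted()               # for nonzero d: d = m x 10**exponent, 1 <= |m| < 10
    fractional_digits = len(digits) - 1   # mantissa digits right of the decimal point
    return exponent - fractional_digits


if __name__ == "__main__":
    line = "GET /api status=200 latency=12.50ms size=1.2e3"
    for token in extract_numbers(line):
        print(token, ordinality(token))
```

Parsing the token with `decimal.Decimal` rather than as a binary float keeps trailing zeros, which the claim counts as significant. For any finite `Decimal` parsed from text, the computed value coincides with `d.as_tuple().exponent`.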
8. An apparatus for calculating approximate order statistics, comprising:
a processor; and
a memory storing instructions that, when executed by the processor, cause actions to be performed, including:
receiving machine data, wherein the machine data includes a floating point number;
extracting the floating point number from the machine data;
determining an ordinality of the floating point number, wherein the ordinality of each floating point number is determined by converting the floating point number to a mantissa and an exponent and subtracting a number of significant digits in the mantissa to the right of the decimal point, including significant zeros, from the exponent;
identifying, based on the determined ordinality, a level from amongst a plurality of levels in the digest, the digest being stored in a non-transitory memory and including a plurality of buckets positioned along the plurality of levels, wherein each bucket of the plurality of buckets is: defined by the ordinality of the level along which it is positioned, further defined by a range limited by one or more extrema, and associated with a count that reflects a quantity of floating point numbers;
identifying a bucket positioned at the identified level and being defined by a range that is inclusive of the floating point number;
incrementing the count of the identified bucket, wherein the identified bucket, for which the count was incremented, has a plurality of child buckets in the digest, wherein the digest is configured to be used to generate a response to a query based on the incremented count of the bucket;
identifying a set of buckets based on a query value in the query, wherein the set of buckets includes the identified bucket; and
estimating an order statistic for the query value based on a summation of counts associated with the identified set of buckets.
14. A non-transitory processor readable storage medium storing instructions that cause a processor to perform actions, comprising:
accessing a set of floating point numbers;
receiving machine data, wherein the machine data includes a floating point number;
extracting the floating point number from the machine data;
determining an ordinality of the floating point number, wherein the ordinality of each floating point number is determined by converting the floating point number to a mantissa and an exponent and subtracting a number of significant digits in the mantissa to the right of the decimal point, including significant zeros, from the exponent;
identifying, based on the determined ordinality, a level from amongst a plurality of levels in the digest, the digest being stored in a non-transitory memory and including a plurality of buckets positioned along the plurality of levels, wherein each bucket of the plurality of buckets is: defined by the ordinality of the level along which it is positioned, further defined by a range limited by one or more extrema, and associated with a count that reflects a quantity of floating point numbers;
identifying a bucket positioned at the identified level and being defined by a range that is inclusive of the floating point number;
incrementing the count of the identified bucket, wherein the identified bucket, for which the count was incremented, has a plurality of child buckets in the digest, wherein the digest is configured to be used to generate a response to a query based on the incremented count of the bucket;
identifying a set of buckets based on a query value in the query, wherein the set of buckets includes the identified bucket; and
estimating an order statistic for the query value based on a summation of counts associated with the identified set of buckets.
21. A system comprising:
a first computing device storing a digest derived from a collection of floating point numbers, wherein the digest includes one or more buckets grouped into one or more levels, wherein a bucket is created for a floating point number when the digest lacks a bucket defined by an ordinality matching an ordinality of the number and further defined by a range inclusive of the floating point number, and wherein the ordinality of each floating point number is determined by converting the floating point number to a mantissa and an exponent and subtracting a number of significant digits in the mantissa to the right of the decimal point, including significant zeros, from the exponent, wherein the created bucket is added to a level associated with the ordinality of the floating point number, and wherein the digest is configured to be used to generate a response to a query based on a count of the created bucket; and
a second computing device configured to perform actions comprising:
receiving the digest;
identifying a set of buckets based on a query value in the query; and
estimating an order statistic for the query value based on a summation of counts associated with the identified set of buckets.
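The two-device arrangement of claim 21 can be sketched in Python as a digest that creates a level and a bucket only when it lacks one, is serialized on one device and rebuilt on another, and answers a query by summing bucket counts. Several choices here are our assumptions, not the patent's: inputs are non-negative decimal strings, each value is counted at its own ordinality, the "order statistic" is read as the rank (how many values lie at or below the query), the query selects every bucket whose lower extremum does not exceed the query value, and the class name `Digest` and its JSON wire format are hypothetical.

```python
import json
from decimal import Decimal


class Digest:
    """Hypothetical digest: levels keyed by ordinality, each mapping a bucket's
    lower extremum to its count. Assumes non-negative decimal-string inputs."""

    def __init__(self) -> None:
        self.levels: dict[int, dict[Decimal, int]] = {}

    def add(self, token: str) -> None:
        d = Decimal(token)
        if not d.is_finite() or d < 0:
            raise ValueError(f"expected a non-negative finite number, got {token!r}")
        # For a Decimal parsed from text this equals the claimed rule:
        # adjusted exponent minus mantissa digits right of the decimal point.
        o = d.as_tuple().exponent
        level = self.levels.setdefault(o, {})  # create the level if the digest lacks it
        level[d] = level.get(d, 0) + 1          # a missing bucket is created with this first count

    def to_json(self) -> str:
        """Serialize on the first device (hypothetical wire format)."""
        return json.dumps(
            {str(o): {str(n): c for n, c in buckets.items()} for o, buckets in self.levels.items()}
        )

    @classmethod
    def from_json(cls, payload: str) -> "Digest":
        """Rebuild the digest on the second device."""
        digest = cls()
        for o, buckets in json.loads(payload).items():
            digest.levels[int(o)] = {Decimal(n): c for n, c in buckets.items()}
        return digest

    def estimate_rank(self, query: str) -> int:
        """Sum the counts of every bucket whose lower extremum is <= query."""
        q = Decimal(query)
        return sum(c for buckets in self.levels.values() for n, c in buckets.items() if n <= q)


if __name__ == "__main__":
    first = Digest()
    for token in ["12.5", "12.50", "3", "7", "100", "0.25"]:
        first.add(token)
    second = Digest.from_json(first.to_json())
    print(second.estimate_rank("10"))
```

Because this sketch never folds fine buckets into coarser ones, every bucket here holds a single exact value and the summed count is exact. Once coarser buckets carry counts, including a bucket that straddles the query value counts all of it, which is where the estimate becomes approximate.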
Specification