Scaling machine learning using approximate counting that uses feature hashing
First Claim
Patent Images
1. A method, performed by one or more computer devices, for approximate counting, comprising:
- identifying, by one or more processors of the one or more computer devices, a feature of a plurality of features in a repository;
performing, by one or more processors of the one or more computer devices, a plurality of different hash functions on a feature name associated with the feature to generate a corresponding plurality of different hash values;
identifying, by one or more processors of the one or more computer devices, buckets, of a plurality of buckets in a memory, based on the plurality of different hash values;
reading, by one or more processors of the one or more computer devices, a statistical value from each of the identified buckets;
updating, by one or more processors of the one or more computer devices, each of the statistical values by subjecting each of the statistical values to a particular function to generate updated statistical values;
writing, by one or more processors of the one or more computer devices, each of the updated statistical values into a corresponding one of the identified buckets; and
generating, by one or more processors of the one or more computer devices, rules for a model based on the statistical values, including the updated statistical values, in the plurality of buckets.
2 Assignments
0 Petitions
Accused Products
Abstract
A system may track statistics for a number of features using an approximate counting technique by: subjecting each feature to multiple, different hash functions to generate multiple, different hash values, where each of the hash values may identify a particular location in a memory, and storing statistics for each feature at the particular locations identified by the hash values. The system may generate rules for a model based on the tracked statistics.
-
Citations
29 Claims
-
1. A method, performed by one or more computer devices, for approximate counting, comprising:
-
identifying, by one or more processors of the one or more computer devices, a feature of a plurality of features in a repository; performing, by one or more processors of the one or more computer devices, a plurality of different hash functions on a feature name associated with the feature to generate a corresponding plurality of different hash values; identifying, by one or more processors of the one or more computer devices, buckets, of a plurality of buckets in a memory, based on the plurality of different hash values; reading, by one or more processors of the one or more computer devices, a statistical value from each of the identified buckets; updating, by one or more processors of the one or more computer devices, each of the statistical values by subjecting each of the statistical values to a particular function to generate updated statistical values; writing, by one or more processors of the one or more computer devices, each of the updated statistical values into a corresponding one of the identified buckets; and generating, by one or more processors of the one or more computer devices, rules for a model based on the statistical values, including the updated statistical values, in the plurality of buckets. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A device for performing approximate counting, comprising:
-
a memory to store statistics regarding a plurality of features in buckets; and a processor to; identify a feature of the plurality of features, subject a feature name, associated with the feature, to a plurality of different hash functions to generate a plurality of different hash values, where the plurality of hash functions includes at least three different hash functions, identify a plurality of the buckets in the memory based on the plurality of different hash values, read the statistics from each of the identified buckets, update each of the statistics by subjecting each of the statistics to a particular function to generate updated statistics, write each of the updated statistics into a corresponding one of the identified buckets, and generate rules for a model based on the statistics, including the updated statistics, in the buckets. - View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
-
-
23. A method, performed by one or more computer devices, for approximate counting, comprising:
-
identifying, by one or more processors of the one or more computer devices, a feature of a plurality of features in a repository; performing, by one or more processors of the one or more computer devices, a plurality of different hash functions on a feature name associated with the feature to generate a corresponding plurality of different hash values; identifying, by one or more processors of the one or more computer devices, a plurality of buckets in a memory based on the plurality of different hash values; reading, by one or more processors of the one or more computer devices, a statistical value from each of the identified buckets; determining, by one or more processors of the one or more computer devices, a single statistical value from the statistical values by subjecting the statistical values to a particular function; and generating, by one or more processors of the one or more computer devices, rules for a model based on the single statistical values for a group of the features in the repository. - View Dependent Claims (24, 25)
-
-
26. A device for approximate counting, comprising:
-
a memory to store statistics regarding a plurality of features in buckets; and a processor to; identify a feature of the plurality of features, subject a feature name, associated with the feature, to a plurality of different hash functions to generate a corresponding plurality of different hash values, identify a plurality of the buckets in the memory based on the plurality of different hash values, read the statistics from each of the identified buckets, determine a single statistical value from the statistics by subjecting the statistics to a particular function, and generate rules for a model based on the single statistical values for a group of the features. - View Dependent Claims (27, 28)
-
-
29. A system for approximate counting, comprising:
-
one or more memory devices to store a plurality of distinct features; and one or more computer devices comprising; means for tracking statistics for a set of features, of the plurality of features in the one or more memory devices, using an approximate counting technique including; means for subjecting a feature name, associated with a respective feature of the set of features, to multiple, different hash functions to generate multiple, different hash values, each of the hash values identifying a particular location in a memory, means for generating statistics for each feature of the set of features, and means for storing the statistics, for each feature of the set of features, at the particular locations identified by the hash values; and means for generating rules for a model based on the tracked statistics.
-
Specification