Scaling machine learning using approximate counting
First Claim
Patent Images
1. A method performed by one or more computer devices, comprising:
- storing, in a repository, information regarding a plurality of features;
storing, in a plurality of memory locations in a memory, values relating to the plurality of features;
identifying a particular feature of the plurality of features in the repository;
subjecting a string, associated with the particular feature, to multiple, different hash functions to generate multiple, different hash values;
identifying, for each of the multiple, different hash values, a respective memory location, of the plurality of memory locations in the memory;
reading the values stored at the respective memory locations;
performing an operation on the read values from the respective memory locations to obtain updated values;
writing the updated values into the respective memory locations; and
using the values, including the updated values, to make a prediction regarding particular data.
1 Assignment
0 Petitions
Accused Products
Abstract
A system may track statistics for a number of features using an approximate counting technique by: subjecting each feature to multiple, different hash functions to generate multiple, different hash values, where each of the hash values may identify a particular location in a memory, and storing statistics for each feature at the particular locations identified by the hash values. The system may generate rules for a model based on the tracked statistics.
-
Citations
22 Claims
-
1. A method performed by one or more computer devices, comprising:
-
storing, in a repository, information regarding a plurality of features; storing, in a plurality of memory locations in a memory, values relating to the plurality of features; identifying a particular feature of the plurality of features in the repository; subjecting a string, associated with the particular feature, to multiple, different hash functions to generate multiple, different hash values; identifying, for each of the multiple, different hash values, a respective memory location, of the plurality of memory locations in the memory; reading the values stored at the respective memory locations; performing an operation on the read values from the respective memory locations to obtain updated values; writing the updated values into the respective memory locations; and using the values, including the updated values, to make a prediction regarding particular data. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A method performed by one or more computer devices, comprising:
-
processing, by one or more processors of the one or more computer devices, each particular feature of a plurality of features in a repository, where processing each particular feature includes; performing, by one or more processors of the one or more computer devices, a plurality of different hash functions on a string, associated with the particular feature, to generate a corresponding plurality of different hash values, identifying, by one or more processors of the one or more computer devices, buckets, of a plurality of buckets in a memory, based on the plurality of different hash values, reading, by one or more processors of the one or more computer devices, a statistical value from each of the identified buckets, updating, by one or more processors of the one or more computer devices, each of the statistical values by subjecting each of the statistical values to a particular function to generate updated statistical values, and writing, by one or more processors of the one or more computer devices, each of the updated statistical values into a corresponding one of the identified buckets; identifying a group of features, of the plurality of features, based on the statistical values, including the updated statistical values, in the plurality of buckets; and using, by one or more processors of the one or more computer devices, the identified group of features and the statistical values associated with the identified group of features to make a prediction regarding particular data. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19)
-
-
20. A system, comprising:
-
one or more first memory devices to store information regarding a plurality of features; one or more second memory devices to store, in a plurality of memory locations, statistical values relating to the plurality of features; and one or more computer devices to; identify a particular feature of the plurality of features in the one or more first memory devices, subject a string, associated with the particular feature, to multiple, different hash functions to generate multiple, different hash values, identify, for each of the multiple, different hash values, a respective memory location, of the plurality of memory locations in the one or more second memory devices, read the statistical values stored in the respective memory locations, perform an operation on the read statistical values to obtain updated statistical values, write the updated statistical values into the respective memory locations, and use the statistical values, including the updated statistical values, to predict whether a particular e-mail includes spam or to predict whether a particular advertisement will be selected by a user. - View Dependent Claims (21, 22)
-
Specification