Evaluating techniques for clustering geographic entities
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for evaluating clusters of geographic entities, for example, to be used in a randomized geographic experiment. One method includes using a clustering algorithm to cluster geographic entities into a set of clusters, and identifying whether each geographic entity is an ambiguously classified entity or a definitively classified entity. The method further includes determining a measurement for the set of clusters according to a quantification of an attribute of the definitively classified entities and the ambiguously classified entities. Similar measurements can be calculated for other sets of clusters, and the clusters can be compared according to their measurements.
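The split between ambiguously and definitively classified entities can be illustrated with a short Python sketch. The abstract does not say how the split is made, so the rule below is an assumption: an entity counts as ambiguous when its nearest neighbor in a different cluster lies closer than a hypothetical `threshold`. The planar coordinates, the function names, and the "share of the attribute held by definitive entities" measurement are likewise illustrative choices, not taken from the patent.

```python
import math


def classify(points, labels, threshold):
    """Label each entity 'ambiguous' or 'definitive'.

    ASSUMPTION: an entity is ambiguous when the closest entity assigned
    to a different cluster is nearer than `threshold`.
    """
    statuses = []
    for i, (x, y) in enumerate(points):
        nearest = min(
            (math.hypot(x - ox, y - oy)
             for j, (ox, oy) in enumerate(points) if labels[j] != labels[i]),
            default=math.inf,  # no entity in any other cluster
        )
        statuses.append("ambiguous" if nearest < threshold else "definitive")
    return statuses


def definitive_share(attributes, statuses):
    """Fraction of the total attribute held by definitive entities (hypothetical measurement)."""
    total = sum(attributes)
    kept = sum(a for a, s in zip(attributes, statuses) if s == "definitive")
    return kept / total if total else 0.0


# Four entities on a line, clustered into a left pair and a right pair.
points = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
labels = [0, 0, 1, 1]
populations = [100.0, 50.0, 50.0, 100.0]

statuses = classify(points, labels, threshold=8.5)
share = definitive_share(populations, statuses)
```

Here the two inner entities sit 8 units from the opposite cluster and fall under the threshold, so only the outer two count as definitive.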
20 Claims
1. A computer-implemented method, comprising:
- storing, by one or more data processing apparatuses, data identifying a plurality of geographic entities, wherein each entity of the plurality of geographic entities is associated with a location and a quantification of an attribute, the attribute comprising at least one of a population or an experiment-specific metric;
- using a first clustering algorithm to cluster, with the one or more data processing apparatuses, the plurality of entities into a first set of clusters according to the location of each of the plurality of geographic entities;
- determining, with the one or more data processing apparatuses, an accuracy probability for each geographic entity of the plurality of geographic entities clustered into the first set of clusters, wherein the accuracy probability for a particular geographic entity of the plurality of geographic entities is determined according to a distance from the particular geographic entity to a second geographic entity of the plurality of geographic entities, wherein the second geographic entity is placed in a different cluster of the first set of clusters than the particular geographic entity and has a location that is the closest to the location of the particular geographic entity as compared to the other geographic entities of the plurality of geographic entities placed in a different cluster than the particular geographic entity; and
- determining, with the one or more data processing apparatuses, a first cluster measurement for the first set of clusters, wherein the first cluster measurement is derived from a quantification of the attribute for each of the geographic entities weighted by the accuracy probability for the geographic entity.
Dependent claims: 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
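The last two steps of claim 1 can be sketched in Python. The claim says only that the accuracy probability is determined "according to a distance" to the nearest entity in a different cluster, and that the measurement is the attribute "weighted by" that probability. The exponential mapping, the `scale` parameter, the planar coordinates, the weighted sum, and every name below are therefore assumptions made for illustration, not the patented formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    x: float          # ASSUMPTION: planar coordinates; real data would likely be lat/lon
    y: float
    attribute: float  # quantification of the attribute, e.g. population
    cluster: int      # label already assigned by some clustering algorithm


def nearest_foreign_distance(entity, entities):
    """Distance to the closest entity placed in a different cluster."""
    return min(
        (math.hypot(entity.x - o.x, entity.y - o.y)
         for o in entities if o.cluster != entity.cluster),
        default=math.inf,  # every entity shares one cluster
    )


def accuracy_probability(distance, scale=10.0):
    """ASSUMPTION: hypothetical mapping that rises toward 1 as the distance grows."""
    return 1.0 - math.exp(-distance / scale)


def cluster_measurement(entities, scale=10.0):
    """ASSUMPTION: measurement taken as the sum of attribute times accuracy probability."""
    return sum(
        e.attribute
        * accuracy_probability(nearest_foreign_distance(e, entities), scale)
        for e in entities
    )


entities = [
    Entity("A", 0.0, 0.0, 100.0, cluster=0),
    Entity("B", 3.0, 4.0, 200.0, cluster=1),
]
measurement = cluster_measurement(entities)
```

Under this mapping, an entity far from every other cluster contributes nearly its full attribute value, while an entity sitting next to a cluster boundary contributes much less.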
7. A system comprising:
- one or more processors; and
- a computer storage medium coupled to the one or more processors and including instructions, which, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
  - storing data identifying a plurality of geographic entities, wherein each of the geographic entities is associated with a location and a quantification of an attribute, the attribute comprising at least one of a population or an experiment-specific metric;
  - using a first clustering algorithm to cluster the plurality of entities into a first set of clusters;
  - determining an accuracy probability for each geographic entity of the plurality of geographic entities in the plurality of geographic entities, wherein the accuracy probability for a particular geographic entity of the plurality of geographic entities is determined according to a distance from the particular geographic entity to a second geographic entity of the plurality of geographic entities, wherein the second geographic entity is placed in a different cluster of the first set of clusters than the particular geographic entity and has a location that is the closest to the location of the particular geographic entity as compared to the other geographic entities of the plurality of geographic entities placed in a different cluster than the particular geographic entity; and
  - determining a first cluster measurement for the first set of clusters, wherein the first cluster measurement is derived from a quantification of the attribute of each of the geographic entities weighted by the accuracy probability for the geographic entity.
Dependent claims: 8, 9
20. A non-transitory machine-readable medium comprising instructions stored therein, which when executed by one or more machines, cause the one or more machines to perform operations comprising:
- receiving data identifying a plurality of geographic entities, the geographic entities each being associated with a location and a quantification of an attribute, the attribute comprising at least one of a population or an experiment-specific metric;
- evaluating a respective set of clusters for each of a plurality of clustering algorithms, the evaluating comprising, for each clustering algorithm:
  - using a first clustering algorithm to cluster the plurality of entities into a first set of clusters according to the location of each of the plurality of geographic entities;
  - determining an accuracy probability for each geographic entity of the plurality of geographic entities clustered into the first set of clusters, wherein the accuracy probability for a particular geographic entity is determined according to a distance from the particular geographic entity to a second geographic entity of the plurality of geographic entities, wherein the second geographic entity is placed in a different cluster of the first set of clusters than the particular geographic entity and has a location that is the closest to the location of the particular geographic entity as compared to the other entities of the plurality of entities placed in a different cluster than the particular geographic entity; and
  - determining a first cluster measurement for the first set of clusters, wherein the first cluster measurement is derived from a quantification of the attribute for each of the geographic entities weighted by the accuracy probability for the geographic entity; and
- selecting one of the respective sets of clusters according to the cluster measurements for the clustering algorithms.
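The evaluate-then-select loop of claim 20 might look like the following Python sketch. The claim does not name the clustering algorithms or the selection rule, so the two toy algorithms (`split_by_x`, `split_by_y`), the exponential distance-to-probability mapping, and the choice to select the highest measurement are all hypothetical stand-ins.

```python
import math


def measurement(points, labels, scale=1.0):
    """Attribute weighted by an ASSUMED probability 1 - exp(-d / scale), where d is
    the distance to the nearest entity assigned to a different cluster."""
    total = 0.0
    for i, (x, y, attr) in enumerate(points):
        d = min(
            (math.hypot(x - ox, y - oy)
             for j, (ox, oy, _) in enumerate(points) if labels[j] != labels[i]),
            default=math.inf,
        )
        total += attr * (1.0 - math.exp(-d / scale))
    return total


def split_by_x(points):
    """Toy clustering algorithm (hypothetical): two clusters around the mean x."""
    mid = sum(p[0] for p in points) / len(points)
    return [int(p[0] > mid) for p in points]


def split_by_y(points):
    """Toy clustering algorithm (hypothetical): two clusters around the mean y."""
    mid = sum(p[1] for p in points) / len(points)
    return [int(p[1] > mid) for p in points]


def select_clustering(points, algorithms):
    """Score every algorithm's clusters; ASSUMPTION: the highest measurement wins."""
    scores = {name: measurement(points, algo(points)) for name, algo in algorithms.items()}
    best = max(scores, key=scores.get)
    return best, scores


# (x, y, attribute): two tight columns of entities, far apart in x.
points = [(0.0, 0.0, 10.0), (0.0, 1.0, 10.0), (10.0, 0.0, 10.0), (10.0, 1.0, 10.0)]
best, scores = select_clustering(points, {"split_by_x": split_by_x, "split_by_y": split_by_y})
```

With this data, splitting on x keeps each entity 10 units from the other cluster, while splitting on y leaves every entity 1 unit from a neighbor in the other cluster, so the x split scores higher and is selected.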
Specification