Evaluating techniques for clustering geographic entities

US 8,782,045 B1
Filed: 04/15/2010
Issued: 07/15/2014
Est. Priority Date: 04/15/2010
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for evaluating clusters of geographic entities, for example, to be used in a randomized geographic experiment. One method includes using a clustering algorithm to cluster geographic entities into a set of clusters, and identifying whether each geographic entity is an ambiguously classified entity or a definitively classified entity. The method further includes determining a measurement for the set of clusters according to a quantification of an attribute of the definitively classified entities and the ambiguously classified entities. Similar measurements can be calculated for other sets of clusters, and the clusters can be compared according to their measurements.
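
The measurement described in the abstract can be illustrated with a minimal sketch. The abstract does not say how an entity is judged ambiguous, so the rule used here is an assumption: an entity counts as ambiguously classified when its nearest neighbor in a different cluster lies closer than a hypothetical `min_gap`. The function names, the sample coordinates, and the populations are likewise illustrative and do not come from the patent.

```python
import math


def nearest_other_cluster_distance(i, points, labels):
    """Distance from entity i to the closest entity assigned to a different cluster.

    Assumes at least two clusters exist; otherwise min() has nothing to compare.
    """
    return min(
        math.dist(points[i], points[j])
        for j in range(len(points))
        if labels[j] != labels[i]
    )


def definitive_share(points, labels, attribute, min_gap):
    """Share of the attribute total held by definitively classified entities.

    ASSUMPTION: an entity is "definitive" when its nearest other-cluster
    neighbor is at least min_gap away, and "ambiguous" otherwise.
    """
    definitive = 0.0
    total = 0.0
    for i, value in enumerate(attribute):
        total += value
        if nearest_other_cluster_distance(i, points, labels) >= min_gap:
            definitive += value
    return definitive / total


# Illustrative data: four entities on a line, split into two clusters.
points = [(0, 0), (1, 0), (4, 0), (10, 0)]
labels = [0, 0, 1, 1]
population = [100, 200, 300, 400]

share = definitive_share(points, labels, population, min_gap=4.0)
print(share)
```

With this data the two middle entities sit 3 units from the opposite cluster and are treated as ambiguous, so the definitive entities hold 500 of the 1,000 total population.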

20 Claims

1. A computer-implemented method, comprising:
- storing, by one or more data processing apparatuses, data identifying a plurality of geographic entities, wherein each entity of the plurality of geographic entities is associated with a location and a quantification of an attribute, the attribute comprising at least one of a population or an experiment-specific metric;
  
  using a first clustering algorithm to cluster, with the one or more data processing apparatuses, the plurality of entities into a first set of clusters according to the location of each of the plurality of geographic entities;
  
  determining, with the one or more data processing apparatuses, an accuracy probability for each geographic entity of the plurality of geographic entities clustered into the first set of clusters, wherein the accuracy probability for a particular geographic entity of the plurality of geographic entities is determined according to a distance from the particular geographic entity to a second geographic entity of the plurality of geographic entities, wherein the second geographic entity is placed in a different cluster of the first set of clusters than the particular geographic entity and has a location that is the closest to the location of the particular geographic entity as compared to the other geographic entities of the plurality of geographic entities placed in a different cluster than the particular geographic entity; and
  
  determining, with the one or more data processing apparatuses, a first cluster measurement for the first set of clusters, wherein the first cluster measurement is derived from a quantification of the attribute for each of the geographic entities weighted by the accuracy probability for the geographic entity.
- Dependent claims:
  - 2. The method of claim 1, wherein the attribute comprises population and wherein the quantification of the attribute of a geographic entity of the plurality of geographic entities is a population associated with the geographic entity.
  - 3. The method of claim 1, further comprising:
    - using a second clustering algorithm to cluster the plurality of entities into a second set of clusters;
      
      determining a second accuracy probability for each geographic entity of the plurality of geographic entities clustered into the second set of clusters;
      
      determining a second cluster measurement for the second set of clusters, wherein the first cluster measurement is derived from a quantification of the attribute for each geographic entity of the plurality of geographic entities weighted by the second accuracy probability for the geographic entity;
      
      comparing the first cluster measurement to the second cluster measurement; and
      
      selecting either the first set of clusters or the second set of clusters according to the comparison.
  - 4. The method of claim 1, wherein determining the first cluster measurement comprises multiplying the quantification of the attribute of each geographic entity of the plurality of geographic entities by the accuracy probability for the geographic entity, resulting in individual products for each of the plurality of the geographic entities, summing the individual products of the plurality of geographic entities, resulting in a sum, and dividing the sum by a sum of the quantification of the attribute for all of the plurality of geographic entities.
  - 5. The method of claim 1, wherein determining the first cluster measurement comprises:
    - determining an individual measurement for each cluster in the first set of clusters, wherein the individual measurement for a cluster is derived by multiplying the quantification of the attribute of each of the geographic entities in the cluster by the accuracy probability for the entity, resulting in individual products for each of the geographic entities in the cluster, summing the individual products, resulting in a sum, and dividing the sum by a sum of the quantification of the attribute of each of the geographic entities in the cluster; and
      
      deriving the first cluster measurement from a number of clusters in the first set of clusters that have an individual measurement that exceeds a threshold.
  - 6. The method of claim 1, wherein the accuracy probability for a geographic entity of the plurality of geographic entities is 1 if the geographic entity is less than a threshold distance from the closest geographic entity in the different cluster, and is 0 otherwise.
  - 10. The method of claim 3, wherein the first clustering algorithm and the second clustering algorithm use a same clustering technique, the first clustering algorithm generates a first number of clusters, the second clustering algorithm generates a second number of clusters, and the first number of clusters and the second number of clusters are different.
  - 11. The method of claim 1, wherein the accuracy probability of a geographic entity of the plurality of geographic entities indicates if the geographic entity is a definitively classified entity or an ambiguously classified entity.
  - 12. The method of claim 11, wherein determining the first cluster measurement for the first set of clusters comprises dividing a sum of the quantifications of the attribute of each of the definitively classified entities of the plurality of entities by a sum of the quantifications of the attribute of each of the definitively classified entities of the plurality of entities and the quantifications of the attribute of each of the ambiguously classified entities of the plurality of entities.
  - 13. The method of claim 12, wherein the sum of quantifications of the attribute of each of the definitively classified entities is a number of the definitively classified entities in the plurality of entities and wherein the sum of quantifications of the attribute of each of the ambiguously classified entities is a number of the ambiguously classified entities in the plurality of entities.
  - 14. The method of claim 12, wherein determining the first cluster measurement for the first set of clusters comprises dividing a sum of the quantifications of the attribute of each of the ambiguously classified entities of the plurality of entities by a sum of the quantifications of the attribute of each of the definitively classified entities of the plurality of entities and the quantifications of the attribute of each of the ambiguously classified entities of the plurality of entities.
  - 15. The method of claim 5, wherein the first cluster measurement is the number of clusters in the first set of clusters that have an individual measurement that exceeds the threshold divided by a total number of clusters in the first set of clusters.
  - 16. The method of claim 1, wherein the experiment-specific metric is a relevant population for an advertising experiment, wherein the quantification of the attribute for each geographic entity of the plurality of geographic entities is the relevant population for the geographic entity.
  - 17. The method of claim 16, wherein the relevant population for each geographic entity of the plurality of geographic entities is a number of users located with the geographic entity satisfying particular demographic criteria.
  - 18. The method of claim 16, wherein the relevant population for each geographic entity of the plurality of geographic entities is a number of users that satisfy particular behavioral criteria.
  - 19. The method of claim 1, wherein the experience-specific metric is a volume of sales that are relevant to the advertising experiment, wherein the quantification of the attribute for each geographic entity of the plurality of geographic entities is the volume of sales made in one or more physical stores located within physical boundaries associated with the geographic entity.
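
The arithmetic recited in claims 1, 4, 5, and 15 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation. The claims say only that the accuracy probability is determined "according to a distance", so the `linear_ramp` mapping below is hypothetical, as are the function names and the sample data. The sketch also assumes Euclidean distance and at least two clusters.

```python
import math
from collections import defaultdict


def accuracy_probabilities(points, labels, prob_from_distance):
    """Claim 1: derive each entity's probability from the distance to its
    nearest entity in a different cluster. prob_from_distance is supplied
    by the caller because the claims do not fix a particular mapping."""
    probs = []
    for i, p in enumerate(points):
        d = min(
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] != labels[i]
        )
        probs.append(prob_from_distance(d))
    return probs


def overall_measurement(attribute, probs):
    """Claim 4: sum(attribute * probability) / sum(attribute)."""
    return sum(a * p for a, p in zip(attribute, probs)) / sum(attribute)


def per_cluster_measurements(labels, attribute, probs):
    """Claim 5: the same ratio, computed separately within each cluster."""
    weighted = defaultdict(float)
    total = defaultdict(float)
    for label, a, p in zip(labels, attribute, probs):
        weighted[label] += a * p
        total[label] += a
    return {label: weighted[label] / total[label] for label in total}


def fraction_above(cluster_scores, threshold):
    """Claim 15: clusters whose measurement exceeds the threshold,
    divided by the total number of clusters."""
    hits = sum(1 for s in cluster_scores.values() if s > threshold)
    return hits / len(cluster_scores)


def linear_ramp(d, full_confidence_at=5.0):
    """HYPOTHETICAL mapping: probability rises linearly with distance
    and is capped at 1.0."""
    return min(d / full_confidence_at, 1.0)


# Illustrative data: four entities on a line, split into two clusters.
points = [(0, 0), (1, 0), (4, 0), (10, 0)]
labels = [0, 0, 1, 1]
population = [100, 200, 300, 400]

probs = accuracy_probabilities(points, labels, linear_ramp)
overall = overall_measurement(population, probs)
by_cluster = per_cluster_measurements(labels, population, probs)
share_good = fraction_above(by_cluster, threshold=0.75)
print(probs, overall, by_cluster, share_good)
```

Passing the distance-to-probability mapping in as a parameter keeps the weighting arithmetic separate from the choice of mapping, which the independent claim leaves open.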

7. A system comprising:
- one or more processors; and
  
  a computer storage medium coupled to the one or more processors and including instructions, which, when executed by the one or more processors, cause the one or more processors to perform operations comprising;
  
  storing data identifying a plurality of geographic entities, wherein each of the geographic entities is associated with a location and a quantification of an attribute, the attribute comprising at least one of a population or an experiment-specific metric;
  
  using a first clustering algorithm to cluster the plurality of entities into a first set of clusters;
  
  determining an accuracy probability for each geographic entity of the plurality of geographic entities in the plurality of geographic entities, wherein the accuracy probability for a particular geographic entity of the plurality of geographic entities is determined according to a distance from the particular geographic entity to a second geographic entity of the plurality of geographic entities, wherein the second geographic entity is placed in a different cluster of the first set of clusters than the particular geographic entity and has a location that is the closest to the location of the particular geographic entity as compared to the other geographic entities of the plurality of geographic entities placed in a different cluster than the particular geographic entity; and
  
  determining a first cluster measurement for the first set of clusters, wherein the first cluster measurement is derived from a quantification of the attribute of each of the geographic entities weighted by the accuracy probability for the geographic entity.
- Dependent claims:
  - 8. The system of claim 7, further operable to perform operations comprising:
    - using a second clustering algorithm to cluster the plurality of entities into a second set of clusters; and
      
      determining a second cluster measurement for the second set of clusters;
      
      comparing the first cluster measurement to a second cluster measurement; and
      
      selecting either the first set of clusters or the second set of clusters according to the comparison.
  - 9. The system of claim 7, wherein determining the first cluster measurement comprises multiplying the quantification of the attribute of each geographic entity of the plurality of geographic entities by the accuracy probability for the geographic entity, resulting in individual products, summing the individual products, resulting in a sum, and dividing the sum by a total quantification of the attribute for all of the geographic entities.

20. A non-transitory machine-readable medium comprising instructions stored therein, which when executed by one or more machines, cause the one or more machines to perform operations comprising:
- receiving data identifying a plurality of geographic entities, the geographic entities each being associated with a location and a quantification of an attribute, the attribute comprising at least one of a population or an experiment-specific metric;
  
  evaluating a respective set of clusters for each of a plurality of clustering algorithms, the evaluating comprising, for each clustering algorithm;
  
  using a first clustering algorithm to cluster the plurality of entities into a first set of clusters according to the location of each of the plurality of geographic entities;
  
  determining an accuracy probability for each geographic entity of the plurality of geographic entities clustered into the first set of clusters, wherein the accuracy probability for a particular geographic entity is determined according to a distance from the particular geographic entity to a second geographic entity of the plurality of geographic entities, wherein the second geographic entity is placed in a different cluster of the first set of clusters than the particular geographic entity and has a location that is the closest to the location of the particular geographic entity as compared to the other entities of the plurality of entities placed in a different cluster than the particular geographic entity; and
  
  determining a first cluster measurement for the first set of clusters, wherein the first cluster measurement is derived from a quantification of the attribute for each of the geographic entities weighted by the accuracy probability for the geographic entity; and
  
  selecting one of the respective sets of clusters according to the cluster measurements for the clustering algorithms.
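
The evaluate-then-select loop of claim 20 might look like the sketch below. Several parts are assumptions made for illustration: the two toy "algorithms" simply split entities at a fixed x coordinate, the distance-to-probability mapping is a hypothetical linear ramp, and the set with the highest measurement is taken as the winner. The claim itself says only that a set is selected "according to the cluster measurements".

```python
import math


def measurement(points, labels, attribute, prob_from_distance):
    """Attribute total weighted by accuracy probability, divided by the
    unweighted attribute total. Assumes at least two clusters."""
    weighted = 0.0
    for i, p in enumerate(points):
        d = min(
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] != labels[i]
        )
        weighted += attribute[i] * prob_from_distance(d)
    return weighted / sum(attribute)


def select_clustering(points, attribute, algorithms, prob_from_distance):
    """Score the clustering produced by each algorithm and return the name
    of the best one with all scores. ASSUMPTION: higher is better."""
    scores = {
        name: measurement(points, algo(points), attribute, prob_from_distance)
        for name, algo in algorithms.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


def split_at(x_cut):
    """HYPOTHETICAL stand-in for a clustering algorithm: two clusters,
    divided at a fixed x coordinate."""
    return lambda pts: [0 if x < x_cut else 1 for x, _ in pts]


# Illustrative data: four entities on a line with their populations.
points = [(0, 0), (1, 0), (4, 0), (10, 0)]
population = [100, 200, 300, 400]
algorithms = {"cut_at_2": split_at(2), "cut_at_7": split_at(7)}

best, scores = select_clustering(
    points, population, algorithms, lambda d: min(d / 5.0, 1.0)
)
print(best, scores)
```

In this toy data, cutting at x = 7 leaves every entity at least 6 units from the other cluster, so that clustering scores higher than the cut at x = 2, where two entities are only 3 units apart across the boundary.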

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Vaver, Jon
Primary Examiner(s): Mofiz, Apu
Assistant Examiner(s): Khoshnoodi, Fariborz
Application Number: US12/761,315
Time in Patent Office: 1,552 Days
Field of Search: 707/737, 707/736
US Class Current: 707/736
CPC Class Codes:
- G06F 18/23213: with fixed number of cluste...
- G06F 18/24137: Distances to cluster centroïds
- G06Q 30/0201: Market modelling; Market an...
- G06Q 30/0242: Determining effectiveness o...
- G06V 20/176: Urban or other man-made str...
