Distributed grouping of large-scale data sets

  • US 10,394,913 B1
  • Filed: 07/14/2016
  • Issued: 08/27/2019
  • Est. Priority Date: 07/14/2016
  • Status: Active Grant
First Claim

1. A system comprising:

  • a first computing device, wherein the first computing device comprises a processor programmed by executable instructions to at least:

    determine a first cosine distance between a first data vector, represented by a first temporary probabilistic data structure, and a center of a first cluster of data vectors;

    determine a second cosine distance between the first data vector, represented by the first temporary probabilistic data structure, and a center of a second cluster of data vectors;

    determine that the first cosine distance is smaller than the second cosine distance;

    modify a first probabilistic data structure using the first data vector, wherein the first probabilistic data structure comprises data, regarding the first cluster of data vectors, from which the center of the first cluster of data vectors is determined; and

    transmit the first probabilistic data structure to a second computing device; and

  • the second computing device, wherein the second computing device comprises a processor programmed by executable instructions to at least:

    determine a third cosine distance between a second data vector, represented by a second temporary probabilistic data structure, and the center of the first cluster of data vectors;

    determine a fourth cosine distance between the second data vector, represented by the second temporary probabilistic data structure, and the center of the second cluster of data vectors;

    determine that the third cosine distance is smaller than the fourth cosine distance;

    modify a second probabilistic data structure using the second data vector, wherein the second probabilistic data structure comprises data, regarding the first cluster of data vectors, from which the center of the first cluster of data vectors is determined;

    receive the first probabilistic data structure from the first computing device; and

    generate a third probabilistic data structure using the first probabilistic data structure and the second probabilistic data structure, wherein the third probabilistic data structure comprises data, regarding the first cluster of data vectors, from which an updated center of the first cluster of data vectors is determined.
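
The claim does not name the probabilistic data structure that represents a data vector. For illustration only, the sketch below assumes a single-row count sketch (signed feature hashing). This structure approximately preserves inner products, so cosine distance can be computed directly on the hashed arrays. Each device would then keep whichever center is closer. `WIDTH`, `bucket_and_sign`, `to_sketch`, `cosine_distance`, and `nearest_cluster` are hypothetical names, not taken from the patent.

```python
import hashlib
import math

WIDTH = 64  # hypothetical sketch width; a larger width means fewer hash collisions


def bucket_and_sign(feature: str, width: int = WIDTH) -> tuple[int, int]:
    # Derive a deterministic bucket index and a +1/-1 sign from one SHA-256 digest,
    # so every device using the same width hashes a feature identically.
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % width
    sign = 1 if digest[4] & 1 else -1
    return bucket, sign


def to_sketch(vector: dict[str, float], width: int = WIDTH) -> list[float]:
    # Build the assumed "temporary" structure for one sparse data vector.
    sketch = [0.0] * width
    for feature, value in vector.items():
        bucket, sign = bucket_and_sign(feature, width)
        sketch[bucket] += sign * value
    return sketch


def cosine_distance(a: list[float], b: list[float]) -> float:
    # 1 - cos(a, b); an all-zero input gets distance 1.0, the same as an orthogonal vector.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0
    return 1.0 - dot / norm


def nearest_cluster(vec_sketch: list[float], centers: list[list[float]]) -> int:
    # Compare the vector with every center and return the index of the smallest distance.
    distances = [cosine_distance(vec_sketch, center) for center in centers]
    return min(range(len(centers)), key=distances.__getitem__)
```

With two clusters, `nearest_cluster(v, [center_1, center_2]) == 0` corresponds to the claim's determination that the first cosine distance is smaller than the second.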
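
One possible reading of "modify a first probabilistic data structure using the first data vector" is sketched below. Each cluster keeps the running element-wise sum of its members' hashed arrays plus a member count, and the center is their mean. This assumes every data vector is already a fixed-width array of hashed counters, such as a single-row count sketch. `ClusterSketch`, `add`, and `center` are hypothetical names. The patent may use a different structure.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterSketch:
    # Hypothetical per-cluster state: summed sketch counters and a member count.
    width: int
    sums: list[float] = field(init=False)
    count: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.sums = [0.0] * self.width

    def add(self, vec_sketch: list[float]) -> None:
        # Count sketches are linear, so adding a member is element-wise addition.
        if len(vec_sketch) != self.width:
            raise ValueError("sketch width mismatch")
        for i, value in enumerate(vec_sketch):
            self.sums[i] += value
        self.count += 1

    def center(self) -> list[float]:
        # Mean of the member sketches (all zeros for an empty cluster).
        if self.count == 0:
            return [0.0] * self.width
        return [s / self.count for s in self.sums]
```

Cosine distance ignores vector length, so the raw `sums` would rank clusters the same way as the mean. Keeping `count` mainly matters when the center itself is reported.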
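
The claim only says that the first device transmits its structure and the second device combines it with its own. Below is a minimal sketch of that exchange under stated assumptions. Cluster state is represented as a (summed counters, member count) pair. The wire format is JSON, which is a hypothetical choice. "Generate a third structure" is taken to mean element-wise addition, which is valid only if both devices use the same sketch width and hash functions. All function names are illustrative.

```python
import json


def encode_cluster_state(sums: list[float], count: int) -> bytes:
    # Hypothetical wire format: UTF-8 JSON (finite Python floats round-trip exactly).
    return json.dumps({"sums": sums, "count": count}).encode("utf-8")


def decode_cluster_state(payload: bytes) -> tuple[list[float], int]:
    # Receiving side: rebuild the (sums, count) pair sent by the other device.
    data = json.loads(payload.decode("utf-8"))
    return [float(x) for x in data["sums"]], int(data["count"])


def merge_cluster_states(
    first: tuple[list[float], int], second: tuple[list[float], int]
) -> tuple[list[float], int]:
    # Linear sketches merge by addition: the result matches (up to floating-point
    # rounding) what one device would have built from both sets of member vectors.
    sums_a, count_a = first
    sums_b, count_b = second
    if len(sums_a) != len(sums_b):
        raise ValueError("both devices must use the same sketch width and hash functions")
    return [a + b for a, b in zip(sums_a, sums_b)], count_a + count_b


def center_of(state: tuple[list[float], int]) -> list[float]:
    # Updated center: element-wise mean over all members seen on both devices.
    sums, count = state
    if count == 0:
        return [0.0] * len(sums)
    return [s / count for s in sums]
```

Because the merge is associative and commutative, this design would let any number of devices combine partial cluster states in any order.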

  • 1 Assignment