Computer-implemented systems and methods for variable clustering in large data sets

  • US 8,190,612 B2
  • Filed: 12/17/2008
  • Issued: 05/29/2012
  • Est. Priority Date: 12/17/2008
  • Status: Active Grant
First Claim

1. A computer-implemented method for reducing dimensionality of a data set, comprising:

    accessing, using one or more data processors, a data set including a plurality of observations, wherein each observation has an associated number of attributes, wherein each attribute has a corresponding value, and wherein a number of attributes represents a dimensionality;

    generating, using the one or more data processors, a similarity matrix using the data set, wherein the similarity matrix identifies degrees of similarity among the attributes;

    generating, using the one or more data processors, global clusters of attributes using the similarity matrix, wherein a global cluster includes a subset of the attributes, and wherein the attributes are grouped in the global clusters according to the degrees of similarity;

    generating, using the one or more data processors, a global cluster structure using the global clusters of attributes, wherein generating the global cluster structure includes determining a component for each global cluster of attributes, and performing a latent variable technique using the components;

    generating, using the one or more data processors, a sub-cluster structure using the global clusters of attributes, wherein generating a sub-cluster structure includes performing the latent variable technique or a different latent variable technique on each global cluster of attributes; and

    combining, using the one or more data processors, the global cluster structure and the sub-cluster structure to generate a cluster structure that has a fewer number of attributes than the accessed data set, wherein the fewer number of attributes represents a reduced dimensionality.
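
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim names neither a similarity measure, a clustering rule, nor a specific latent variable technique, so everything concrete below is an assumption made for illustration: absolute Pearson correlation as the similarity, single-linkage grouping at a fixed threshold, the first principal component (found by power iteration) as the component of each cluster, and the hypothetical helper names `abs_correlation`, `threshold_clusters`, `first_component`, `latent_clusters`, `reduce_dimensionality`, and `demo_rows`.

```python
import math
import random

def standardize(col):
    """Center a column and scale it to unit length (constant columns become zeros)."""
    mean = sum(col) / len(col)
    dev = [v - mean for v in col]
    norm = math.sqrt(sum(d * d for d in dev)) or 1.0
    return [d / norm for d in dev]

def abs_correlation(columns):
    """Similarity matrix: absolute Pearson correlation between attribute columns."""
    z = [standardize(c) for c in columns]
    return [[abs(sum(a * b for a, b in zip(zi, zj))) for zj in z] for zi in z]

def threshold_clusters(sim, threshold):
    """Group indices linked by similarity >= threshold (single linkage)."""
    unseen, clusters = set(range(len(sim))), []
    while unseen:
        stack = [min(unseen)]
        unseen.discard(stack[0])
        members = []
        while stack:
            i = stack.pop()
            members.append(i)
            linked = [j for j in unseen if sim[i][j] >= threshold]
            unseen.difference_update(linked)
            stack.extend(linked)
        clusters.append(sorted(members))
    return clusters

def first_component(columns, iters=100):
    """Scores on the first principal component of the standardized columns,
    approximated by power iteration started from the first attribute."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    weights = [1.0] + [0.0] * (len(z) - 1)
    for _ in range(iters):
        scores = [sum(w * zc[i] for w, zc in zip(weights, z)) for i in range(n)]
        weights = [sum(a * s for a, s in zip(zc, scores)) for zc in z]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        weights = [w / norm for w in weights]
    return [sum(w * zc[i] for w, zc in zip(weights, z)) for i in range(n)]

def latent_clusters(columns, threshold):
    """Assumed stand-in for a latent variable technique: cluster the columns
    by correlation and summarize each cluster by its first component."""
    groups = threshold_clusters(abs_correlation(columns), threshold)
    return [(g, first_component([columns[i] for i in g])) for g in groups]

def reduce_dimensionality(rows, global_threshold=0.5, sub_threshold=0.9):
    columns = [list(c) for c in zip(*rows)]        # one list per attribute
    sim = abs_correlation(columns)                 # similarity matrix
    global_clusters = threshold_clusters(sim, global_threshold)
    # Global structure: one component per global cluster, then the latent
    # step applied to those components.
    components = [first_component([columns[i] for i in g]) for g in global_clusters]
    global_structure = [g for g, _ in latent_clusters(components, global_threshold)]
    # Sub-cluster structure: the latent step applied inside each global cluster.
    sub_structure, reduced_columns = [], []
    for g in global_clusters:
        subs = latent_clusters([columns[i] for i in g], sub_threshold)
        sub_structure.append([[g[k] for k in local] for local, _ in subs])
        reduced_columns.extend(scores for _, scores in subs)
    # Combined structure: one derived attribute per sub-cluster.
    return {
        "global_clusters": global_clusters,
        "global_structure": global_structure,
        "sub_structure": sub_structure,
        "reduced_rows": [list(r) for r in zip(*reduced_columns)],
    }

def demo_rows(n=300, seed=0):
    """Synthetic data: six attributes driven by two independent factors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append([f1, f1 + rng.gauss(0, 0.1), f1 + rng.gauss(0, 1.0),
                     f2, f2 + rng.gauss(0, 0.1), f2 + rng.gauss(0, 1.0)])
    return rows
```

Calling `reduce_dimensionality(demo_rows())` returns the global clusters, the grouping of their components, the sub-clusters inside each global cluster, and a data set with one derived attribute per sub-cluster. The two thresholds are arbitrary illustration values; the result has fewer attributes than the input only when at least one sub-cluster contains more than one attribute.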
