
Clustering of text units using dimensionality reduction of multi-dimensional arrays

  • US 9,141,882 B1
  • Filed: 10/19/2012
  • Issued: 09/22/2015
  • Est. Priority Date: 10/19/2012
  • Status: Active Grant
First Claim

1. A method comprising operations executed on a processor, the operations comprising:

  • tokenizing a plurality of text units from a plurality of documents;

  • creating a first multi-dimensional array, wherein the dimensions of the first multi-dimensional array are based upon the plurality of text units;

  • normalizing the first multi-dimensional array;

  • reducing the dimensionality of the first multi-dimensional array;

  • creating a second multi-dimensional array for each of the text units, wherein each text unit is initially assigned a random x-coordinate and a random y-coordinate;

  • determining a first distribution based on similarity of each document with each other document in the plurality of text units using the first multi-dimensional array;

  • determining a second distribution based on similarity of each document with each other document in the plurality of text units using the second multi-dimensional array based upon the x-coordinates and y-coordinates;

  • minimizing divergence between the first distribution and the second distribution by iterating a cost function;
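The claim does not name specific algorithms, but its steps map naturally onto a t-SNE-style pipeline. The sketch below is one possible reading under stated assumptions, not the patent's implementation: smoothed TF-IDF stands in for the normalizing step, truncated SVD for the dimensionality reduction, a fixed-bandwidth Gaussian kernel for the first distribution, a Student-t kernel on the random x/y coordinates for the second, and plain gradient descent on the Kullback-Leibler divergence as the iterated cost function. It treats each document as one point in the plane. All function names (`tokenize`, `term_document_array`, `embed_2d`, and so on) and parameter values (`sigma`, `lr`, `iters`, `k`) are hypothetical, chosen for illustration; only NumPy is assumed.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; the claim names no tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_document_array(docs):
    # "First multi-dimensional array": one row per document, one column per term.
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(token_lists):
        for term, count in Counter(toks).items():
            X[i, index[term]] = count
    return X, vocab


def tfidf_normalize(X):
    # Assumption: smoothed TF-IDF weighting, then L2 normalization of each row.
    df = np.count_nonzero(X, axis=0)
    idf = np.log((1 + X.shape[0]) / (1 + df)) + 1.0
    W = X * idf
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / np.where(norms == 0, 1.0, norms)


def reduce_dims(W, k):
    # Assumption: truncated SVD (as in latent semantic analysis) for the reduction.
    U, S, _ = np.linalg.svd(W, full_matrices=False)
    k = min(k, len(S))
    return U[:, :k] * S[:k]


def high_dim_affinities(Y, sigma=0.5):
    # First distribution: Gaussian similarities with one fixed bandwidth
    # (a simplification of t-SNE's per-point, perplexity-calibrated bandwidths).
    D = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
    P = np.exp(-D / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum()


def low_dim_affinities(Z):
    # Second distribution: Student-t kernel on the 2-D (x, y) coordinates.
    D = np.square(Z[:, None, :] - Z[None, :, :]).sum(-1)
    num = 1.0 / (1.0 + D)
    np.fill_diagonal(num, 0.0)
    return num / num.sum(), num


def kl_divergence(P, Q):
    # Cost function: KL(P || Q) over off-diagonal pairs (the diagonal of P is zero).
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))


def embed_2d(Y, iters=1000, lr=0.5, seed=0):
    # lr is a guess suited to small inputs; larger corpora usually need a larger step.
    rng = np.random.default_rng(seed)
    # "Second multi-dimensional array": a random initial x and y per document.
    Z = rng.normal(scale=1e-2, size=(Y.shape[0], 2))
    P = high_dim_affinities(Y)
    costs = []
    for _ in range(iters):
        Q, num = low_dim_affinities(Z)
        costs.append(kl_divergence(P, Q))
        # Gradient of KL(P || Q): 4 * sum_j (p_ij - q_ij)(z_i - z_j) / (1 + |z_i - z_j|^2).
        coef = (P - Q) * num
        Z = Z - lr * 4.0 * (coef.sum(axis=1)[:, None] * Z - coef @ Z)
    costs.append(kl_divergence(P, low_dim_affinities(Z)[0]))
    return Z, costs


docs = [
    "the cat sat on the mat",
    "a kitten and a cat played on the mat",
    "the kitten chased the cat",
    "stock prices rose on the market today",
    "the market fell as stock traders sold",
    "traders watched stock prices and the market",
]
X, vocab = term_document_array(docs)
Y = reduce_dims(tfidf_normalize(X), k=3)
coords, costs = embed_2d(Y)
print(np.round(coords, 2))
print(f"KL cost: {costs[0]:.4f} -> {costs[-1]:.4f}")
```

Plotting `coords` would show where each document lands in the plane. Whether documents on the same topic end up near each other depends on the bandwidth and step size, which this sketch fixes rather than tunes; production t-SNE implementations add perplexity calibration, early exaggeration, and momentum.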

  • 2 Assignments