Document clustering that applies a locality sensitive hashing function to a feature vector to obtain a limited set of candidate clusters

  • US 7,797,265 B2
  • Filed: 02/25/2008
  • Issued: 09/14/2010
  • Est. Priority Date: 02/26/2007
  • Status: Expired due to Fees
First Claim

1. A method of clustering a plurality of documents from a data stream comprising:

  • generating a feature vector for a document in the plurality of documents;
  • applying a locality sensitive hashing function to the feature vector;
  • retrieving a set of cluster centroids based on a result of the applied locality sensitive hashing function of the feature vector;
  • determining a distance between the feature vector of the document and each of the cluster centroids; and
  • assigning the document to a cluster based on the determined distances.
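
The five steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not name a hash family, a distance measure, or a rule for starting new clusters, so everything specific here is an assumption made for illustration. That includes the hashed bag-of-words features, the random-hyperplane (sign) hash, the cosine distance, the `threshold` parameter, the fixed centroids, and the hypothetical names `feature_vector` and `LSHClusterer`.

```python
import math
import random
import zlib
from collections import defaultdict


def feature_vector(text, dim=64):
    """Hashed bag-of-words counts (an assumed feature scheme)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash() on str.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine_distance(a, b):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


class LSHClusterer:
    def __init__(self, dim, num_bits=8, threshold=0.5, seed=0):
        rng = random.Random(seed)
        # One random hyperplane per hash bit (assumed LSH family).
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.threshold = threshold          # assumed cutoff for joining a cluster
        self.centroids = []                 # cluster id -> centroid vector
        self.buckets = defaultdict(list)    # hash key -> candidate cluster ids

    def _hash(self, vec):
        # Each bit records which side of a hyperplane the vector falls on.
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0.0
                     for plane in self.planes)

    def assign(self, vec):
        key = self._hash(vec)                       # apply the LSH function
        candidates = self.buckets[key]              # retrieve candidate centroids
        best_id, best_dist = None, float("inf")
        for cid in candidates:                      # distance to each candidate
            dist = cosine_distance(vec, self.centroids[cid])
            if dist < best_dist:
                best_id, best_dist = cid, dist
        if best_id is not None and best_dist <= self.threshold:
            return best_id                          # assign to nearest candidate
        # Assumption: with no close candidate, the document seeds a new cluster.
        self.centroids.append(list(vec))
        new_id = len(self.centroids) - 1
        self.buckets[key].append(new_id)
        return new_id


if __name__ == "__main__":
    clusterer = LSHClusterer(dim=64)
    stream = ["stocks fall on rate fears",
              "stocks fall on rate fears",
              "local team wins the final"]
    for doc in stream:
        print(clusterer.assign(feature_vector(doc)), doc)
```

Because only the centroids stored under the document's hash bucket are compared, the number of distance computations per document depends on the bucket size rather than on the total number of clusters. In this sketch a centroid is simply the first vector assigned to its cluster and is never updated; a running mean, with re-hashing when the centroid moves, would be one possible extension.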
