Modeling topics using statistical distributions

  • US 9,317,593 B2
  • Filed: 10/01/2008
  • Issued: 04/19/2016
  • Est. Priority Date: 10/05/2007
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

    accessing a corpus stored in one or more tangible media, the corpus comprising a plurality of documents, a document comprising a plurality of words;

    selecting one or more words of each document as one or more keywords of the each document;

    clustering the documents according to the keywords to yield a plurality of clusters, each cluster corresponding to a different topic;

    generating a statistical distribution for each cluster from a subset of the words of the documents of the each cluster to yield a plurality of statistical distributions, wherein generating the statistical distribution for the each cluster comprises:

    determining a co-occurrence value indicating a co-occurrence of the topic of the each cluster with the topics of the other clusters in the plurality of documents; and

    generating a co-occurrence distribution from the co-occurrence values;

    modeling each topic using the statistical distribution generated for the cluster corresponding to the each topic;

    organizing the clusters according to the statistical distributions; and

    assigning the topics of the organized clusters to the documents in the organized clusters.
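
For illustration only, the steps recited in claim 1 can be sketched in a few lines of Python. This is a toy reading under stated assumptions, not the patented implementation: the claim does not say how keywords are selected, how documents are clustered, or how clusters are organized, so the sketch assumes the most frequent non-stopword as the keyword, one cluster per distinct keyword, and an ordering by the number of other topics a topic co-occurs with. The names `STOPWORDS`, `select_keywords`, `cluster_by_keyword`, `cooccurrence_distribution`, and `model_topics` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical stop list; the claim does not define one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def select_keywords(doc, k=1):
    """Pick the k most frequent non-stopwords (an assumed selection rule)."""
    words = [w for w in doc.lower().split() if w not in STOPWORDS]
    counts = Counter(words)
    # Descending count, then alphabetical, so ties resolve deterministically.
    return sorted(counts, key=lambda w: (-counts[w], w))[:k]


def cluster_by_keyword(docs):
    """Group document indices by top keyword; each group stands for one topic."""
    clusters = defaultdict(list)
    for i, doc in enumerate(docs):
        keywords = select_keywords(doc)
        if keywords:  # skip documents with no usable words
            clusters[keywords[0]].append(i)
    return dict(clusters)


def cooccurrence_distribution(topic, topics, docs):
    """Normalized counts of documents that mention `topic` with each other topic."""
    token_sets = [set(d.lower().split()) for d in docs]
    counts = {
        other: sum(1 for t in token_sets if topic in t and other in t)
        for other in topics
        if other != topic
    }
    total = sum(counts.values())
    return {o: (c / total if total else 0.0) for o, c in counts.items()}


def model_topics(docs):
    clusters = cluster_by_keyword(docs)
    topics = sorted(clusters)
    # One co-occurrence distribution per cluster; each topic is modeled by it.
    dists = {t: cooccurrence_distribution(t, topics, docs) for t in topics}
    # Assumed organizing rule: topics that co-occur with more topics come first.
    ordered = sorted(
        topics, key=lambda t: (-sum(1 for p in dists[t].values() if p > 0), t)
    )
    # Assign each cluster's topic to the documents in that cluster.
    assignment = {i: t for t in ordered for i in clusters[t]}
    return ordered, dists, assignment


docs = ["cat cat dog", "dog dog cat", "fish fish fish"]
ordered, dists, assignment = model_topics(docs)
```

In this toy corpus the "cat" and "dog" topics appear together in two documents, so each one's co-occurrence distribution puts all of its mass on the other, while "fish" co-occurs with nothing and is ordered last.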
