
System, method and apparatus for increasing speed of hierarchical Latent Dirichlet Allocation model

  • US 8,527,448 B2
  • Filed: 12/20/2012
  • Issued: 09/03/2013
  • Est. Priority Date: 12/16/2011
  • Status: Active Grant
First Claim

1. A data processing method, comprising:

  • sending, by a master node, global initial statistical information to a plurality of slave nodes, wherein the global initial statistical information comprises: text subset information divided in advance according to a text set, preset initial hyper-parameter information of a hierarchical Latent Dirichlet Allocation model, a pre-established nested Chinese restaurant process prior of the text set, hierarchical topic path information of a document, document-topic count matrix information, and topic-word count matrix information;

    receiving local statistical information from each of the plurality of slave nodes;

    merging the received local statistical information of each slave node, to obtain new global statistical information, wherein the local statistical information comprises: a document-topic count matrix, a topic-word count matrix and a document hierarchical topic path of each slave node, and the new global statistical information comprises: global text-topic count matrix information, topic-word count matrix information, topic-word count matrix information of each slave node, and a global document hierarchical topic path;

    after judging that a Gibbs sampling performed by a slave node has ended, calculating a probability distribution between the document and a topic and a probability distribution between the topic and a word according to the new global statistical information, wherein the Gibbs sampling is used to allocate a topic for each word of each document, and allocate a hierarchical topic path for each document;

    according to the probability distributions obtained through calculation, establishing a likelihood function of the text set, and maximizing the likelihood function, to obtain a new hierarchical Latent Dirichlet Allocation model hyper-parameter; and

    after judging that an iteration of solving for a hierarchical Latent Dirichlet Allocation model hyper-parameter has converged, and according to the new hierarchical Latent Dirichlet Allocation model hyper-parameter, calculating and outputting the probability distribution between the document and topic and the probability distribution between the topic and word.
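
The merging and probability-calculation steps recited above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that each slave node holds a disjoint subset of documents and reports counts for its own documents only, so document-topic rows are concatenated and topic-word counts are summed. It also assumes symmetric additive smoothing with illustrative values `alpha` and `eta`, and it leaves out the hierarchical topic paths and the nested Chinese restaurant process prior. The function names `merge_local_stats` and `normalize_counts` and all sample counts are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def merge_local_stats(local_doc_topic: List[Matrix],
                      local_topic_word: List[Matrix]) -> Tuple[Matrix, Matrix]:
    """Merge per-slave count matrices into global count matrices.

    Assumption: slaves hold disjoint document subsets and report counts
    for their own documents only.
    """
    # Disjoint documents: document-topic rows are simply concatenated.
    global_doc_topic = [row[:] for m in local_doc_topic for row in m]

    # Topics and vocabulary are shared: topic-word counts are summed.
    num_topics = len(local_topic_word[0])
    vocab_size = len(local_topic_word[0][0])
    global_topic_word = [[0] * vocab_size for _ in range(num_topics)]
    for m in local_topic_word:
        for k in range(num_topics):
            for w in range(vocab_size):
                global_topic_word[k][w] += m[k][w]
    return global_doc_topic, global_topic_word


def normalize_counts(counts: Matrix, smoothing: float) -> List[List[float]]:
    """Turn each row of counts into a smoothed probability distribution."""
    result = []
    for row in counts:
        total = sum(row) + smoothing * len(row)
        result.append([(c + smoothing) / total for c in row])
    return result


# Hypothetical counts from two slaves: 2 topics, vocabulary of 3 words.
slave_a_doc_topic = [[2, 1], [0, 3]]      # slave A holds two documents
slave_b_doc_topic = [[1, 1]]              # slave B holds one document
slave_a_topic_word = [[1, 1, 0], [2, 1, 1]]
slave_b_topic_word = [[0, 1, 0], [0, 0, 1]]

global_doc_topic, global_topic_word = merge_local_stats(
    [slave_a_doc_topic, slave_b_doc_topic],
    [slave_a_topic_word, slave_b_topic_word],
)

alpha, eta = 0.5, 0.1  # illustrative hyper-parameter values, not from the source
doc_topic_prob = normalize_counts(global_doc_topic, alpha)
topic_word_prob = normalize_counts(global_topic_word, eta)
```

In this sketch every row of `doc_topic_prob` and `topic_word_prob` sums to 1, so each row can be read as a document's distribution over topics or a topic's distribution over words. A deployment in which slaves start from a shared copy of the global topic-word counts would need a delta-based merge instead of a plain sum.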
