
Integrating and extracting topics from content of heterogeneous sources

  • US 9,176,969 B2
  • Filed: 08/29/2013
  • Issued: 11/03/2015
  • Est. Priority Date: 08/29/2013
  • Status: Active Grant
First Claim

1. A system for integrating and extracting topics from content of heterogeneous sources, the system comprising:

  • a processor to:

    identify a plurality of observed words in documents that are received from the heterogeneous sources;

    obtain document metadata and source metadata from the heterogeneous sources;

    use the document metadata to calculate a plurality of word topic probabilities for the plurality of observed words;

    use the source metadata to calculate a plurality of source topic probabilities for the plurality of observed words; and

    determine a latent topic for one of the documents based on the plurality of observed words, the plurality of word topic probabilities, and the plurality of source topic probabilities, wherein the latent topic is determined using a Discriminative Dirichlet Allocation (DDA) modeling technique comprising:

    in response to determining that a number of occurrences of related observed words assigned to the latent topic has reached a dynamic threshold, adjusting a word topic probability based on pre-determined user-defined features; and

    adjusting the word topic probability of an observed word based on a source topic probability of the source topic probabilities associated with the observed word, wherein the adjusting the word topic probability of the observed word comprises using Gibbs sampling to apply a bicriterion that maximizes the plurality of word topic probabilities and uses the dynamic threshold to monitor the number of occurrences of the related observed words.
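
To illustrate the general idea of sampling a latent topic from observed words while weighting by source-level topic counts, the sketch below shows a simplified, LDA-style collapsed Gibbs sampler with one extra per-source factor. It is not the DDA technique recited in the claim: the dynamic threshold, the user-defined feature adjustment, and the bicriterion are all omitted. The function name `infer_latent_topics`, the hyperparameters `alpha` and `beta`, the multiplicative form of the source factor, and the toy documents are assumptions made only for illustration.

```python
import random
from collections import defaultdict


def infer_latent_topics(docs, sources, num_topics=2, iters=200,
                        alpha=0.1, beta=0.01, seed=0):
    """Simplified collapsed Gibbs sampler (hypothetical, not the patented DDA).

    docs:    list of token lists (the observed words per document)
    sources: list with one source identifier per document
    Returns (dominant topic per document, per-document topic counts).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    word_topic = defaultdict(lambda: [0] * num_topics)
    source_topic = defaultdict(lambda: [0] * num_topics)
    topic_total = [0] * num_topics

    def update(d, w, s, k, delta):
        # Add or remove one word-to-topic assignment from every table.
        doc_topic[d][k] += delta
        word_topic[w][k] += delta
        source_topic[s][k] += delta
        topic_total[k] += delta

    # Random initial topic assignment for every observed word.
    assign = []
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            update(d, w, sources[d], z, +1)
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            s = sources[d]
            for i, w in enumerate(doc):
                # Remove the current assignment before resampling it.
                update(d, w, s, assign[d][i], -1)
                src_total = sum(source_topic[s])
                weights = []
                for k in range(num_topics):
                    # Word-topic term (smoothed by beta).
                    p_word = (word_topic[w][k] + beta) / (
                        topic_total[k] + vocab_size * beta)
                    # Document-topic term (smoothed by alpha).
                    p_doc = doc_topic[d][k] + alpha
                    # Assumed source-topic term: scales the word's topic
                    # weight by how often this source uses topic k.
                    p_src = (source_topic[s][k] + alpha) / (
                        src_total + num_topics * alpha)
                    weights.append(p_word * p_doc * p_src)
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                update(d, w, s, z, +1)

    latent = [max(range(num_topics), key=lambda k: row[k])
              for row in doc_topic]
    return latent, doc_topic


# Toy corpus: two documents from each of two hypothetical sources.
docs = [
    ["gibbs", "sampling", "topic"],
    ["topic", "model", "gibbs"],
    ["stock", "market", "price"],
    ["market", "price", "trade"],
]
sources = ["blog", "blog", "news", "news"]
latent, counts = infer_latent_topics(docs, sources)
```

Here `latent[d]` is the most frequently assigned topic index for document `d`, and `counts[d]` holds the per-topic word counts for that document. Topic indices are arbitrary labels, so only the grouping of documents is meaningful, not the index values themselves.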
