Mining Multilingual Topics

  • US 20110258229A1
  • Filed: 04/15/2010
  • Published: 10/20/2011
  • Est. Priority Date: 04/15/2010
  • Status: Active Grant
First Claim

1. A method comprising:

  • identifying multiple concept-units from a multi-language document corpus, a particular concept-unit including a set of documents in different languages describing a particular concept;

    modeling the concept-units of the multi-language document corpus to create a generative model, wherein the generative model represents at least:

    (a) a plurality of universal topics, each of the universal topics being defined by a plurality of topic word distributions corresponding respectively to the different languages;

    (b) a topic distribution for each concept-unit, wherein the documents of any single concept-unit are constrained within the generative model to share a common topic distribution;

    inferring the plurality of universal topics from the documents of the concept-units based on the generative model.
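
The generative structure recited in elements (a) and (b) can be illustrated with a minimal sketch. It is not the patent's implementation: the Dirichlet priors, the hyperparameters `alpha` and `beta`, the fixed document length, and all function and variable names are assumptions chosen for illustration. The sketch only samples a synthetic corpus in the forward direction. The inferring step of the claim would run in the opposite direction, estimating the topics from observed documents, and is not shown.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:
        # Extremely unlikely numerical underflow: fall back to uniform.
        return [1.0 / size] * size
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Draw an index according to the given probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(vocab, num_topics, num_units, doc_len,
                    alpha=0.5, beta=0.5, seed=0):
    """Sample concept-units from a hypothetical multilingual topic model.

    vocab maps a language code to its word list (an assumed input format).
    """
    rng = random.Random(seed)

    # (a) Each universal topic holds one word distribution per language.
    topics = [
        {lang: sample_dirichlet(beta, len(words), rng)
         for lang, words in vocab.items()}
        for _ in range(num_topics)
    ]

    units = []
    for _ in range(num_units):
        # (b) One topic distribution per concept-unit, shared by every
        # document in that unit regardless of language.
        theta = sample_dirichlet(alpha, num_topics, rng)
        docs = {}
        for lang, words in vocab.items():
            tokens = []
            for _ in range(doc_len):
                z = sample_index(theta, rng)            # topic for this token
                w = sample_index(topics[z][lang], rng)  # word in this language
                tokens.append(words[w])
            docs[lang] = tokens
        units.append({"theta": theta, "docs": docs})
    return topics, units


if __name__ == "__main__":
    toy_vocab = {"en": ["dog", "cat", "tree"], "fr": ["chien", "chat", "arbre"]}
    _, toy_units = generate_corpus(toy_vocab, num_topics=2, num_units=2, doc_len=5)
    for unit in toy_units:
        print(unit["docs"])
```

In this sketch the English and French documents of one concept-unit are drawn from the same `theta`, which is how the shared-topic-distribution constraint of element (b) appears in code.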
