Mining multilingual topics

  • US 8,825,648 B2
  • Filed: 04/15/2010
  • Issued: 09/02/2014
  • Est. Priority Date: 04/15/2010
  • Status: Active Grant
First Claim

1. A method comprising:

  • under a control of one or more processors, identifying multiple concept-units from a multi-language document corpus, a respective concept-unit including a set of documents in a plurality of languages describing a particular concept, the identifying including identifying one or more hyperlinks or references within a respective document that identify one or more other documents in one or more other languages relating to the particular concept; and

    modeling the concept-units of the multi-language document corpus by maintaining a separation of term-by-document matrices for the plurality of languages to create a generative model, the generative model representing:

    a plurality of universal topics, at least one respective universal topic being defined by a plurality of topic word distributions in the plurality of languages, at least one of the plurality of topic word distributions for a respective universal topic corresponding to a respective language from the plurality of languages and including one or more words in the respective language with corresponding probability values characterizing the respective universal topic; and

    a topic distribution for at least one concept-unit, the topic distribution for a respective concept-unit including one or more universal topics and their distributions for the respective concept-unit, the set of documents in the different plurality of languages of the respective concept-unit being constrained to share a common topic distribution.
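The first two steps of the claim (grouping documents into concept-units through cross-language links, then keeping the term counts of each language separate) can be illustrated with a minimal sketch. It is not the patented implementation: the toy corpus `DOCS`, the function names `find_concept_units` and `per_language_matrices`, and the use of union-find for grouping are all assumptions made for illustration. The sketch stops before fitting any topic model; it only prepares per-language count tables whose columns are indexed by concept-unit, which is one way a model could tie a single topic distribution to all documents of a unit.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus (an assumption for illustration):
# doc id -> (language, text, ids of linked documents in other languages)
DOCS = {
    "en/Dog": ("en", "dog animal pet dog", ["de/Hund"]),
    "de/Hund": ("de", "hund tier haustier", ["fr/Chien"]),
    "fr/Chien": ("fr", "chien animal", []),
    "en/Car": ("en", "car engine road", ["de/Auto"]),
    "de/Auto": ("de", "auto motor", []),
}


def find_concept_units(docs):
    """Group documents connected by cross-language links (union-find)."""
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for doc_id, (_, _, links) in docs.items():
        for target in links:
            if target in parent:  # ignore links that leave the corpus
                parent[find(doc_id)] = find(target)

    units = defaultdict(list)
    for doc_id in docs:
        units[find(doc_id)].append(doc_id)
    # Sort for a deterministic unit numbering.
    return sorted(sorted(members) for members in units.values())


def per_language_matrices(docs, units):
    """Build one sparse term-by-unit count table per language.

    The tables stay separate per language; only the column index
    (the concept-unit number) is shared across languages.
    """
    matrices = defaultdict(lambda: defaultdict(Counter))
    for unit_index, members in enumerate(units):
        for doc_id in members:
            lang, text, _ = docs[doc_id]
            matrices[lang][unit_index].update(text.split())
    return matrices


units = find_concept_units(DOCS)
matrices = per_language_matrices(DOCS, units)
```

Here `matrices["en"]`, `matrices["de"]`, and `matrices["fr"]` have different vocabularies but use the same concept-unit index, so column 1 in each table refers to the same dog-related unit.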
