MINING MULTILINGUAL TOPICS
Abstract
Techniques for utilizing data mining technology to extract universal topics with multilingual representations from a multilingual database, and for organizing existing or new documents in different languages by analyzing their respective topic distributions.
20 Claims
1. A method comprising:
identifying multiple concept-units from a multi-language document corpus, a particular concept-unit including a set of documents in different languages describing a particular concept;
modeling the concept-units of the multi-language document corpus to create a generative model, wherein the generative model represents at least:
(a) a plurality of universal topics, each of the universal topics being defined by a plurality of topic word distributions corresponding respectively to the different languages;
(b) a topic distribution for each concept-unit, wherein the documents of any single concept-unit are constrained within the generative model to share a common topic distribution;
inferring the plurality of universal topics from the documents of the concept-units based on the generative model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 20
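The generative model recited in claim 1 can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the patented implementation: the symmetric Dirichlet prior, the `alpha` parameter, the function names, and the two-language vocabulary are all hypothetical choices made for illustration. The only parts taken from the claim are that each universal topic has one word distribution per language, and that all documents of a concept-unit share a single topic distribution.

```python
import random

def sample_dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_concept_unit(topic_word, doc_lengths, alpha, rng):
    """Generate one concept-unit: one document per language, all sharing theta.

    topic_word[lang][k] is the word distribution (word -> probability) of
    universal topic k in language `lang`. doc_lengths maps language -> length.
    """
    num_topics = len(next(iter(topic_word.values())))
    # A single topic distribution for the whole concept-unit (assumed Dirichlet prior).
    theta = sample_dirichlet(alpha, num_topics, rng)
    unit = {}
    for lang, length in doc_lengths.items():
        words = []
        for _ in range(length):
            # Pick a topic from the shared theta, then a word from that
            # topic's distribution in this document's language.
            k = rng.choices(range(num_topics), weights=theta, k=1)[0]
            dist = topic_word[lang][k]
            vocab = list(dist)
            weights = [dist[w] for w in vocab]
            words.append(rng.choices(vocab, weights=weights, k=1)[0])
        unit[lang] = words
    return theta, unit

# Hypothetical toy data: two universal topics, each with an English and a
# Spanish word distribution.
TOPIC_WORD = {
    "en": [{"goal": 0.6, "team": 0.4}, {"vote": 0.7, "law": 0.3}],
    "es": [{"gol": 0.6, "equipo": 0.4}, {"voto": 0.7, "ley": 0.3}],
}

rng = random.Random(0)
theta, unit = generate_concept_unit(TOPIC_WORD, {"en": 8, "es": 6}, alpha=0.5, rng=rng)
```

Inference runs this process in reverse: given only the observed words of each concept-unit, it estimates the per-language topic word distributions and the shared topic distributions.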
9. A method comprising:
identifying multiple concept-units from a multi-language document corpus, a particular concept-unit including a set of documents in different languages describing a particular concept; and
inferring a plurality of universal topics from the multiple concept-units, each of the universal topics being defined by a plurality of topic word distributions corresponding respectively to the different languages.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
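The identifying step common to the independent claims can be sketched as a grouping operation. The claim does not say how documents describing the same concept are matched across languages, so this sketch assumes each document already carries a hypothetical `concept_id` (for example, from cross-language links in the corpus). The record layout and the `min_languages` threshold are likewise illustrative assumptions.

```python
from collections import defaultdict

def identify_concept_units(documents, min_languages=2):
    """Group (concept_id, language, text) records into concept-units.

    Returns a dict mapping concept_id -> {language: text}. Concepts covered
    by fewer than `min_languages` languages are dropped. If a concept has
    several documents in one language, the last one seen is kept.
    """
    grouped = defaultdict(dict)
    for concept_id, language, text in documents:
        grouped[concept_id][language] = text
    return {
        concept_id: docs
        for concept_id, docs in grouped.items()
        if len(docs) >= min_languages
    }

# Hypothetical corpus records: (concept_id, language, text).
DOCS = [
    ("football", "en", "the team scored a goal"),
    ("football", "es", "el equipo marcó un gol"),
    ("election", "en", "the vote decides the law"),
    ("election", "de", "die Wahl entscheidet das Gesetz"),
    ("orphan", "en", "a document with no counterpart"),
]

units = identify_concept_units(DOCS)
```

Each resulting concept-unit holds the documents that would share one topic distribution during modeling.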
18. A method comprising:
identifying multiple concept-units from a multi-language document corpus, a particular concept-unit including a set of documents in different languages describing a particular concept; and
deriving a universal topic space from the concept-units; and
analyzing new documents of different languages to place them within the universal topic space.
Dependent claims: 19
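Placing a new document within a universal topic space, as in claim 18, can be illustrated by estimating the document's topic distribution while the topic word distributions of its language are held fixed. The sketch below uses a simple maximum-likelihood EM "folding-in" procedure. That is a substituted technique chosen for brevity and is not confirmed by the claim text. The function name, the `floor` smoothing value, and the toy vocabulary are hypothetical.

```python
def infer_topic_distribution(words, topic_word_lang, iterations=50, floor=1e-9):
    """Estimate a document's topic distribution with topic-word distributions fixed.

    topic_word_lang[k] maps word -> probability under universal topic k in the
    document's language. Words missing from a topic get probability `floor`.
    """
    num_topics = len(topic_word_lang)
    theta = [1.0 / num_topics] * num_topics
    if not words:
        return theta  # no evidence: stay at the uniform distribution
    for _ in range(iterations):
        counts = [0.0] * num_topics
        for w in words:
            # E-step: responsibility of each topic for this word occurrence.
            weights = [theta[k] * topic_word_lang[k].get(w, floor)
                       for k in range(num_topics)]
            total = sum(weights)
            for k in range(num_topics):
                counts[k] += weights[k] / total
        # M-step: the new theta is the normalized expected topic counts.
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta

# Hypothetical universal topics (0: sports, 1: politics) in two languages.
TOPICS = {
    "en": [{"goal": 0.6, "team": 0.4}, {"vote": 0.7, "law": 0.3}],
    "es": [{"gol": 0.6, "equipo": 0.4}, {"voto": 0.7, "ley": 0.3}],
}

theta_en = infer_topic_distribution(["goal", "team", "goal"], TOPICS["en"])
theta_es = infer_topic_distribution(["gol", "equipo"], TOPICS["es"])
theta_pol = infer_topic_distribution(["vote", "law"], TOPICS["en"])
```

Because the topics are shared across languages, the English and Spanish sports documents land near each other in the same space, which allows documents in different languages to be compared or organized by their topic distributions.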
Specification