CROSS-LANGUAGE TEXT CLUSTERING
First Claim
1. A method for a computing device to analyze, across languages, a set of texts in one or more natural languages, the method comprising for each text:
electronically analyzing the text, wherein the analysis includes performing steps including:
performing a syntactic analysis of at least one sentence of the text; and
creating a language-independent semantic structure (LISS) by performing a semantic analysis of the sentence of the text;
generating a set of features for the text, where at least one feature is based on the results of the said analysis; and
performing text clustering based on said set of features, wherein the text clustering includes assigning the text to one or more clusters.
Abstract
Methods are described for clustering or classifying texts written in different languages. Language-independent semantic structures (LISS) are constructed before clustering is performed. These structures reflect lexical, morphological, syntactic, and semantic properties of the texts. The suggested methods perform cross-language text clustering based on the meaning derived from the texts. They are applicable to genre classification, topic detection, news analysis, authorship analysis, internet searches, and creating corpora for other tasks.
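The flow described here (analyze each text, derive language-independent features, then cluster) can be illustrated with a minimal sketch. It is not the patented implementation: the CONCEPTS lexicon, the concept IDs, the cosine measure, the 0.5 threshold, and the greedy single-pass clustering are all assumptions made for illustration. In particular, a plain word lookup stands in for the syntactic and semantic analysis that would produce a real LISS, and each text is assigned to exactly one cluster.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon: surface words in several languages mapped to
# shared concept IDs. This lookup only stands in for a real LISS, which
# would come from full syntactic and semantic analysis.
CONCEPTS = {
    "dog": "ANIMAL_DOG", "hund": "ANIMAL_DOG", "chien": "ANIMAL_DOG",
    "barks": "ACT_BARK", "bellt": "ACT_BARK", "aboie": "ACT_BARK",
    "bank": "ORG_BANK", "banque": "ORG_BANK",
    "money": "MONEY", "geld": "MONEY", "argent": "MONEY",
    "lends": "ACT_LEND", "leiht": "ACT_LEND", "prête": "ACT_LEND",
}


def features(text):
    """Map a text to a bag of language-independent concept counts."""
    tokens = text.lower().replace(".", " ").replace("'", " ").split()
    return Counter(CONCEPTS[t] for t in tokens if t in CONCEPTS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.5):
    """Greedy single-pass clustering: a text joins the first cluster whose
    seed features are similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_features, member_indices)
    for i, text in enumerate(texts):
        f = features(text)
        for seed, members in clusters:
            if cosine(seed, f) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((f, [i]))
    return [members for _, members in clusters]


texts = [
    "The dog barks.",
    "Die Bank leiht Geld.",
    "Le chien aboie.",
    "The bank lends money.",
    "La banque prête de l'argent.",
]
print(cluster(texts))  # → [[0, 2], [1, 3, 4]]
```

Because the features are concept IDs rather than words, the English, German, and French sentences about the same event land in the same cluster even though they share no vocabulary.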
25 Claims
Specification