Cross-language text clustering

  • US 9,495,358 B2
  • Filed: 10/10/2012
  • Issued: 11/15/2016
  • Est. Priority Date: 10/10/2006
  • Status: Expired due to Fees
First Claim

1. A method for a computing device to analyze, across languages, a set of texts in one or more natural languages, the method comprising for each text:

  • electronically analyzing the text, wherein electronically analyzing the text comprises:

    performing a syntactic analysis of at least one sentence of the text, the syntactic analysis comprising a rough syntactic analysis to generate a graph of generalized constituents representing all possible variants of parsing the at least one sentence of the text syntactically, the syntactic analysis further comprising a precise syntactic analysis to generate at least one syntactic tree from the graph of generalized constituents, and selecting a preferred one of the at least one syntactic tree; and

    creating a language-independent semantic structure (LISS) by performing a semantic analysis of the preferred one of the at least one syntactic tree, wherein the LISS comprises an acyclic graph where each word in the sentence is represented by a corresponding one of a plurality of semantic classes, and wherein each of the semantic classes is a universal language-independent semantic notion of a respective word;

  • generating a set of features for the text based at least in part on the LISS;

  • creating at least one index for the text, wherein each value in the index relates to a corresponding one of the set of features and comprises a list of at least one of numbers or addresses of occurrences of the corresponding feature in the text; and

  • performing text clustering based on said set of features, wherein performing the text clustering comprises assigning the text to one or more clusters.
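
The last three steps of the claim (feature generation, indexing, and clustering) can be illustrated with a minimal Python sketch. It is not the patented implementation: the syntactic and semantic analysis that produces the LISS is replaced by a hypothetical lookup table, `TOY_SEMANTIC_CLASSES`, that maps words from several languages to shared class labels. The function names, the Jaccard similarity measure, and the 0.5 threshold are likewise assumptions chosen for illustration; the claim does not specify a clustering algorithm.

```python
from collections import defaultdict

# Hypothetical stand-in for the LISS: a real system would derive semantic
# classes from syntactic and semantic analysis, not from a word lookup.
TOY_SEMANTIC_CLASSES = {
    "dog": "DOG", "perro": "DOG", "hund": "DOG",
    "barks": "TO_BARK", "ladra": "TO_BARK", "bellt": "TO_BARK",
    "bank": "BANK", "banco": "BANK",
    "lends": "TO_LEND", "presta": "TO_LEND",
    "money": "MONEY", "dinero": "MONEY",
}


def build_index(text):
    """Map each semantic-class feature to the token positions where it occurs."""
    index = defaultdict(list)
    for position, token in enumerate(text.lower().split()):
        semantic_class = TOY_SEMANTIC_CLASSES.get(token)
        if semantic_class is not None:
            index[semantic_class].append(position)
    return dict(index)


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_texts(texts, threshold=0.5):
    """Greedy clustering: a text joins every cluster whose seed features are
    similar enough, so it may land in more than one cluster; if none match,
    it seeds a new cluster."""
    clusters = []  # each entry: {"features": set, "members": [text indices]}
    for i, text in enumerate(texts):
        features = set(build_index(text))
        matched = False
        for cluster in clusters:
            if jaccard(features, cluster["features"]) >= threshold:
                cluster["members"].append(i)
                matched = True
        if not matched:
            clusters.append({"features": features, "members": [i]})
    return [cluster["members"] for cluster in clusters]


texts = [
    "the dog barks",
    "el perro ladra",
    "the bank lends money",
    "el banco presta dinero",
]
print(build_index("The dog barks at the dog"))
print(cluster_texts(texts))
```

Because the features are language-independent class labels rather than surface words, the English and Spanish sentences about the same subject share identical feature sets and fall into the same cluster, which is the cross-language effect the claim is after.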

  • 3 Assignments