Clustering of Text for Structuring of Text Documents and Training of Language Models

  • US 20070244690A1
  • Filed: 11/11/2004
  • Published: 10/18/2007
  • Est. Priority Date: 11/21/2003
  • Status: Abandoned Application
First Claim

1. A method of text clustering for the generation of language models, a text (300) featuring a plurality of text units (320, 322, . . . ), each of which having at least one word (302, 304, . . . ), the method of text clustering comprising the steps of:

  • assigning each of the text units (320, 322, . . . ) to one of a plurality of provided clusters (330, 332, . . . ),
  • determining for each text unit a set of emission probabilities (340, 350), each emission probability (342, 344, . . . , 352, 354, . . . ) being indicative of a correlation between the text unit (320, 322, . . . ) and a cluster (330, 332, . . . ), the set of emission probabilities being indicative of the correlations between the text unit and the plurality of clusters,
  • determining a transition probability (362, 364, . . . ) being indicative that a first cluster (330) being assigned to a first text unit (320) in the text is followed by a second cluster (332) being assigned to a second text unit (322) in the text, the second text unit (322) subsequently following the first text unit (320) within the text,
  • performing an optimization procedure based on the emission probability and the transition probability in order to assign each text unit to a cluster.
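
The final step of the claim combines emission and transition probabilities to pick one cluster per text unit. The claim does not name a specific optimization procedure; the sketch below assumes a Viterbi-style dynamic program, which is one common way to do this for hidden-Markov-like models. The function name `viterbi_assign`, the uniform default for `initial`, and the toy probability values are illustrative assumptions, not taken from the application.

```python
import math


def viterbi_assign(emission, transition, initial=None):
    """Return one cluster index per text unit (illustrative sketch only).

    emission[t][c]   -- assumed: probability linking text unit t to cluster c
    transition[a][b] -- assumed: probability that cluster a is followed by b
    initial[c]       -- assumed: prior for the first unit (uniform if omitted)
    """
    n_units = len(emission)
    n_clusters = len(transition)
    if n_units == 0:
        return []
    if initial is None:
        initial = [1.0 / n_clusters] * n_clusters

    def log(p):
        # Work in log space to avoid underflow on long texts.
        return math.log(p) if p > 0 else float("-inf")

    # Best log-score of any cluster sequence ending in cluster c at unit 0.
    score = [log(initial[c]) + log(emission[0][c]) for c in range(n_clusters)]
    back = []  # back[t-1][c] = best predecessor of cluster c at unit t

    for t in range(1, n_units):
        new_score, pointers = [], []
        for c in range(n_clusters):
            best_prev = max(
                range(n_clusters),
                key=lambda p: score[p] + log(transition[p][c]),
            )
            new_score.append(
                score[best_prev]
                + log(transition[best_prev][c])
                + log(emission[t][c])
            )
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final cluster to recover the full assignment.
    path = [max(range(n_clusters), key=lambda c: score[c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy input: three text units, two clusters (all values are made up).
emission = [[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]]
transition = [[0.8, 0.2], [0.2, 0.8]]
print(viterbi_assign(emission, transition))  # [0, 0, 1]
```

With transitions that favor staying in the same cluster, a text unit with only a weak preference for another cluster tends to be pulled into the cluster of its neighbors. That is the effect of using the transition probability in addition to per-unit emission probabilities.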
