Topic identification and use thereof in information retrieval systems

  • US 7,340,466 B2
  • Filed: 02/26/2002
  • Issued: 03/04/2008
  • Est. Priority Date: 02/26/2002
  • Status: Active Grant
First Claim

1. A method to identify topics in a data corpus having a plurality of segments, comprising:

  • determining a segment-level actual usage value for one or more word combinations, wherein a word combination includes two or more substantially contiguous words, wherein two words are substantially contiguous if they are separated by zero words or words selected from a predetermined list of words;

    computing a segment-level expected usage value for each of the one or more word combinations in accordance with $S(w_i) \times S(w_j) \times \cdots \times S(w_m) / N^{m-1}$, where “m” represents the number of words in the word combination, “N” represents the number of segments in the data corpus, and $S(w_i)$ represents the number of unique segments in the data corpus that word $w_i$ of the word combination is in;

    designating a word combination as a topic if the segment-level actual usage value of the word combination is greater than the segment-level expected usage value of the word combination; and

    storing the topic on a computer readable storage medium.
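
The steps of the claim can be illustrated with a minimal Python sketch for two-word combinations (m = 2). This is not the patented implementation. Several details are assumptions made only for illustration: the names `SKIP_WORDS`, `find_pairs`, and `identify_topics` are hypothetical, the "predetermined list of words" is a small made-up set, words on that list are not treated as members of a combination, and the segment-level actual usage value is taken to be the number of unique segments containing the combination, which the claim itself does not define.

```python
from collections import defaultdict

# Hypothetical stand-in for the claim's "predetermined list of words".
SKIP_WORDS = {"of", "the", "and"}


def find_pairs(tokens, skip_words=SKIP_WORDS):
    """Return the set of substantially contiguous word pairs in one segment.

    Two words count as substantially contiguous if they are separated by
    zero words or only by words from skip_words. Assumption: a skip word
    is never itself the first word of a combination.
    """
    pairs = set()
    for i, first in enumerate(tokens):
        if first in skip_words:
            continue
        j = i + 1
        # Step over any intervening words from the predetermined list.
        while j < len(tokens) and tokens[j] in skip_words:
            j += 1
        if j < len(tokens):
            pairs.add((first, tokens[j]))
    return pairs


def identify_topics(segments, skip_words=SKIP_WORDS):
    """Map each topic pair to (actual, expected) segment-level usage."""
    n = len(segments)                  # N: number of segments in the corpus
    m = 2                              # this sketch handles pairs only
    word_segments = defaultdict(int)   # S(w): unique segments containing w
    pair_segments = defaultdict(int)   # assumed actual usage: segments with pair

    for tokens in segments:
        for word in set(tokens):
            word_segments[word] += 1
        for pair in find_pairs(tokens, skip_words):
            pair_segments[pair] += 1

    topics = {}
    for (w_i, w_j), actual in pair_segments.items():
        # Expected usage per the claim: S(w_i) x S(w_j) / N^(m-1).
        expected = word_segments[w_i] * word_segments[w_j] / n ** (m - 1)
        if actual > expected:          # strictly greater, as the claim states
            topics[(w_i, w_j)] = (actual, expected)
    return topics
```

Each segment is passed as a list of already tokenized words. Because the comparison is strict, a pair whose actual usage merely equals its expected usage is not designated a topic.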
