
Identification of topics for online discussions based on language patterns

  • US 7,739,261 B2
  • Filed: 06/14/2007
  • Issued: 06/15/2010
  • Est. Priority Date: 06/14/2007
  • Status: Active Grant
First Claim

1. A method in a computing device for identifying keywords from a corpus of sentences of words, the method comprising:

  • storing an initial set of keywords as a current set of keywords;

    locating, from sentences of the corpus, words that are keywords of the current set of keywords and replacing each located word with an occurrence of keyword symbol;

    for each occurrence of a keyword symbol of a sentence of the corpus, identifying a sequence segment that includes the occurrence of the keyword symbol along with words of the sentence that are adjacent to the keyword symbol;

    applying a pattern mining algorithm to the identified sequence segments to identify patterns of words adjacent to the occurrences of the keyword symbol by comparing words adjacent to an occurrence of a keyword symbol to words adjacent to other occurrences of the keyword symbol to derive patterns from the adjacent words, some of identified patterns including the keyword symbol and others of the identified patterns not including the keyword symbol;

    filtering out from the identified patterns the identified patterns that do not include the keyword symbol;

    filtering out from the identified patterns the identified patterns that include only prepositions in addition to the keyword symbol;

    identifying, from the sentences of the corpus, a new current set of keywords that satisfy a keyword confidence criterion based on the identified patterns by applying each identified pattern to the sentences and when an identified pattern matches a sentence, designating the word of the sentence corresponding to the keyword symbol of the identified pattern as a keyword of the new current set of keywords; and

    repeating the locating of words, the identifying of sequence segments, the applying of the pattern matching algorithm to identify patterns, and the identifying of keywords using the new current set of keywords until a termination criterion is satisfied and then indicating that the keywords of the identified new current sets of keywords are keywords of the corpus.
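
The steps of claim 1 can be sketched in Python as follows. This is only an illustrative reading of the claim, not the patented implementation: the claim does not name a specific pattern mining algorithm, window size, confidence criterion, or termination criterion, so the sketch assumes a contiguous n-gram counter as the miner, a two-word window, a minimum match count as the confidence criterion, and "no new keywords or a round limit" as the termination criterion. All function names, the `<KW>` symbol, the preposition list, and the choice to include the initial keywords in the result are hypothetical.

```python
from collections import Counter

KW = "<KW>"  # keyword symbol (name chosen for this sketch)
# Assumed, deliberately small preposition list.
PREPOSITIONS = {"of", "in", "on", "at", "for", "with", "about", "to", "from", "by"}


def segments(sentences, keywords, window=2):
    """Replace keywords with KW; return one segment of adjacent words per occurrence."""
    out = []
    for words in sentences:
        masked = [KW if w in keywords else w for w in words]
        for i, w in enumerate(masked):
            if w == KW:
                out.append(tuple(masked[max(0, i - window): i + window + 1]))
    return out


def mine_patterns(segs, min_support=2):
    """Stand-in miner: contiguous n-grams (n >= 2) found in at least min_support segments."""
    counts = Counter()
    for seg in segs:
        grams = {seg[i:j] for i in range(len(seg)) for j in range(i + 2, len(seg) + 1)}
        counts.update(grams)
    return {p for p, c in counts.items() if c >= min_support}


def keep_pattern(pattern):
    """Drop patterns without KW and patterns whose other words are all prepositions."""
    others = [w for w in pattern if w != KW]
    return KW in pattern and any(w not in PREPOSITIONS for w in others)


def apply_patterns(sentences, patterns, min_hits=1):
    """Match patterns against the original sentences; KW acts as a wildcard slot."""
    hits = Counter()
    for p in patterns:
        n = len(p)
        for words in sentences:
            for i in range(len(words) - n + 1):
                span = words[i:i + n]
                if all(a == KW or a == b for a, b in zip(p, span)):
                    hits.update(b for a, b in zip(p, span) if a == KW)
    # Assumed confidence criterion: a word must fill a KW slot at least min_hits times.
    return {w for w, c in hits.items() if c >= min_hits}


def identify_keywords(sentences, seed, max_rounds=5):
    """Repeat locate, segment, mine, filter, apply until no new keywords appear."""
    current = set(seed)
    found = set(seed)  # assumption: the initial keywords are reported as well
    for _ in range(max_rounds):
        mined = mine_patterns(segments(sentences, current))
        patterns = {p for p in mined if keep_pattern(p)}
        new = apply_patterns(sentences, patterns)
        if new <= found:  # assumed termination criterion
            break
        found |= new
        current = new
    return found
```

Under these assumptions, a corpus containing "i bought a new camera yesterday", "i bought a new laptop yesterday", and "she bought a new phone today", seeded with "camera" and "laptop", would yield patterns such as (new, `<KW>`), which in turn match "phone" in the third sentence. Each sentence is passed as a list of words, for example via `str.split()`.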
