IDENTIFICATION OF TOPICS FOR ONLINE DISCUSSIONS BASED ON LANGUAGE PATTERNS
Abstract
A topic identification system identifies topics of online discussions by iteratively identifying topic words, or keywords, of the discussions and the language patterns associated with those keywords. The system starts with an initial set of keywords and identifies language patterns that each include a keyword. It then uses those patterns to identify additional keywords of the online discussions that match the patterns, and identifies language patterns again using the enlarged keyword set, which includes the newly identified keywords. The system may repeat this process of identifying language patterns and keywords until a termination criterion is satisfied.
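The iteration described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that a "pattern" is simply the pair of words immediately before and after a keyword, and that the termination criterion is "no new keywords were found" or a round limit. The function names, the `max_rounds` parameter, and the toy corpus are all hypothetical.

```python
def find_patterns(sentences, keywords):
    """Collect (previous word, next word) contexts that surround a known keyword."""
    patterns = set()
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(1, len(words) - 1):
            if words[i] in keywords:
                patterns.add((words[i - 1], words[i + 1]))
    return patterns


def find_keywords(sentences, patterns):
    """Collect every word that appears inside one of the known contexts."""
    found = set()
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(1, len(words) - 1):
            if (words[i - 1], words[i + 1]) in patterns:
                found.add(words[i])
    return found


def bootstrap_keywords(sentences, seed_keywords, max_rounds=10):
    """Alternate between pattern and keyword identification until nothing new appears."""
    keywords = set(seed_keywords)
    for _ in range(max_rounds):
        patterns = find_patterns(sentences, keywords)
        new_keywords = find_keywords(sentences, patterns) - keywords
        if not new_keywords:  # assumed termination criterion: no growth
            break
        keywords |= new_keywords
    return keywords


# Toy corpus (invented for illustration only).
corpus = [
    "i bought the camera yesterday",
    "i bought the laptop yesterday",
    "she sold the laptop quickly",
    "she sold the tablet quickly",
]
print(sorted(bootstrap_keywords(corpus, {"camera"})))
# prints ['camera', 'laptop', 'tablet']
```

In this toy run the seed "camera" yields the context ("the", "yesterday"), which captures "laptop"; "laptop" in turn yields ("the", "quickly"), which captures "tablet". The third round finds nothing new and the loop stops.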
20 Claims
1. A method in a computing device for identifying keywords from a corpus of sentences of words, the method comprising:
storing an initial set of keywords;
identifying, from sentences of the corpus, patterns of words adjacent to the keywords;
identifying, from the sentences of the corpus, a new set of keywords based on the identified patterns; and
repeating the identifying of patterns and keywords until a termination criterion is satisfied.
12. A computing device that identifies keywords from a corpus of sentences of words from online discussions, comprising:
a keyword store containing keywords, the keyword store having an initial set of keywords;
a corpus store containing sentences of the corpus;
a component that identifies sequence segments of the sentences of the corpus, a sequence segment being a sequence of words that includes a keyword of the keyword store;
a component that identifies, from the identified sequence segments, patterns of sequences of words that include a keyword;
a component that identifies, from the sentences of the corpus, keywords within the identified patterns and adds identified keywords to the keyword store; and
a component that determines whether a termination criterion is satisfied so that the iterative identification of sequence segments, patterns, and keywords is terminated.
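As a rough illustration of the "sequence segment" recited in claim 12, the sketch below extracts a window of words around each keyword and then generalizes the segment into a pattern by replacing the keyword with a slot marker. The claim does not say how a segment is bounded or how a pattern is represented, so the fixed `window` size, the `<KW>` marker, and both function names are assumptions made only for this example.

```python
def sequence_segments(sentence, keywords, window=2):
    """Return word windows (as tuples) centred on each keyword occurrence."""
    words = sentence.lower().split()
    segments = []
    for i, word in enumerate(words):
        if word in keywords:
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            segments.append(tuple(words[start:end]))
    return segments


def generalize(segment, keywords, slot="<KW>"):
    """Turn a segment into a pattern by replacing keywords with a slot marker."""
    return tuple(slot if word in keywords else word for word in segment)


keyword_store = {"camera"}  # hypothetical initial keyword store
segments = sequence_segments("I really love my new camera a lot", keyword_store)
print(segments)
# prints [('my', 'new', 'camera', 'a', 'lot')]
print([generalize(s, keyword_store) for s in segments])
# prints [('my', 'new', '<KW>', 'a', 'lot')]
```

A pattern in this form can later be matched against other sentences, with whatever word fills the slot treated as a candidate keyword.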
18. A computer-readable medium containing instructions for controlling a computing device to identify topic information from a corpus of sentences of words, by a method comprising:
storing an initial set of keywords; and
repeating the steps of
identifying sequence segments of the sentences of the corpus, a sequence segment being a sequence of words that includes a keyword of the keyword store;
identifying from the identified sequence segments patterns of sequences of words that include an identified keyword and that satisfy a pattern support criterion; and
identifying from the sentences of the corpus keywords within the identified patterns that satisfy a keyword confidence criterion;
until a termination criterion is satisfied.
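Claim 18 adds a pattern support criterion and a keyword confidence criterion without defining them in the claim text. One plausible reading, sketched below purely as an assumption, treats support as the number of keyword occurrences that produce a pattern, and confidence as the fraction of a word's occurrences that fall inside a retained pattern. The thresholds `min_support` and `min_confidence`, the (previous word, next word) pattern shape, and the toy corpus are hypothetical.

```python
from collections import Counter


def contexts(sentences):
    """Yield (previous word, word, next word) for every interior word."""
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(1, len(words) - 1):
            yield words[i - 1], words[i], words[i + 1]


def supported_patterns(sentences, keywords, min_support=2):
    """Keep contexts that surround a keyword at least min_support times."""
    support = Counter(
        (prev, nxt) for prev, word, nxt in contexts(sentences) if word in keywords
    )
    return {pattern for pattern, count in support.items() if count >= min_support}


def confident_keywords(sentences, patterns, min_confidence=0.5):
    """Keep words whose interior occurrences mostly fall inside a retained pattern."""
    matched, total = Counter(), Counter()
    for prev, word, nxt in contexts(sentences):
        total[word] += 1
        if (prev, nxt) in patterns:
            matched[word] += 1
    return {w for w in matched if matched[w] / total[w] >= min_confidence}


# Toy corpus (invented for illustration only).
corpus = [
    "i bought the camera yesterday",
    "we bought the phone yesterday",
    "he bought the laptop yesterday",
    "the laptop is slow",
    "the laptop was cheap",
]
patterns = supported_patterns(corpus, {"camera", "phone"})
print(patterns)
# prints {('the', 'yesterday')}
print(sorted(confident_keywords(corpus, patterns)))
# prints ['camera', 'phone']
```

Here "laptop" matches the retained pattern only once out of three interior occurrences, so it falls below the assumed 0.5 confidence threshold and is rejected; lowering `min_confidence` to 0.3 would admit it.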
Specification