Topic Word Generation Method and System
Abstract
A method of, and system for, extracting topic words from a collection of documents spanning multiple, and potentially a very large number of, domains. Documents are selected and ranked based on their similarity to at least one seed word, which defines a topic. Seed words may be entered directly by a user or provided by another application. Keywords are extracted from documents determined to be a sufficiently good match to the topic, and may be displayed to the user or used as input to word prediction or word analysis and display software. Documents are determined to be a sufficiently good match to the topic using an iterative algorithm that starts with the best match and then selects documents whose keywords are sufficiently similar to those of the previously selected documents.
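The selection procedure summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: the keyword extractor (most frequent non-stopword tokens), the relevance score (fraction of seed words found among a document's keywords), the similarity measure (fraction of a document's keywords already present in the pool of selected keywords), the threshold values, and all function names are assumptions chosen only to make the steps concrete.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = frozenset({"a", "an", "and", "are", "for", "in", "is",
                       "of", "on", "the", "to", "with"})


def identify_keywords(text, top_k=10):
    """Identify keywords: the top_k most frequent non-stopword tokens (assumed heuristic)."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return {word for word, _ in Counter(tokens).most_common(top_k)}


def relevance(keywords, seeds):
    """Relevance to the topic: fraction of seed words among the keywords (assumed score)."""
    return len(keywords & seeds) / len(seeds) if seeds else 0.0


def overlap(keywords, pool):
    """Similarity: fraction of a document's keywords already in the pool (assumed measure)."""
    return len(keywords & pool) / len(keywords) if keywords else 0.0


def select_topic_documents(keywords, seeds, similarity_threshold=0.3):
    """Start from the best-matching document, then grow the selection iteratively."""
    scores = [relevance(kw, seeds) for kw in keywords]
    if not scores or max(scores) == 0:
        return []  # no document matches any seed word
    key_doc = max(range(len(keywords)), key=scores.__getitem__)
    selected = [key_doc]
    pool = set(keywords[key_doc])
    changed = True
    while changed:  # repeat until a full pass adds no new document
        changed = False
        for i, kw in enumerate(keywords):
            if i not in selected and overlap(kw, pool) >= similarity_threshold:
                selected.append(i)
                pool |= kw
                changed = True
    return selected


def generate_topic_words(documents, seed_words, similarity_threshold=0.3, max_words=10):
    """Run the whole pipeline and return the most widely shared keywords."""
    seeds = {w.lower() for w in seed_words}
    keywords = [identify_keywords(doc) for doc in documents]
    selected = select_topic_documents(keywords, seeds, similarity_threshold)
    counts = Counter(word for i in selected for word in keywords[i])
    return [word for word, _ in counts.most_common(max_words)]
```

In this sketch a document that shares no keywords with the best match can still be selected on a later step, once an intermediate document has added bridging keywords to the pool. That is the practical effect of comparing candidates against previously selected documents rather than against the seed words alone.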
52 Claims
1. A method of generating topic words from at least one seed word and a collection of documents comprising the steps of:
a. identifying keywords in each document that are indicative of the topic of the document;
b. evaluating the relevance of the keywords from each of the documents to the at least one seed word;
c. identifying at least one key topic document that is relevant to the at least one seed word;
d. selecting a subset of the documents, referred to as topic documents, by an iterative process starting with the selection of the at least one key topic document and then selecting other documents if their keywords are sufficiently similar to the keywords contained in the previously selected topic documents; and
e. extracting a set of topic words from the topic documents.
Dependent claims: 2, 3, 5, 6, 7, 10, 11, 12, 13, 19, 21, 48, 49, 51
4. (canceled)
8-9. (canceled)
14-18. (canceled)
20. (canceled)
22-23. (canceled)
24. A system for extracting topic words from documents based on at least one seed word comprising a series of modules including:
a. a keyword identification module for identifying keywords in each document that are indicative of the topic of the document;
b. an evaluation module for evaluating the relevance of each of the documents to the at least one seed word;
c. a key topic document identification module for identifying at least one key topic document that is relevant to the at least one seed word;
d. a selection module for selecting a subset of the documents, referred to as topic documents, by an iterative process starting with the at least one key topic document and then selecting other documents if their keywords are sufficiently similar to the keywords contained in the previously selected topic documents; and
e. an extraction module for extracting a set of topic words from the topic documents.
Dependent claims: 25, 26, 28, 29, 30, 33, 34, 35, 36, 42, 44, 52
27. (canceled)
31-32. (canceled)
37-41. (canceled)
43. (canceled)
45-47. (canceled)
50. (canceled)
Specification