EXTRACTING TOPICALLY RELATED KEYWORDS FROM RELATED DOCUMENTS
First Claim
1. A computer-implemented process for extracting topically related keywords from topically related documents, comprising:
using a computer to perform the following process actions:
accessing a set of topically related documents;
identifying a number of candidate keywords from the set of related documents, wherein a candidate keyword can be an individual term or a multiple word phrase;
forming a weighted keyword candidate-document matrix using the candidate keywords;
partitioning the keyword candidate-document matrix into multiple groups of keyword candidates;
identifying dense clusters of keyword candidates in each of the groups of keyword candidates whose density exceeds a prescribed density threshold; and
for each of the identified dense clusters, designating the keyword candidates associated with that cluster as topically related keywords.
Abstract
Keyword extraction technique embodiments are presented which extract topically related keywords from a set of topically related documents. In one general embodiment, this keyword extraction involves first accessing a set of topically related documents. A number of candidate keywords are then identified from the set of related documents. A weighted keyword candidate-document matrix is formed using these candidate keywords, and it is partitioned into multiple groups of keyword candidates. Dense clusters of keyword candidates whose density exceeds a prescribed density threshold are then identified in each of the groups of keyword candidates. Finally, the keyword candidates associated with each dense cluster are designated as topically related keywords.
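The steps summarized in the abstract can be illustrated with a minimal Python sketch. Several choices here are assumptions for illustration only, since this section does not specify them: TF-IDF is used as the weighting scheme, candidates are taken as whitespace-delimited n-grams up to two words, and the density of a group is taken to be the mean pairwise cosine similarity of its rows in the matrix. The function names (`candidates`, `weighted_matrix`, `density`, `dense_clusters`) are hypothetical, and the partitioning step is left to the caller, who supplies the groups.

```python
import math
from collections import Counter

def candidates(text, max_len=2):
    """Return candidate keywords: single terms and multi-word phrases (n-grams)."""
    words = text.lower().split()
    out = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out

def weighted_matrix(docs, max_len=2):
    """Build {keyword: [weight per document]}; TF-IDF is an assumed weighting."""
    counts = [Counter(candidates(d, max_len)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    matrix = {}
    for kw in vocab:
        df = sum(1 for c in counts if kw in c)          # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0   # smoothed IDF
        matrix[kw] = [c[kw] * idf for c in counts]
    return matrix

def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def density(group, matrix):
    """Mean pairwise cosine similarity within a group (assumed density measure)."""
    pairs = [(a, b) for i, a in enumerate(group) for b in group[i + 1:]]
    if not pairs:
        return 0.0
    return sum(cosine(matrix[a], matrix[b]) for a, b in pairs) / len(pairs)

def dense_clusters(groups, matrix, threshold):
    """Keep only the groups whose density exceeds the prescribed threshold."""
    return [g for g in groups if density(g, matrix) > threshold]

# Usage: the groups would normally come from the partitioning step.
docs = ["solar panel cost", "solar panel install", "tax form deadline"]
matrix = weighted_matrix(docs)
groups = [["solar", "panel", "solar panel"], ["tax", "cost"]]
related = dense_clusters(groups, matrix, threshold=0.9)
```

In this toy input, "solar", "panel", and "solar panel" occur in exactly the same documents, so their rows are parallel and the group passes the threshold, while "tax" and "cost" never co-occur and are rejected.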
20 Claims
20. A system for extracting topically related keywords from topically related documents, comprising:
a general purpose computing device; and
a computer program comprising program modules executable by the computing device, wherein the computing device is directed by the program modules of the computer program to,
access a set of topically related documents,
extract a number of candidate keywords from the set of related documents using a controlled vocabulary, wherein the controlled vocabulary comprises a list of keywords believed to be relevant to a topic of interest associated with the set of related documents, and wherein only terms and phrases found in the set of related documents that are included in the controlled vocabulary are identified as candidate keywords,
form a weighted keyword candidate-document matrix using the candidate keywords, and partition the matrix into multiple groups of keyword candidates using a spectral partitioning technique,
identify dense clusters of keyword candidates in each of the groups of keyword candidates whose density exceeds a prescribed density threshold, and
for each of the identified dense clusters, designate the keyword candidates associated with that cluster as topically related keywords.
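The two elements that distinguish claim 20, controlled-vocabulary extraction and spectral partitioning, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: raw occurrence counts stand in for the weights, cosine similarity between keyword rows defines the graph, and a single two-way split is made by the sign of an approximate Fiedler vector computed with power iteration. The names `vocab_candidates` and `spectral_bipartition` are hypothetical. Multiple groups could be obtained by applying the split recursively, and a practical version would more likely call a linear-algebra library's eigensolver.

```python
import math
import random

def vocab_candidates(docs, vocabulary):
    """Keep only vocabulary terms/phrases that occur in the documents.

    Returns {keyword: [occurrence count per document]}.
    """
    rows = {}
    for kw in vocabulary:
        target = kw.lower().split()
        n = len(target)
        row = []
        for doc in docs:
            words = doc.lower().split()
            row.append(sum(words[i:i + n] == target
                           for i in range(len(words) - n + 1)))
        if any(row):  # drop vocabulary entries absent from every document
            rows[kw] = row
    return rows

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def spectral_bipartition(rows, iters=500, seed=0):
    """Split keywords in two by the sign of an approximate Fiedler vector."""
    keys = list(rows)
    n = len(keys)
    if n < 2:
        return keys, []
    # Keyword-keyword similarity graph (cosine similarity is an assumption).
    w = [[cosine(rows[a], rows[b]) if a != b else 0.0 for b in keys]
         for a in keys]
    deg = [sum(r) for r in w]
    c = 2.0 * max(deg)  # upper bound on the Laplacian's eigenvalues
    rng = random.Random(seed)
    v = [rng.random() for _ in keys]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        # Multiply by (c*I - L), where L = D - W is the graph Laplacian.
        v = [(c - deg[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        v = [x / norm for x in v]
    left = [k for k, x in zip(keys, v) if x >= 0]
    right = [k for k, x in zip(keys, v) if x < 0]
    return left, right

# Usage with a hypothetical controlled vocabulary.
docs = ["solar panel solar panel", "solar panel install",
        "tax form deadline", "tax form filing"]
vocabulary = ["solar", "solar panel", "tax", "tax form", "wind turbine"]
rows = vocab_candidates(docs, vocabulary)
group_a, group_b = spectral_bipartition(rows)
```

Here "wind turbine" is in the vocabulary but absent from the documents, so it never becomes a candidate, and the split separates the solar-related keywords from the tax-related ones. Which of the two returned groups holds which set depends on the sign of the computed vector.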
Specification