Method for performing effective drill-down operations in text corpus visualization and exploration using language model approaches for key phrase weighting
First Claim
1. A method for performing a drill-down operation on a text corpus comprising documents using language models for key phrase weighting, said method comprising the steps of:
(a) weighting key phrases occurring both in a foreground language model, which contains a selected document cluster of said text corpus, and in a background language model, which does not contain said selected document cluster, by calculating for each key phrase a key phrase weight comprising a ratio between the foreground weight of said key phrase and a background weight of said key phrase; and
(b) assigning documents of the foreground language model to cluster labels which are formed by key phrases having high calculated key phrase weights.
Abstract
The invention relates to a method and an apparatus for performing a drill-down operation on a text corpus comprising documents, using language models for key phrase weighting. The method comprises weighting key phrases that occur both in a foreground language model, which contains a selected document cluster of the text corpus, and in a background language model, which does not contain the selected document cluster, by calculating for each key phrase a key phrase weight comprising a ratio between the foreground weight and the background weight of that key phrase. Documents of the foreground language model are then assigned to cluster labels formed by the key phrases with high calculated key phrase weights.
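The key phrase weight can be written out as a formula. The claims fix only that the weight comprises a ratio of a foreground weight to a background weight; using maximum-likelihood relative frequencies as those weights, and the symbols below (counts c_F(k) and c_B(k) of phrase k in the selected cluster and in the rest of the corpus, vocabularies V_F and V_B), are assumptions made for illustration.

```latex
P_F(k) = \frac{c_F(k)}{\sum_{k' \in V_F} c_F(k')}, \qquad
P_B(k) = \frac{c_B(k)}{\sum_{k' \in V_B} c_B(k')}, \qquad
w(k) = \frac{P_F(k)}{P_B(k)}, \quad k \in V_F \cap V_B
```

Restricting k to phrases that occur in both models, as the claims do, keeps P_B(k) > 0 so the ratio is always defined; a phrase with w(k) > 1 is relatively more frequent in the selected cluster than in the rest of the corpus.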
15 Citations
24 Claims
1. A method for performing a drill-down operation on a text corpus comprising documents using language models for key phrase weighting, said method comprising the steps of:
(a) weighting key phrases occurring both in a foreground language model, which contains a selected document cluster of said text corpus, and in a background language model, which does not contain said selected document cluster, by calculating for each key phrase a key phrase weight comprising a ratio between the foreground weight of said key phrase and a background weight of said key phrase; and
(b) assigning documents of the foreground language model to cluster labels which are formed by key phrases having high calculated key phrase weights.
16. A method for performing a drill-down operation on a text corpus comprising documents using language models for key phrase weighting comprising the steps of:
(a) clustering said text corpus into clusters each including a set of documents;
(b) selecting a cluster from among the clusters to generate a foreground language model containing the selected document cluster and a background language model which does not contain the selected document cluster;
(c) weighting key phrases occurring both in the foreground language model and in the background language model by calculating for each key phrase a key phrase weight comprising a ratio between a foreground weight of said key phrase and a background weight of said key phrase;
(d) sorting the weighted key phrases according to the respective key phrase weight in descending order;
(e) selecting a configurable number of key phrases having the highest key phrase weights as cluster labels; and
(f) assigning documents of the foreground language model to the selected cluster labels.
19. A user terminal for performing a drill-down operation on a text corpus comprising documents stored in at least one database using language models for key phrase weighting, said user terminal comprising:
(a) a screen for displaying cluster labels of selectable document clusters each including a set of documents; and
(b) a calculation unit for weighting key phrases occurring both in a foreground language model, which contains a selected document cluster of said text corpus, and in a background language model, which does not contain said selected document cluster, by calculating for each key phrase a key phrase weight comprising a ratio between a foreground weight of said key phrase and a background weight of said key phrase and for assigning documents of said foreground language model to cluster labels which are formed by key phrases having high calculated key phrase weights.
23. An apparatus for performing a drill-down operation on a text corpus comprising documents using language models for key phrase weighting, said apparatus comprising:
(a) means for weighting key phrases occurring both in a foreground language model, which contains a selected document cluster of said text corpus, and in a background language model, which does not contain said selected document cluster, by calculating for each key phrase a key phrase weight comprising a ratio between a foreground weight of said key phrase and a background weight of said key phrase; and
(b) means for assigning documents of the foreground language model to cluster labels which are formed by key phrases having high calculated key phrase weights.
24. An apparatus for performing a drill-down operation on a text corpus comprising documents using language models for key phrase weighting, wherein said apparatus comprises:
(a) means for clustering said text corpus into clusters each including a set of documents;
(b) means for selecting a cluster from among the clusters to generate a foreground language model which contains the selected document cluster and a background language model which does not contain the selected document cluster;
(c) means for weighting key phrases occurring both in the foreground language model and in the background language model by calculating for each key phrase a key phrase weight comprising a ratio between a foreground weight of said key phrase and a background weight of said key phrase;
(d) means for sorting the weighted key phrases according to the key phrase weight;
(e) means for selecting a configurable number of key phrases having the highest key phrase weight as cluster labels; and
(f) means for assigning documents of the foreground language model to the selected cluster labels.
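Claims 16 and 24 spell out the drill-down as a pipeline of steps (a) to (f). The sketch that follows is one minimal reading of those steps in Python, not the patented implementation. It assumes the clustering of step (a) has already been done, treats lower-cased whitespace tokens as key phrases, uses relative frequencies as the foreground and background weights, and assigns a document to every label it contains; the helpers `key_phrases`, `language_model`, and `drill_down` are hypothetical names.

```python
from collections import Counter


def key_phrases(doc: str) -> list[str]:
    # Hypothetical key phrase extractor: lower-cased whitespace tokens
    # stand in for real (possibly multi-word) key phrases.
    return doc.lower().split()


def language_model(docs: list[str]) -> dict[str, float]:
    # Maximum-likelihood model: relative frequency of each key phrase.
    counts = Counter(p for d in docs for p in key_phrases(d))
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}


def drill_down(clusters: list[list[str]], selected: int, num_labels: int = 3):
    # (a) is assumed done: `clusters` is a precomputed list of document lists.
    # (b) Foreground = selected cluster; background = every other cluster.
    fg_docs = clusters[selected]
    bg_docs = [d for i, c in enumerate(clusters) if i != selected for d in c]
    fg, bg = language_model(fg_docs), language_model(bg_docs)

    # (c) Weight only phrases present in both models: foreground / background.
    weights = {p: fg[p] / bg[p] for p in fg.keys() & bg.keys()}

    # (d) Sort descending by weight (ties broken alphabetically for determinism).
    ranked = sorted(weights, key=lambda p: (-weights[p], p))

    # (e) A configurable number of top-weighted phrases become cluster labels.
    labels = ranked[:num_labels]

    # (f) Assign each foreground document to every label it contains
    #     (hypothetical rule; a document may receive several labels or none).
    assignment = {
        label: [d for d in fg_docs if label in key_phrases(d)] for label in labels
    }
    return labels, assignment


if __name__ == "__main__":
    clusters = [
        ["stock market rally", "market crash fears", "bond market rally"],
        ["football match report", "market day in town", "rally car race"],
    ]
    labels, assignment = drill_down(clusters, selected=0)
    print(labels)  # ['market', 'rally']
    print(assignment["rally"])  # ['stock market rally', 'bond market rally']
```

In this toy corpus "market" gets weight (3/9)/(1/10) and "rally" gets (2/9)/(1/10), so "market" ranks first. Phrases such as "stock" or "crash" never become labels, because they do not occur in the background model and the claims weight only phrases that occur in both models.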
Specification