Word importance calculation method, document retrieving interface, word dictionary making method
Abstract
Known methods for selecting words (or word sequences), an important aspect of information retrieval, have two problems: they cannot eliminate high-frequency common words, and the threshold that separates important from unimportant words is often set arbitrarily. These problems are solved by normalizing the difference between the word distribution in the subset of all documents containing the word to be extracted (or in a subset of that document set) and the word distribution in the set of all documents, using the number of words in that subset as a parameter. The accuracy of information retrieval support is thereby enhanced.
8 Claims
- 1. A word importance calculation method for calculating the importance of words contained in a document set, whereby the difference between the word distribution in a subset of every document containing a specified word and the word distribution in a set of whole documents including said subset is used to calculate the importance of the word.
- 6. A document retrieval interface having a function to display on a screen words characterizing a document set, wherein the importance of each word occurring in a set of whole documents is calculated using the difference between the word distribution in the subset of every document containing the word and the word distribution in the set of whole documents including said subset, and the importance is brought to bear on the selection, arrangement or coloring of the words displayed on the screen.
- 7. A document retrieval interface having a function to display on a screen words characterizing a document set, wherein the importance of each word occurring in the document set obtained as a result of retrieval is calculated using the difference between the word distribution in the subset of documents out of the document set obtained as a result of that retrieval containing that word and the word distribution in the document set obtained as a result of that retrieval, and the importance is brought to bear on the selection, arrangement or coloring of the words displayed on the screen.
- 8. A word dictionary construction method by extracting important words from a document set in accordance with rules given in advance, wherein the importance of each word occurring in a set of whole documents is calculated using the difference between a subset of every document containing the word and the word distribution in the set of whole documents including said subset, and words to be extracted are selected on the basis of that importance.
Specification