Document retrieval-assisting method and system for the same and document retrieval service using the same with document frequency and term frequency
First Claim
1. A method for assisting document retrieval, comprising, detecting documents each including one or a plurality of keywords corresponding to a query as documents of retrieval results from a retrieval subject document group, detecting the document frequency of a word, representing the number of documents where the word appears in the group of the documents of retrieval results, detecting the total document frequency of a word representing the number of documents where the word appears in the whole retrieval subject document group, introducing a frequency ratio representing the ratio of the document frequency of a word to the total document frequency of the word, classifying the document frequency in a set of frequency classes as based on a given relation and assigning each word to a corresponding frequency class, depending on the document frequency of the word, extracting an appropriate number of words from each of the frequency classes in the decreasing order of frequency ratio of word as topic words, and displaying the extracted topic words in the form of a graph or a list.
Abstract
Because the overall picture of a retrieved document group cannot be viewed, the next retrieval request after a retrieval attempt has so far been chosen only by intuition. Therefore, a means for displaying topic words is shown on a display means. On request from a user, a group of words characteristically appearing in the retrieved document group is extracted, the relations between the topic words are examined, and a graph using the topic words as nodes is prepared, so that the overall picture of the retrieval results is displayed on the means for displaying topic words. Additionally, by selecting a word of interest or a word of no interest on the displayed graph of topic words, the user can design a subsequent retrieval strategy effectively.
24 Claims
-
1. A method for assisting document retrieval, comprising,
detecting documents each including one or a plurality of keywords corresponding to a query as documents of retrieval results from a retrieval subject document group,
detecting the document frequency of a word, representing the number of documents where the word appears in the group of the documents of retrieval results,
detecting the total document frequency of a word representing the number of documents where the word appears in the whole retrieval subject document group,
introducing a frequency ratio representing the ratio of the document frequency of a word to the total document frequency of the word,
classifying the document frequency in a set of frequency classes as based on a given relation and assigning each word to a corresponding frequency class, depending on the document frequency of the word,
extracting an appropriate number of words from each of the frequency classes in the decreasing order of frequency ratio of word as topic words, and
displaying the extracted topic words in the form of a graph or a list.
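The steps of claim 1 can be illustrated with a minimal sketch. It is not the patentee's implementation: the claim leaves the "given relation" for frequency classes and the "appropriate number" of words open, so the sketch assumes power-of-two bands (class = floor(log2(document frequency))) and a hypothetical `per_class` parameter. Representing each document as a set of words, and the name `extract_topic_words`, are likewise assumptions made for illustration.

```python
from collections import Counter, defaultdict


def extract_topic_words(corpus, keywords, per_class=2):
    """Return {frequency_class: [topic words]} for a keyword query.

    corpus   -- list of documents, each a set of words (assumed representation)
    keywords -- set of query keywords
    """
    # Retrieval results: documents containing at least one query keyword.
    results = [doc for doc in corpus if doc & keywords]

    # Document frequency within the results, and total document
    # frequency within the whole retrieval subject document group.
    df = Counter(word for doc in results for word in doc)
    total_df = Counter(word for doc in corpus for word in doc)

    # Frequency ratio: document frequency / total document frequency.
    ratio = {word: df[word] / total_df[word] for word in df}

    # Assumed "given relation": class = floor(log2(df)), computed exactly
    # on integers via bit_length.
    classes = defaultdict(list)
    for word, count in df.items():
        classes[count.bit_length() - 1].append(word)

    # From each class, take words in decreasing order of frequency ratio
    # (ties broken alphabetically so the result is deterministic).
    topics = {}
    for freq_class, words in classes.items():
        words.sort(key=lambda w: (-ratio[w], w))
        topics[freq_class] = words[:per_class]
    return topics


corpus = [
    {"solar", "panel", "energy"},
    {"solar", "cell", "energy"},
    {"wind", "energy"},
    {"solar", "panel"},
    {"cooking", "recipe"},
]
topics = extract_topic_words(corpus, {"solar"})
```

Drawing words from every frequency class, rather than from one global ranking, keeps both common and rare characteristic words in the output. The final display step (graph or list) is omitted here.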
-
3. A method for document retrieval service, comprising,
detecting documents each including a keyword transmitted from a retriever as documents of retrieval results from a retrieval subject document group,
detecting the document frequency of a word, representing the number of documents where the word appears in the group of documents as retrieval results,
detecting the total document frequency of a word representing the number of documents where the word appears in the whole retrieval subject document group,
introducing a frequency ratio representing the ratio of the document frequency of a word to the total document frequency of the word,
classifying the document frequency in a set of frequency classes based on a given relation and assigning each word to a corresponding frequency class, depending on the document frequency of the word,
extracting an appropriate number of words from each of the frequency classes in the decreasing order of frequency ratio of word as topic words,
composing the extracted topic words as a data displayable in the form of a list per frequency class or in the form of a graph representing the relation between topic words, and
transmitting the topic words and said composed data for displaying to the retriever.
-
6. A machine readable data storing media on which the word frequency data for selecting topic words are recorded, wherein the frequency data of each word comprises
(a) character sequence,
(b) the document frequency of the word, representing the number of documents where the word appears in the group of documents of retrieved results,
(c) the total document frequency of the word representing the number of documents where the word appears in the whole retrieval subject document group,
(d) the frequency ratio representing the ratio of the document frequency of the word to the total document frequency of the word, and
(e) the frequency class of the word assigned to the word depending on its document frequency,
and wherein topic words are extracted from each of the frequency classes in the decreasing order of frequency ratio of word.
-
7. A machine readable data storing media on which the co-occurrence data for calculating the relatedness among topic words are recorded, wherein the co-occurrence data of each pair of topic words comprises
(a) the co-occurrence frequency of the word pair, that is, the number of documents in the retrieved document set where both words of the word pair appear, and
(b) the co-occurrence intensity of the word pair, such as their co-occurrence frequency divided by document frequency of the second word of the word pair,
and wherein the links of the graphical display of topic words are generated for word pairs with strong relation.
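The co-occurrence data of claim 7 can be sketched as follows. The claim gives the intensity only by way of example (co-occurrence frequency divided by the document frequency of the second word) and does not define "strong relation", so the `threshold` parameter and the function name `cooccurrence_links` are assumptions made for illustration. Documents are again assumed to be sets of words.

```python
from itertools import permutations


def cooccurrence_links(results, topic_words, threshold=0.5):
    """Return (first, second, co_frequency, intensity) for strong pairs.

    results     -- retrieved documents, each a set of words (assumed)
    topic_words -- iterable of topic words already selected
    threshold   -- assumed cut-off for a "strong relation"
    """
    # Document frequency of each topic word within the retrieved set.
    df = {w: sum(1 for doc in results if w in doc) for w in topic_words}

    links = []
    # Ordered pairs: the intensity is asymmetric because it is
    # normalised by the document frequency of the second word.
    for first, second in permutations(sorted(topic_words), 2):
        if df[second] == 0:
            continue  # the second word never appears; intensity undefined
        co = sum(1 for doc in results if first in doc and second in doc)
        intensity = co / df[second]
        if intensity >= threshold:
            links.append((first, second, co, intensity))
    return links


results = [
    {"solar", "panel", "energy"},
    {"solar", "cell", "energy"},
    {"solar", "panel"},
]
links = cooccurrence_links(results, {"solar", "panel", "cell"}, threshold=0.6)
```

Because the denominator is the second word's document frequency, the pair ("solar", "cell") scores 1.0 while ("cell", "solar") scores only 1/3, so a link can be generated in one direction and not the other.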
-
8. A machine readable data storing media on which the coordinate data for displaying a graphical display of topic words are recorded,
wherein the data comprises data for displaying nodes of the topic word graph and data for displaying links of the graph representing strong relation between topic words,
and the data for displaying nodes comprises the coordinate center, character sequence, and the character number in the crosswise and lengthwise directions of a region displaying the characters, and the size of the displaying region,
and the data for displaying links comprises the initiation coordinate and the termination coordinate of each link,
and wherein the graphic display of a word graph is displayed following the data.
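One possible in-memory layout for the display data of claim 8 is sketched below. The field names, the fixed character cell size (`char_w`, `char_h`), the wrapping width `max_across`, and the choice to run each link between node centers are all assumptions for illustration. The claim specifies only which items are recorded, not how they are computed.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass
class NodeData:
    center: Point             # coordinate center of the node
    text: str                 # character sequence of the topic word
    chars_across: int         # character number in the crosswise direction
    chars_down: int           # character number in the lengthwise direction
    size: Tuple[float, float] # width and height of the displaying region


@dataclass
class LinkData:
    start: Point  # initiation coordinate
    end: Point    # termination coordinate


def make_node(text, center, char_w=8.0, char_h=16.0, max_across=10):
    """Build node data, wrapping the text after max_across characters."""
    chars_across = max(1, min(len(text), max_across))
    # Ceiling division: number of rows needed to show all characters.
    chars_down = max(1, -(-len(text) // chars_across))
    size = (chars_across * char_w, chars_down * char_h)
    return NodeData(center, text, chars_across, chars_down, size)


def make_link(a, b):
    """Link two nodes; endpoints are assumed to be the node centers."""
    return LinkData(a.center, b.center)


n1 = make_node("retrieval", (100.0, 50.0))
n2 = make_node("documentfrequency", (0.0, 0.0))
link = make_link(n1, n2)
```

Storing the character counts alongside the region size lets a renderer lay out the text without re-measuring it.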
-
9. A machine readable data storing media on which are recorded the data of each word for selecting topic words, data of each pair of topic words for calculating the relatedness among topic words, and data of each topic word for displaying a graphical display of topic words,
wherein the data of each word comprises (a) character sequence, (b) the document frequency of the word, representing the number of documents where the word appears in the group of documents of retrieved results, (c) the total document frequency of the word representing the number of documents where the word appears in the whole retrieval subject document group, (d) the frequency ratio representing the ratio of the document frequency of the word to the total document frequency of the word, and (e) the frequency class of the word assigned to the word depending on its document frequency, and topic words are extracted from each of the frequency classes in the decreasing order of frequency ratio of word,
wherein the co-occurrence data of each pair of topic words comprises (a) the co-occurrence frequency of the word pair, that is, the number of documents in the retrieved document set where both words of the word pair appear, and (b) the co-occurrence intensity of the word pair, such as their co-occurrence frequency divided by document frequency of the second word of the word pair, and the links of the graphical display of topic words are generated for word pairs with strong relation,
wherein the data comprises data for displaying nodes of the topic word graph and data for displaying links of the graph representing strong relation between topic words, and the data for displaying nodes comprises the coordinate center, character sequence, and the character number in the crosswise and lengthwise directions of a region displaying the characters, and the size of the displaying region, and the data for displaying links comprises the initiation coordinate and the termination coordinate of each link, and the graphic display of a word graph can be displayed following the data.
-
10. A document retrieval system comprising
a means for detecting documents each including one or a plurality of keywords corresponding to a query as documents of retrieval results from a retrieval subject document group,
a means for detecting the document frequency of a word, representing the number of documents where the word appears in the group of the documents of retrieval results,
a means for detecting the total document frequency of a word representing the number of documents where the word appears in the whole retrieval subject document group,
a means for introducing a frequency ratio representing the ratio of the document frequency of a word to the total document frequency of the word,
a means for classifying the document frequency in a set of frequency classes as based on a given relation and assigning each word to a corresponding frequency class, depending on the document frequency of the word,
a means for extracting an appropriate number of words from each of the frequency classes in the decreasing order of frequency ratio of word as topic words, and
a means for displaying the extracted topic words in the form of a graph or a list.
Specification