Document processing apparatus, word extracting apparatus, word extracting method and storage medium for storing word extracting program
Abstract
The present invention provides a document processing apparatus, word extracting apparatus, word extracting method and storage medium for storing a word extracting program, capable of appropriately presenting effective associate words to the user. A retrieving element executes retrieval of documents based on a retrieval condition inputted through a retrieval condition inputting element. A keyword designating element designates an arbitrary word among the words included in the retrieved documents as an associate-word-searching word and designates other words as candidates for an associate word. A simultaneous appearance probability calculating element calculates a simultaneous appearance probability of the associate-word-searching word and one of the candidates for the associate word in any of the retrieved documents. A first independent appearance probability calculating element obtains an independent appearance probability of the associate-word-searching word in each of all documents. A second independent appearance probability calculating element calculates an independent appearance probability of each of the candidates for the associate word in each of all documents. A calculating element calculates the sum or product of the independent appearance probability of the associate-word-searching word and the independent appearance probability of each of the candidates for the associate word. An associate word extracting element extracts a word according to the ratio of the simultaneous appearance probability to the sum or product calculated by the calculating element.
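The ratio described in the abstract can be written out as follows. This is only a sketch under stated assumptions: the symbols D, R, w, c and the score names are introduced here for illustration, and estimating each probability as a fraction of documents is an assumption, since the abstract does not say how the probabilities are estimated.

```latex
% Assumed notation (not from the source):
%   D = the set of all documents, R \subseteq D = the retrieved documents,
%   w = the associate-word-searching word, c = one candidate word.
\[
P_R(w, c) = \frac{|\{d \in R : w \in d,\ c \in d\}|}{|R|}, \qquad
P_D(w) = \frac{|\{d \in D : w \in d\}|}{|D|}, \qquad
P_D(c) = \frac{|\{d \in D : c \in d\}|}{|D|}
\]
% The calculating element may combine the independent probabilities
% by product or by sum, giving two possible ratios per candidate:
\[
\mathrm{score}_{\times}(c) = \frac{P_R(w, c)}{P_D(w)\, P_D(c)}, \qquad
\mathrm{score}_{+}(c) = \frac{P_R(w, c)}{P_D(w) + P_D(c)}
\]
```

Under this reading, the simultaneous appearance probability is measured over the retrieved documents only, while both independent appearance probabilities are measured over the whole collection.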
7 Claims
1. A document processing apparatus comprising:
a document information storing element for storing information including a document identifier and a plurality of words included in a document for each of all documents;
a retrieval condition inputting element for inputting a retrieval condition for the documents to be retrieved;
a retrieving element for retrieving specific documents matching the retrieval condition by using the information;
a keyword designating element for designating an arbitrary word in the documents as an associate-word-searching word and designating other words as candidates to be associated;
a simultaneous appearance probability calculating element for calculating a probability that the associate-word-searching word and one of the candidates are simultaneously included in any of the documents retrieved by the retrieving element for each of the candidates;
a first independent appearance probability calculating element for calculating a first probability that the associate-word-searching word is included in any of all documents;
a second independent appearance probability calculating element for calculating a second probability that one of the candidates is included in any of all documents for each of the candidates;
a calculating element for calculating the sum or product of the first and second probabilities for each of the candidates; and
an associate word extracting element for calculating a ratio of the probability calculated by the simultaneous appearance probability calculating element to the sum or product calculated by the calculating element for each of the candidates and extracting a word according to the ratio of each of the candidates.
2. A word extracting apparatus comprising:
an item information storing element for storing information including an item identifier and a plurality of words included in an item for each of all items;
a retrieval condition inputting element for inputting a retrieval condition for the items to be retrieved;
a retrieving element for retrieving specific items matching the retrieval condition by using the information;
a keyword designating element for designating an arbitrary word in the items as an associate-word-searching word and designating other words as candidates to be associated;
a simultaneous appearance probability calculating element for calculating a probability that the associate-word-searching word and one of the candidates are simultaneously included in any of the items retrieved by the retrieving element for each of the candidates;
a first independent appearance probability calculating element for calculating a first probability that the associate-word-searching word is included in any of all items;
a second independent appearance probability calculating element for calculating a second probability that one of the candidates is included in any of all items for each of the candidates;
a calculating element for calculating the sum or product of the first and second probabilities for each of the candidates; and
an associate word extracting element for calculating a ratio of the probability calculated by the simultaneous appearance probability calculating element to the sum or product calculated by the calculating element for each of the candidates and extracting a word according to the ratio of each of the candidates.
3. A word extracting apparatus comprising:
an item information storing element for storing information including an item identifier and a plurality of words included in an item for each of all items;
a retrieval condition inputting element for inputting a retrieval condition for the items to be retrieved;
a retrieving element for retrieving specific items matching the retrieval condition by using the information;
a keyword designating element for designating an arbitrary word in the items as an associate-word-searching word and designating other words as candidates to be associated;
a simultaneous appearance probability calculating element for calculating a probability that the associate-word-searching word and one of the candidates are simultaneously included in any of the items retrieved by the retrieving element for each of the candidates;
a first independent appearance probability calculating element for calculating a first probability that the associate-word-searching word is included in any of all items;
a second independent appearance probability calculating element for calculating a second probability that one of the candidates is included in any of all items for each of the candidates;
a calculating element for calculating the sum or product of the first and second probabilities for each of the candidates; and
an associate word extracting element for calculating a statistical value using the probability calculated by the simultaneous appearance probability calculating element and the sum or product calculated by the calculating element for each of the candidates and extracting a word according to the statistical values of each of the candidates.
(Dependent claims: 4, 5)
6. A word extracting method for an information retrieving apparatus which comprises an item information storing element for storing information including an item identifier and a plurality of words included in an item for each of all items, comprising the steps of:
inputting a retrieval condition for the item;
retrieving specific items matching the retrieval condition from the item information storing element;
designating an arbitrary word in the retrieved items as an associate-word-searching word and designating other words as candidates to be associated;
calculating a simultaneous probability that the associate-word-searching word and one of the candidates are simultaneously included in any of the retrieved items for each of the candidates;
calculating a first probability that the associate-word-searching word is included in any of all items;
calculating a second probability that one of the candidates is included in any of all items for each of the candidates;
calculating the sum or product of the first and second probabilities for each of the candidates; and
calculating a statistical value using the simultaneous probability and the sum or product for each of the candidates and extracting a word according to the statistical value of each of the candidates.
7. A storage medium readable by a computer, storing a program of instructions executable by the computer to perform a method for extracting a word, the method comprising the steps of:
storing information including an item identifier and a plurality of words included in an item for each of all items;
inputting a retrieval condition for the items;
retrieving specific items matching the retrieval condition from the item information storing element;
designating an arbitrary word in the retrieved items as an associate-word-searching word and designating other words as candidates to be associated;
calculating a simultaneous probability that the associate-word-searching word and one of the candidates are simultaneously included in any of the retrieved items for each of the candidates;
calculating a first probability that the associate-word-searching word is included in any of all items;
calculating a second probability that one of the candidates is included in any of all items for each of the candidates;
calculating the sum or product of the first and second probabilities for each of the candidates; and
calculating a statistical value using the simultaneous probability and the sum or product for each of the candidates and extracting a word according to the statistical value of each of the candidates.
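The method steps recited in claim 6 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names `retrieve` and `associate_words`, the toy collection, the single-word retrieval condition, and the use of document fractions as probability estimates are all assumptions made for illustration; the statistical value shown is the ratio form from claims 1 and 2.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical item store: item identifier -> set of words in that item.
ItemStore = Dict[str, Set[str]]


def retrieve(items: ItemStore, query_word: str) -> List[str]:
    """Assumed retrieval condition: the item contains query_word."""
    return sorted(i for i, words in items.items() if query_word in words)


def associate_words(
    items: ItemStore,
    retrieved_ids: List[str],
    search_word: str,
    combine: str = "product",  # "product" or "sum"
) -> List[Tuple[str, float]]:
    """Score every other word in the retrieved items against search_word."""
    n_all = len(items)
    retrieved = [items[i] for i in retrieved_ids]
    n_ret = len(retrieved)
    if n_all == 0 or n_ret == 0:
        return []

    # First probability: search_word over ALL items.
    p_w = sum(search_word in words for words in items.values()) / n_all

    # Candidates: every other word appearing in the retrieved items.
    candidates = set().union(*retrieved) - {search_word}

    scores = []
    for cand in candidates:
        # Simultaneous probability: both words, over RETRIEVED items only.
        p_joint = sum(search_word in w and cand in w for w in retrieved) / n_ret
        # Second probability: the candidate over ALL items.
        p_c = sum(cand in words for words in items.values()) / n_all
        denom = p_w * p_c if combine == "product" else p_w + p_c
        if denom > 0:
            scores.append((cand, p_joint / denom))

    # Highest ratio first; ties broken alphabetically for determinism.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


# Toy collection (made-up data, for illustration only).
ITEMS: ItemStore = {
    "d1": {"patent", "claim", "search"},
    "d2": {"patent", "claim"},
    "d3": {"patent", "search", "index"},
    "d4": {"music", "index"},
}

hits = retrieve(ITEMS, "patent")
ranking = associate_words(ITEMS, hits, "claim")
for word, ratio in ranking:
    print(f"{word}\t{ratio:.3f}")
```

Passing `combine="sum"` switches the denominator to the sum of the two independent probabilities, the other alternative recited in the claims. An extracting step could then keep, for example, the top-ranked words or those above a threshold; the claims only say that a word is extracted "according to" the value.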
Specification