Information processing apparatus, method, and computer-readable recording medium for performing full text retrieval of documents
Abstract
An information processing apparatus for creating a retrieval result displaying a list of retrieval documents. Retrieval documents corresponding to a retrieval condition are classified into groups based on scores indicating degrees of relevance to the retrieval condition. A clustering process is conducted with respect to the retrieval documents in a group, for each of groups to which the retrieval documents belong.
6 Claims
1. An information processing apparatus for creating a retrieval result displaying a list of retrieval documents, comprising:
a computer memory that stores a feature word file database configured to register, for each of a plurality of stored documents, document identification identifying the document, feature words extracted from full text data of the document, and weight values indicating weights of the feature words in which the feature words and the weight values are corresponded to the document identification;
a computer processor;
a document retrieval part, executable by the computer processor, configured to retrieve the retrieval documents, from among the plurality of stored documents, corresponding to a retrieval condition by conducting a full text retrieval of documents;
a document scoring part, executable by the computer processor, configured to order the retrieval documents by scores indicating degrees of relevance to the retrieval condition;
a document grouping part, executable by the computer processor, configured to group the retrieval documents into a plurality of groups based on an average rate of change of all the scores such that the groups are divided at a point where a difference in respective scores between two retrieval documents is greater than the average rate of change of all the scores; and
a document clustering part, executable by the computer processor, configured to conduct a clustering process with respect to the retrieval documents based on the feature words and the weight values of the feature words acquired from the feature word file database, by using the document identifications of the retrieval documents as keys, wherein the average rate of change of the scores indicates a clustering accuracy, and the document clustering part conducts the clustering process with respect to the retrieval documents in a group, for each of the plurality of groups to which the retrieval documents are grouped by the document grouping part;
wherein the feature words are extracted based on first values indicating appearance frequencies of words obtained from the full text data and second values indicating appearance frequencies of morpheme occurrences obtained when a morphological analysis is conducted, wherein a morpheme is a smallest semantically meaningful unit in a language, and the morphological analysis analyzing behavior and combination of morphemes.
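The grouping rule recited in claim 1 can be illustrated with a minimal sketch. It assumes that the "average rate of change of all the scores" is the mean difference between neighbouring scores once the documents are ordered by score; the claim does not fix that formula, and the function name `group_by_score_gaps` and the sample data are hypothetical.

```python
from typing import List, Tuple

Doc = Tuple[str, float]  # (document identification, relevance score)


def group_by_score_gaps(scored_docs: List[Doc]) -> List[List[Doc]]:
    """Split score-ordered documents wherever a neighbour gap exceeds the average gap.

    Assumption: "average rate of change" = mean difference between adjacent scores.
    """
    # Order documents by descending relevance score (the document scoring step).
    ranked = sorted(scored_docs, key=lambda d: d[1], reverse=True)
    if len(ranked) < 2:
        return [ranked] if ranked else []

    # Difference in score between each pair of neighbouring documents.
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    average_gap = sum(gaps) / len(gaps)

    groups: List[List[Doc]] = [[ranked[0]]]
    for gap, doc in zip(gaps, ranked[1:]):
        if gap > average_gap:
            groups.append([doc])      # gap larger than average: start a new group
        else:
            groups[-1].append(doc)    # otherwise stay in the current group
    return groups


if __name__ == "__main__":
    sample = [("c", 60), ("a", 95), ("e", 20), ("b", 93), ("d", 58)]
    for group in group_by_score_gaps(sample):
        print([doc_id for doc_id, _ in group])
```

With the sample scores 95, 93, 60, 58, 20 the neighbour gaps are 2, 33, 2, 38 and their mean is 18.75, so the list is divided after "b" and after "d".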
5. A full text retrieval method in an information processing apparatus for creating a retrieval result displaying a list of retrieval documents, the method executable by a computer processor and comprising steps of:
retrieving, by a document retrieval part, the retrieval documents, from among a plurality of stored documents, corresponding to a retrieval condition by conducting a full text retrieval of documents;
ordering, by a document scoring part, the retrieval documents by scores indicating degrees of relevance to the retrieval condition;
grouping, by a document grouping part, the retrieval documents based on the scores into a plurality of groups based on an average rate of change of all the scores such that the groups are divided at a point where a difference in scores between two retrieval documents is greater than the average rate of change of all the scores, wherein the average rate of change of the scores indicates a clustering accuracy; and
conducting a clustering process, by a document clustering part, with respect to the retrieval documents based on feature words of the documents extracted from full text data of the documents and weight values indicating weights of the feature words, wherein the document clustering part conducts the clustering process with respect to the retrieval documents in a group, for each of the groups to which the retrieval documents are grouped by the document grouping part;
wherein the feature words are extracted based on first values indicating appearance frequencies of words obtained from the full text data and second values indicating appearance frequencies of morpheme occurrences obtained when a morphological analysis is conducted, wherein a morpheme is a smallest semantically meaningful unit in a language, and the morphological analysis analyzing behavior and combination of morphemes.
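The per-group clustering step of claim 5 can be sketched as follows. The claim does not name a clustering algorithm, so this example substitutes a simple single-pass scheme based on cosine similarity of the weighted feature words; that choice, the `threshold` parameter, the function names, and the in-memory `feature_db` dictionary standing in for the feature word file database are all assumptions made for illustration.

```python
import math
from typing import Dict, List

FeatureVector = Dict[str, float]  # feature word -> weight value


def cosine(a: FeatureVector, b: FeatureVector) -> float:
    """Cosine similarity between two sparse feature-word vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_group(doc_ids: List[str],
                  feature_db: Dict[str, FeatureVector],
                  threshold: float = 0.5) -> List[List[str]]:
    """Single-pass clustering of one group (an assumed stand-in algorithm).

    Each document joins the first cluster whose first member is similar enough;
    otherwise it starts a new cluster. Document IDs are the lookup keys.
    """
    clusters: List[List[str]] = []
    for doc_id in doc_ids:
        vector = feature_db.get(doc_id, {})
        for cluster in clusters:
            leader = feature_db.get(cluster[0], {})
            if cosine(vector, leader) >= threshold:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


def cluster_each_group(groups: List[List[str]],
                       feature_db: Dict[str, FeatureVector],
                       threshold: float = 0.5) -> List[List[List[str]]]:
    """Run the clustering separately inside every score group."""
    return [cluster_group(group, feature_db, threshold) for group in groups]


if __name__ == "__main__":
    feature_db = {
        "d1": {"engine": 2.0, "fuel": 1.0},
        "d2": {"engine": 1.0, "fuel": 1.0},
        "d3": {"printer": 3.0},
        "d4": {"printer": 1.0, "toner": 1.0},
    }
    print(cluster_each_group([["d1", "d3", "d2"], ["d4"]], feature_db))
```

Because clustering runs inside each score group, documents with very different relevance scores never land in the same cluster in this sketch, even when their feature words overlap.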
6. A non-transitory computer-readable recording medium recorded thereon a computer program for causing an information processing apparatus to perform a full text retrieval method for creating a retrieval result displaying a list of retrieval documents, the method comprising:
retrieving, by a document retrieval part, the retrieval documents, from among a plurality of stored documents, corresponding to a retrieval condition by conducting a full text retrieval of documents;
ordering, by a document scoring part, the retrieval documents by scores indicating degrees of relevance to the retrieval condition;
grouping, by a document grouping part, the retrieval documents into a plurality of groups based on an average rate of change of all the scores such that the groups are divided at a point where a difference in respective scores between two retrieval documents is greater than the average rate of change of all the scores; and
conducting a clustering process, by a document clustering part, with respect to the retrieval documents based on feature words of the documents extracted from full text data of the documents and weight values indicating weights of the feature words, wherein the average rate of change of the scores indicates a clustering accuracy, and the document clustering part conducts the clustering process with respect to the retrieval documents in a group, for each of the plurality of groups to which the retrieval documents are grouped by the document grouping part;
wherein the feature words are extracted based on first values indicating appearance frequencies of words obtained from the full text data and second values indicating appearance frequencies of morpheme occurrences obtained when a morphological analysis is conducted, wherein a morpheme is a smallest semantically meaningful unit in a language, and the morphological analysis analyzing behavior and combination of morphemes.
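The feature-word limitation shared by the independent claims combines two frequency counts: word frequencies from the full text (the first values) and morpheme frequencies from a morphological analysis (the second values). The claims do not say how the two are combined, so the sketch below is one possible reading: a weighted sum controlled by a hypothetical `alpha` parameter. The suffix-stripping `toy_morphemes` function is only a stand-in for a real morphological analyzer, and all names here are illustrative.

```python
import re
from collections import Counter
from typing import Callable, Dict, List


def toy_morphemes(word: str) -> List[str]:
    """Stand-in analyzer: split off one common English suffix, if present."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return [word[: -len(suffix)], suffix]
    return [word]


def extract_feature_words(text: str,
                          analyze: Callable[[str], List[str]] = toy_morphemes,
                          top_k: int = 3,
                          alpha: float = 0.5) -> Dict[str, float]:
    """Return the top_k words with weights mixing word and morpheme frequencies.

    Assumption: weight = alpha * word frequency
                       + (1 - alpha) * frequency of the word's leading morpheme.
    """
    words = re.findall(r"[a-z]+", text.lower())
    word_freq = Counter(words)  # first values: word appearance frequencies
    # Second values: morpheme appearance frequencies over the whole text.
    morpheme_freq = Counter(m for w in words for m in analyze(w))

    weights: Dict[str, float] = {}
    for word, count in word_freq.items():
        leading = analyze(word)[0]
        weights[word] = alpha * count + (1 - alpha) * morpheme_freq[leading]

    # Highest weight first; ties broken alphabetically for a stable result.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])


if __name__ == "__main__":
    print(extract_feature_words("engines engine engine printer", top_k=2))
```

The resulting word-to-weight mapping has the same shape as the per-document entries that the claims describe as registered against a document identification.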
Specification