APPARATUS FOR CLUSTERING A PLURALITY OF DOCUMENTS
First Claim
1. An apparatus comprising:
a selection section for selecting a plurality of sample documents from a plurality of documents;
a first parameter generation section for analyzing the plurality of sample documents to generate an initial parameter matrix expressing a probability that each of a plurality of words included in the plurality of sample documents is included in each of a plurality of topics; and
a second parameter generation section for analyzing the plurality of documents by using each value included in the initial parameter matrix as an initial value to generate a parameter matrix expressing a probability that each of a plurality of words included in the plurality of documents is included in each of a plurality of topics.
Abstract
According to an aspect, there are provided an apparatus, a program for causing a computer to function as such an apparatus, and a method, wherein the apparatus includes a selection section for selecting a plurality of sample documents from a plurality of documents and a first parameter generation section for analyzing the plurality of sample documents to generate an initial parameter matrix expressing a probability that each of a plurality of words included in the plurality of sample documents is included in each of a plurality of topics. The apparatus also includes a second parameter generation section for analyzing the plurality of documents by using each value included in the initial parameter matrix as an initial value to generate a parameter matrix expressing a probability that each of a plurality of words included in the plurality of documents is included in each of a plurality of topics.
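The two-stage procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text above does not name a topic model, so a mixture-of-unigrams model fitted with EM is assumed here as a stand-in. The names fit_topics, two_stage_fit, n_topics, sample_size, and the smoothing constant eps are all hypothetical. The matrix beta, with beta[k][v] read as the probability of word v in topic k, plays the role of the parameter matrix.

```python
import math
import random


def normalize(row):
    """Scale a list of positive numbers so that it sums to 1."""
    total = sum(row)
    return [x / total for x in row]


def fit_topics(docs, vocab, n_topics, init=None, n_iter=20, seed=0):
    """Fit a mixture-of-unigrams model with EM (an assumed stand-in model).

    Returns beta, where beta[k][v] approximates P(vocab[v] | topic k).
    If init is given, its values are used as the starting point.
    """
    rng = random.Random(seed)
    index = {w: i for i, w in enumerate(vocab)}
    eps = 1e-3  # hypothetical smoothing constant; keeps every probability > 0
    if init is None:
        beta = [[rng.random() + eps for _ in vocab] for _ in range(n_topics)]
    else:
        beta = [[p + eps for p in row] for row in init]
    beta = [normalize(row) for row in beta]
    weights = [1.0 / n_topics] * n_topics

    for _ in range(n_iter):
        counts = [[eps] * len(vocab) for _ in range(n_topics)]
        totals = [eps] * n_topics
        for doc in docs:
            ids = [index[w] for w in doc if w in index]
            # E-step: posterior over topics for this document, in log space.
            logp = [
                math.log(weights[k]) + sum(math.log(beta[k][i]) for i in ids)
                for k in range(n_topics)
            ]
            top = max(logp)
            post = normalize([math.exp(lp - top) for lp in logp])
            # Accumulate expected word counts per topic for the M-step.
            for k in range(n_topics):
                totals[k] += post[k]
                for i in ids:
                    counts[k][i] += post[k]
        # M-step: re-estimate word probabilities and topic weights.
        beta = [normalize(row) for row in counts]
        weights = normalize(totals)
    return beta


def two_stage_fit(docs, n_topics, sample_size, seed=0):
    """Fit on a random sample first, then refit on all documents."""
    vocab = sorted({w for doc in docs for w in doc})
    sample = random.Random(seed).sample(docs, sample_size)
    initial = fit_topics(sample, vocab, n_topics, seed=seed)
    final = fit_topics(docs, vocab, n_topics, init=initial, seed=seed)
    return vocab, final


docs = [
    ["apple", "banana"],
    ["banana", "apple", "apple"],
    ["goal", "match"],
    ["match", "goal", "goal"],
]
vocab, beta = two_stage_fit(docs, n_topics=2, sample_size=2)
```

In this sketch the two calls to fit_topics differ only in their starting point: the sample fit starts from random values, while the full-corpus fit starts from the matrix produced by the sample fit.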
33 Citations
20 Claims
1. An apparatus comprising:
a selection section for selecting a plurality of sample documents from a plurality of documents;
a first parameter generation section for analyzing the plurality of sample documents to generate an initial parameter matrix expressing a probability that each of a plurality of words included in the plurality of sample documents is included in each of a plurality of topics; and
a second parameter generation section for analyzing the plurality of documents by using each value included in the initial parameter matrix as an initial value to generate a parameter matrix expressing a probability that each of a plurality of words included in the plurality of documents is included in each of a plurality of topics.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. An apparatus for clustering a plurality of documents, comprising:
a selection section for selecting a plurality of sample documents from the plurality of documents;
a parameter generation section for analyzing the plurality of sample documents to generate a parameter matrix expressing a probability that each of a plurality of words included in the plurality of sample documents is included in each of a plurality of topics; and
a clustering section for clustering the plurality of documents into a plurality of clusters based on the parameter matrix.
14. A method comprising:
selecting a plurality of sample documents from a plurality of documents;
analyzing the plurality of sample documents to generate an initial parameter matrix expressing a probability that each of a plurality of words included in the plurality of sample documents is included in each of a plurality of topics; and
analyzing the plurality of documents by using each value included in the initial parameter matrix as an initial value to generate a parameter matrix expressing a probability that each of a plurality of words included in the plurality of documents is included in each of a plurality of topics.
15. A computer program product for clustering a plurality of documents, the computer program product comprising:
a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising:
selecting a plurality of sample documents from a plurality of documents;
analyzing the plurality of sample documents to generate an initial parameter matrix expressing a probability that each of a plurality of words included in the plurality of sample documents is included in each of a plurality of topics; and
analyzing the plurality of documents by using each value included in the initial parameter matrix as an initial value to generate a parameter matrix expressing a probability that each of a plurality of words included in the plurality of documents is included in each of a plurality of topics.
Dependent claims: 16, 17, 18, 19, 20
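The clustering section recited in claim 13 clusters documents based on a parameter matrix. One simple reading, offered here only as an assumption, is to assign each document to the topic under which its words are most probable. The function name cluster_documents and the toy values of vocab, beta, and docs below are hypothetical and chosen for illustration; the claims do not specify the assignment rule.

```python
import math


def cluster_documents(docs, vocab, beta):
    """Group document indices by their highest-scoring topic.

    beta[k][v] is taken to be P(vocab[v] | topic k); every entry must be
    strictly positive so that the logarithm is defined.
    """
    index = {w: i for i, w in enumerate(vocab)}
    clusters = {k: [] for k in range(len(beta))}
    for doc_id, doc in enumerate(docs):
        # Words outside the vocabulary are skipped.
        ids = [index[w] for w in doc if w in index]
        # Log-likelihood of the document's words under each topic.
        scores = [sum(math.log(row[i]) for i in ids) for row in beta]
        best = max(range(len(beta)), key=scores.__getitem__)
        clusters[best].append(doc_id)
    return clusters


# Toy parameter matrix: topic 0 favours fruit words, topic 1 sports words.
vocab = ["apple", "banana", "goal", "match"]
beta = [
    [0.45, 0.45, 0.05, 0.05],
    [0.05, 0.05, 0.45, 0.45],
]
docs = [["apple", "banana"], ["goal", "match", "goal"], ["banana"]]
print(cluster_documents(docs, vocab, beta))  # {0: [0, 2], 1: [1]}
```

Because the scores depend only on the parameter matrix, the matrix can be estimated from the sample documents alone and then applied to every document, which is the arrangement claim 13 describes.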
Specification