Topic analyzing method and apparatus and program therefor
Abstract
A topic analyzing method is provided that identifies, in real time as needed, the number of main topics in text data added in time series as well as the generation and disappearance of topics, and that extracts the features of main topics, so that changes in the content of a topic can be tracked with a minimum amount of memory and processing time. A system is provided that detects topics while sequentially reading text data added in time series. It includes learning means, which represents a topic generation model by a mixture distribution model and learns the model online while discounting older data more heavily on the basis of a timestamp of the data, and model selecting means, which selects an optimal topic generation model from among a plurality of candidate topic generation models on the basis of information criteria of those models. Topics are detected as mixture components of the optimal topic generation model.
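The learning and model-selecting means summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: each document is assumed to be a bag of word counts generated by one component of a multinomial mixture; timestamp-based discounting is assumed to mean multiplying every sufficient statistic by exp(-r * elapsed time) before each update; and the information criterion is assumed to be a BIC-style score (discounted predictive log-loss plus a parameter-count penalty). The class `DiscountedMultinomialMixture`, the function `select_model`, and the parameters `decay_rate`, `alpha`, and `beta` are hypothetical names.

```python
import math
import random


class DiscountedMultinomialMixture:
    """Hypothetical online EM for a K-component multinomial mixture.

    ASSUMPTION: timestamp-based discounting is modelled by multiplying every
    sufficient statistic by exp(-decay_rate * elapsed_time) before each update.
    """

    def __init__(self, k, vocab_size, decay_rate=0.1, alpha=1.0, beta=0.01, seed=0):
        rng = random.Random(seed)
        self.k, self.v = k, vocab_size
        self.decay_rate, self.alpha, self.beta = decay_rate, alpha, beta
        self.last_time = None
        self.weight = [0.0] * k  # discounted sum of responsibilities per component
        # Small random pseudo-counts break the symmetry between components.
        self.word = [[rng.random() for _ in range(vocab_size)] for _ in range(k)]
        self.loss = 0.0   # discounted predictive log-loss (used by the criterion)
        self.n_eff = 0.0  # discounted number of documents seen

    def _log_joint(self, doc):
        """log pi_k + sum_w n_w * log theta_kw for each component k.

        The multinomial coefficient is omitted; it is identical for every candidate.
        """
        total = sum(self.weight)
        result = []
        for j in range(self.k):
            log_pi = math.log((self.weight[j] + self.alpha) / (total + self.k * self.alpha))
            denom = sum(self.word[j]) + self.v * self.beta
            result.append(log_pi + sum(n * math.log((self.word[j][w] + self.beta) / denom)
                                       for w, n in doc.items()))
        return result

    def update(self, doc, timestamp):
        """Learn from one document ({word_id: count}); timestamps must not decrease."""
        if self.last_time is not None:
            f = math.exp(-self.decay_rate * (timestamp - self.last_time))
            self.weight = [s * f for s in self.weight]
            self.word = [[c * f for c in row] for row in self.word]
            self.loss *= f
            self.n_eff *= f
        self.last_time = timestamp
        log_joint = self._log_joint(doc)
        top = max(log_joint)
        log_px = top + math.log(sum(math.exp(a - top) for a in log_joint))
        # Score the document before learning from it (predictive log-loss).
        self.loss -= log_px
        self.n_eff += 1.0
        # E-step for this document; the M-step is implicit in the updated statistics.
        for j in range(self.k):
            gamma = math.exp(log_joint[j] - log_px)
            self.weight[j] += gamma
            for w, n in doc.items():
                self.word[j][w] += gamma * n

    def criterion(self):
        """ASSUMED BIC-style score: discounted log-loss + (d/2) * log(n_eff)."""
        d = (self.k - 1) + self.k * (self.v - 1)  # number of free parameters
        return self.loss + 0.5 * d * math.log(max(self.n_eff, 1.0))


def select_model(candidates):
    """Pick the candidate with the smallest information criterion."""
    return min(candidates, key=lambda m: m.criterion())


# Hypothetical stream: two disjoint topics over a 4-word vocabulary.
stream = [({0: 5, 1: 5} if t % 2 == 0 else {2: 5, 3: 5}, t) for t in range(40)]
candidates = [DiscountedMultinomialMixture(k, vocab_size=4) for k in (1, 2, 3)]
for doc, ts in stream:
    for model in candidates:
        model.update(doc, ts)
best = select_model(candidates)
print("selected K =", best.k)
```

Every candidate sees the same stream and is scored on the same discounted scale, so their criteria are directly comparable; the decay rate controls how quickly old documents stop influencing both the parameters and the score.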
18 Claims
1. A topic analyzing apparatus which detects topics while sequentially reading text data in a situation where the text data is added in time series, the apparatus comprising:
learning means for representing a topic generation model by a mixture distribution model and learning the topic generation model online while discounting older data more heavily on the basis of a timestamp of the data;
storage means for storing the topic generation model; and
means for selecting an optimal topic generation model from among a plurality of candidate topic generation models stored in the storage means, on the basis of information criteria of the topic generation models, and detecting topics as mixture components of the optimal topic generation model.
2. A topic analyzing apparatus comprising topic generation and disappearance determining means for comparing mixture components of a topic generation model at a particular time with mixture components of a topic generation model at another time to determine whether or not a new topic has been generated and whether or not an existing topic has disappeared.
3. A topic analyzing apparatus comprising topic feature representation extracting means for extracting a feature representation of a topic corresponding to each of the mixture components of a topic generation model on the basis of a parameter of the mixture components to characterize each topic.
4. A topic analyzing apparatus which detects topics while sequentially reading text data in a situation where the text data is added in time series, the apparatus comprising:
learning means for representing a topic generation model by a mixture distribution model and learning the topic generation model online while discounting older data more heavily on the basis of a timestamp of the data;
storage means for storing the topic generation model;
means for selecting an optimal topic generation model from among a plurality of candidate topic generation models stored in the storage means, on the basis of information criteria of the topic generation models, and detecting topics as mixture components of the optimal topic generation model; and
topic generation and disappearance determining means for comparing mixture components of a topic generation model at a particular time with mixture components of a topic generation model at another time to determine whether or not a new topic has been generated and whether or not an existing topic has disappeared.
6. A topic analyzing apparatus which detects topics while sequentially reading text data in a situation where the text data is added in time series, the apparatus comprising:
learning means for representing a topic generation model by a mixture distribution model and learning the topic generation model online while discounting older data more heavily on the basis of a timestamp of the data;
storage means for storing the topic generation model;
means for selecting an optimal topic generation model from among a plurality of candidate topic generation models stored in the storage means, on the basis of information criteria of the topic generation models, and detecting topics as mixture components of the optimal topic generation model; and
topic feature extracting means for extracting a feature representation of a topic corresponding to each of the mixture components of a topic generation model on the basis of a parameter of the mixture components to characterize each topic.
7. A topic analyzing method for detecting topics while sequentially reading text data in a situation where the text data is added in time series, comprising the steps of:
representing a topic generation model by a mixture distribution model, learning the topic generation model online while discounting older data more heavily on the basis of a timestamp of the data, and storing the topic generation model in storage means; and
selecting an optimal topic generation model from among a plurality of candidate topic generation models stored in the storage means, on the basis of information criteria of the topic generation models, and detecting topics as mixture components of the optimal topic generation model.
8. A topic analyzing method, comprising the step of comparing mixture components of a topic generation model at a particular time with mixture components of a topic generation model at another time to determine whether or not a new topic has been generated and whether or not an existing topic has disappeared.
9. A topic analyzing method, comprising the step of extracting a feature representation of a topic corresponding to each of the mixture components of a topic generation model on the basis of a parameter of the mixture components to characterize each topic.
10. A topic analyzing method for detecting topics while sequentially reading text data in a situation where the text data is added in time series, comprising the steps of:
representing a topic generation model by a mixture distribution model, learning the topic generation model online while discounting older data more heavily on the basis of a timestamp of the data, and storing the topic generation model in storage means;
selecting an optimal topic generation model from among a plurality of candidate topic generation models stored in the storage means, on the basis of information criteria of the topic generation models, and detecting topics as mixture components of the optimal topic generation model; and
comparing mixture components of a topic generation model at a particular time with mixture components of a topic generation model at another time to determine whether or not a new topic has been generated and whether or not an existing topic has disappeared.
12. A topic analyzing method for detecting topics while sequentially reading text data in a situation where the text data is added in time series, comprising the steps of:
representing a topic generation model by a mixture distribution model, learning the topic generation model online while discounting older data more heavily on the basis of a timestamp of the data, and storing the topic generation model in storage means;
selecting an optimal topic generation model from among a plurality of candidate topic generation models stored in the storage means, on the basis of information criteria of the topic generation models, and detecting topics as mixture components of the optimal topic generation model; and
extracting a feature representation of a topic corresponding to each of the mixture components of a topic generation model on the basis of a parameter of the mixture components to characterize each topic.
13. A program for causing a computer to perform a method for detecting topics while sequentially reading text data in a situation where the text data is added in time series, comprising the steps of:
representing a topic generation model by a mixture distribution model, learning the topic generation model online while discounting older data more heavily on the basis of a timestamp of the data, and storing the topic generation model in storage means; and
selecting an optimal topic generation model from among a plurality of candidate topic generation models stored in the storage means, on the basis of information criteria of the topic generation models, and detecting topics as mixture components of the optimal topic generation model.
14. A computer-readable program comprising the step of comparing mixture components of a topic generation model at a particular time with mixture components of a topic generation model at another time to determine whether or not a new topic has been generated and whether or not an existing topic has disappeared.
15. A computer-readable program comprising the step of extracting a feature representation of a topic corresponding to each of the mixture components of a topic generation model on the basis of a parameter of the mixture components to characterize each topic.
16. A program for causing a computer to perform a method for detecting topics while sequentially reading text data in a situation where the text data is added in time series, comprising the steps of:
representing a topic generation model by a mixture distribution model, learning the topic generation model online while discounting older data more heavily on the basis of a timestamp of the data, and storing the topic generation model in storage means;
selecting an optimal topic generation model from among a plurality of candidate topic generation models stored in the storage means, on the basis of information criteria of the topic generation models, and detecting topics as mixture components of the optimal topic generation model; and
comparing mixture components of a topic generation model at a particular time with mixture components of a topic generation model at another time to determine whether or not a new topic has been generated and whether or not an existing topic has disappeared.
18. A program for causing a computer to perform a method for detecting topics while sequentially reading text data in a situation where the text data is added in time series, comprising the steps of:
representing a topic generation model by a mixture distribution model, learning the topic generation model online while discounting older data more heavily on the basis of a timestamp of the data, and storing the topic generation model in storage means;
selecting an optimal topic generation model from among a plurality of candidate topic generation models stored in the storage means, on the basis of information criteria of the topic generation models, and detecting topics as mixture components of the optimal topic generation model; and
extracting a feature representation of a topic corresponding to each of the mixture components of a topic generation model on the basis of a parameter of the mixture components to characterize each topic.
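Claims 2, 4, 8, 10, 14, and 16 compare the mixture components of a topic generation model at one time with those at another time, but the claims above do not fix the comparison rule. The following is a minimal sketch under assumptions that are not from the source: each component is summarized by its word distribution, and two components count as the same topic when their Hellinger distance is at most a hypothetical `threshold` of 0.5. The function names `hellinger` and `compare_topics` are illustrative.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = same, 1 = disjoint)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))


def compare_topics(old_components, new_components, threshold=0.5):
    """Return (generated, disappeared) lists of component indices.

    ASSUMED rule: a new component farther than `threshold` from every old
    component is a newly generated topic; an old component farther than
    `threshold` from every new component is a disappeared topic.
    """
    generated = [j for j, q in enumerate(new_components)
                 if all(hellinger(p, q) > threshold for p in old_components)]
    disappeared = [i for i, p in enumerate(old_components)
                   if all(hellinger(p, q) > threshold for q in new_components)]
    return generated, disappeared


# Hypothetical word distributions over a 6-word vocabulary at two times.
old = [[0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]]
new = [[0.45, 0.55, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.5, 0.5]]
print(compare_topics(old, new))  # ([1], [1])
```

Here the first topic persists with a slightly shifted distribution, the second old topic has no counterpart (disappeared), and the second new topic has no predecessor (generated). The threshold trades sensitivity to genuinely new topics against false alarms caused by drift within a topic.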
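Claims 3, 6, 9, 12, 15, and 18 extract a feature representation of each topic from the parameters of its mixture component. One possible rule, assumed here and not taken from the source, ranks each word w in component k by theta_kw * log(theta_kw / p_w), where p_w is the word's probability under the whole mixture, so that words common to every topic (such as "the") score near zero. The function `topic_features`, its arguments, and the example vocabulary are hypothetical.

```python
import math


def topic_features(weights, components, vocab, top_n=3):
    """Return the top_n characteristic words for each mixture component.

    ASSUMED scoring rule: score(w) = theta_kw * log(theta_kw / p_w), where
    p_w = sum_k pi_k * theta_kw is the mixture's overall word probability.
    """
    overall = [sum(pi * theta[w] for pi, theta in zip(weights, components))
               for w in range(len(vocab))]
    features = []
    for theta in components:
        scores = [theta[w] * math.log(theta[w] / overall[w])
                  if theta[w] > 0 and overall[w] > 0 else float("-inf")
                  for w in range(len(vocab))]
        # sorted() is stable, so tied words keep their vocabulary order.
        ranked = sorted(range(len(vocab)), key=lambda w: scores[w], reverse=True)
        features.append([vocab[w] for w in ranked[:top_n]])
    return features


vocab = ["the", "election", "vote", "match", "goal"]
weights = [0.5, 0.5]
components = [[0.4, 0.3, 0.3, 0.0, 0.0],   # hypothetical "politics" topic
              [0.4, 0.0, 0.0, 0.3, 0.3]]   # hypothetical "sports" topic
print(topic_features(weights, components, vocab, top_n=2))
# [['election', 'vote'], ['match', 'goal']]
```

Ranking by the raw parameter theta_kw alone would put "the" first in both topics; normalizing by the mixture-wide probability is one simple way to surface words that distinguish a topic from the others.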
Specification