Hot topic extraction apparatus and method, storage medium therefor
Abstract
A hot topic extraction apparatus for extracting a hot topic from information includes an information collection unit, an information storage unit, and a hot topic extraction unit. The information collection unit collects a document from an information source. The information storage unit stores the collected information. The hot topic extraction unit retrieves a document from the information storage unit while the information collection unit collects the document, and extracts a hot topic from the retrieved document. The apparatus thereby extracts a hot topic from information contained in various information sources rather than from a single given document.
80 Citations
12 Claims
1. A hot topic extracting method for extracting a trend, a fashion, a noticed word, or a concept as a hot topic from a plurality of documents which exist in a wide-ranging information source on a network, the method comprising:
collecting a plurality of documents, each collected document having corresponding location information indicating a location from which the respective, collected document was found; storing the collected documents in a storage unit with date and time information indicating a collection date and time added to the collected documents; assigning weights to stored documents according to the date and time information; collecting a new document from an information source, the newly collected document having corresponding location information indicating a location from which the newly collected document was found; retrieving from the storage unit, a stored document having corresponding location information that matches the location information of the newly collected document; extracting, from the retrieved document, words or a phrase and counting a frequency of the occurrences of the extracted words or phrase, and obtaining a broader concept word of an extracted word and counting a frequency of the occurrences of the broader concept word, based on the assigned weights; and extracting said words or phrase, and said broader concept word, having occurrence frequency and document frequency equal to or higher than an average and an expected value, as the hot topic.
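The steps of claim 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the exponential recency weight, the whitespace tokenizer, the toy `BROADER` table, and the use of the mean weighted frequency as the "average" threshold are all choices made here for illustration, and every name is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Doc:
    location: str            # where the document was found, e.g. a URL
    collected_at: datetime   # collection date and time added on storage
    text: str


# Hypothetical broader-concept table; the claim does not say how it is built.
BROADER = {"sedan": "car", "coupe": "car"}


def recency_weight(doc, now, half_life_days=7.0):
    """Assumed weighting: a document's weight halves every half_life_days."""
    age_days = (now - doc.collected_at).total_seconds() / 86400
    return 0.5 ** (age_days / half_life_days)


def hot_topics(store, new_doc, now):
    # Retrieve stored documents whose location matches the new document's.
    retrieved = [d for d in store if d.location == new_doc.location]
    term_freq, doc_freq = Counter(), Counter()
    for d in retrieved:
        w = recency_weight(d, now)
        words = d.text.lower().split()
        # Count each word and, where known, its broader concept word.
        terms = words + [BROADER[x] for x in words if x in BROADER]
        for t in terms:
            term_freq[t] += w          # weighted occurrence frequency
        for t in set(terms):
            doc_freq[t] += w           # weighted document frequency
    if not term_freq:
        return []
    mean_tf = sum(term_freq.values()) / len(term_freq)
    mean_df = sum(doc_freq.values()) / len(doc_freq)
    # Keep terms at or above the average on both measures.
    return sorted(t for t in term_freq
                  if term_freq[t] >= mean_tf and doc_freq[t] >= mean_df)
```

Because the broader concept word is counted alongside the words that map to it, a concept such as "car" can pass the threshold even when no single narrower word would.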
2. A hot topic extracting method for extracting a trend, a fashion, a noticed word, or a concept as a hot topic from a plurality of documents which exist in a wide-ranging information source on a network, the method comprising:
collecting a plurality of documents, each collected document having corresponding location information indicating a location from which the respective, collected document was found; storing the collected documents in a storage unit with date and time information indicating a collection date and time added to the collected documents; assigning weights to stored documents according to the date and time information; collecting a new document from an information source, the newly collected document having corresponding location information indicating a location from which the newly collected document was found; setting an analysis target word; retrieving a stored document containing the analysis target word from the storage unit and having location information that matches the location information of the newly collected document; extracting, from the retrieved document, words or a phrase and counting a frequency of the occurrences of the extracted words or phrase, and obtaining a broader concept word of an extracted word and counting a frequency of the occurrences of the broader concept word, based on the assigned weights; and extracting said words or phrase, and said broader concept word, having occurrence frequency and document frequency equal to or higher than an average and an expected value, as the hot topic.
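Claim 2 narrows retrieval with an analysis target word. A minimal sketch of that retrieval step, assuming documents are plain dictionaries with `location` and `text` keys and that "containing" means a case-insensitive whole-word match (both are assumptions made here):

```python
def retrieve(store, new_location, target_word):
    """Return stored documents that match the new document's location
    and contain the analysis target word."""
    target = target_word.lower()
    return [
        d for d in store
        if d["location"] == new_location
        and target in d["text"].lower().split()
    ]
```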
3. A hot topic extracting method for extracting a trend, a fashion, a noticed word, or a concept as a hot topic from a plurality of documents which exist in a wide-ranging information source on a network, the method comprising:
defining a method of storing each document in a storage unit corresponding to an information type based on an issuer of the document, each stored document having corresponding location information indicating a location from which the respective document was found before being stored in the storage unit, and each stored document stored with date and time information indicating a collection date and time added to the collected document; assigning weights to the stored documents according to the date and time information; collecting a new document from an information source, the newly collected document having corresponding location information indicating a location from which the newly collected document was found; determining an information type of the newly collected document based on a collection source from which the newly collected document was collected; storing the newly collected document in the storage unit based on the defined method in accordance with the determined information type of the newly collected document; retrieving a stored document from the storage unit having corresponding location information that matches the location information of the newly collected document; extracting, from the retrieved document, words or a phrase and counting a frequency of the occurrences of the extracted words or phrase, and obtaining a broader concept word of an extracted word and counting a frequency of the occurrences of the broader concept word, based on the assigned weights; and extracting said words or phrase, and said broader concept word, having occurrence frequency and document frequency equal to or higher than an average and an expected value, as the hot topic.
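The type-dependent storage in claim 3 might look like the sketch below. The host-to-type table, the two information types, and the two storage methods are hypothetical examples chosen here; the claim only requires that the information type be determined from the collection source and that storage follow the method defined for that type.

```python
from urllib.parse import urlparse

# Hypothetical mapping from collection source (host name) to information type.
SOURCE_TYPES = {
    "news.example.com": "news",
    "forum.example.com": "bulletin_board",
}


def store_whole(storage, doc):
    """Assumed method for news: keep the article as one document."""
    storage.setdefault("news", []).append(doc)


def store_per_post(storage, doc):
    """Assumed method for bulletin boards: split on blank lines, one entry per post."""
    for post in doc["text"].split("\n\n"):
        storage.setdefault("bulletin_board", []).append({**doc, "text": post})


STORAGE_METHODS = {"news": store_whole, "bulletin_board": store_per_post}


def store_document(storage, doc):
    # Determine the information type from the collection source,
    # falling back to "news" for unknown hosts (an arbitrary default).
    host = urlparse(doc["location"]).hostname
    info_type = SOURCE_TYPES.get(host, "news")
    STORAGE_METHODS[info_type](storage, doc)
    return info_type
```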
4. A hot topic extracting method for extracting a trend, a fashion, a noticed word, or a concept as a hot topic from a plurality of documents which exist in a wide-ranging information source on a network, the method comprising:
collecting a plurality of documents, each collected document having corresponding location information indicating a location from which the respective, collected document was found; storing the collected documents in a storage unit with date and time information indicating a collection date and time added to the collected documents; assigning weights to stored documents according to the date and time information; collecting a new document from an information source, the newly collected document having corresponding location information indicating a location from which the newly collected document was found; setting an analysis target word; retrieving stored documents from the storage unit containing the analysis target word and having corresponding location information that matches the location information of the newly collected document; extracting, from the retrieved documents, words or a phrase and counting a frequency of the occurrences of the extracted words or phrase, and obtaining a broader concept word of an extracted word and counting a frequency of the occurrences of the broader concept word, based on the assigned weights; and extracting said words or phrase, and said broader concept word, having occurrence frequency and document frequency equal to or higher than an average and an expected value, as the hot topic.
5. A hot topic extracting method for extracting a trend, a fashion, a noticed word, or a concept as a hot topic from a plurality of documents which exist in a wide-ranging information source on a network, the method comprising:
collecting a plurality of documents, each collected document having corresponding location information indicating a location from which the respective, collected document was found; storing the collected documents in a storage unit with date and time information indicating a collection date and time added to the collected documents; assigning weights to stored documents according to the date and time information; collecting a new document from an information source, the newly collected document having corresponding location information indicating a location from which the newly collected document was found; retrieving from the storage unit a stored document having corresponding location information that matches the location information of the newly collected document; extracting, from the retrieved document, words or a phrase and counting a frequency of the occurrences of the extracted words or phrase, and obtaining a broader concept word of an extracted word and counting a frequency of the occurrences of the broader concept word, based on the assigned weights; extracting said words or phrase, and said broader concept word, having occurrence frequency and document frequency equal to or higher than an average and an expected value, as the hot topic; accumulating a result of repeatedly extracting the hot topic from the retrieved document; and extracting a change with time of the result of repeatedly extracting the hot topic.
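The last two steps of claim 5, accumulating results and extracting their change over time, can be illustrated as a difference between consecutive extraction runs. The `history` structure and the choice to treat an absent topic as a score of 0 are assumptions of this sketch.

```python
def topic_changes(history):
    """history: list of (label, {topic: score}) in chronological order.

    Returns, for each run after the first, the change in every topic's
    score relative to the previous run (absent topics count as 0.0).
    """
    changes = []
    for (_, prev), (label, cur) in zip(history, history[1:]):
        delta = {
            topic: cur.get(topic, 0.0) - prev.get(topic, 0.0)
            for topic in set(prev) | set(cur)
        }
        changes.append((label, delta))
    return changes
```

A positive value marks a topic that is gaining attention between runs, and a negative value marks one that is fading.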
6. A hot topic extracting method for extracting a trend, a fashion, a noticed word, or a concept as a hot topic from a plurality of documents which exist in a wide-ranging information source on a network, the method comprising:
collecting a plurality of documents, each collected document having corresponding location information indicating a location from which the respective, collected document was found; storing the collected documents in a storage unit with date and time information indicating a collection date and time added to the collected documents; assigning weights to stored documents according to the date and time information; collecting a new document from an information source, the newly collected document having corresponding location information indicating a location from which the newly collected document was found; setting an analysis target word; setting a specific word belonging to a specific category; retrieving a stored document containing the analysis target word and having corresponding location information that matches the location information of the newly collected document, from the storage unit; extracting, from the retrieved document, words or a phrase and counting a frequency of the occurrences of the extracted words or phrase, and obtaining a broader concept word of an extracted word and counting a frequency of the occurrences of the broader concept word, based on the assigned weights; and extracting said words or phrase, and said broader concept word, having occurrence frequency and document frequency equal to or higher than an average and an expected value, as the hot topic.
7. A computer-readable storage medium storing a computer program used to direct a computer to control a process for extracting a trend, a fashion, a noticed word, or a concept as a hot topic from a plurality of documents which exist in a wide-ranging information source on a network, the process comprising:
collecting a plurality of documents, each collected document having corresponding location information indicating a location from which the respective, collected document was found; storing the collected documents in a storage unit with date and time information indicating a collection date and time added to the collected documents; assigning weights to stored documents according to the date and time information; collecting a new document from an information source, the newly collected document having corresponding location information indicating a location from which the newly collected document was found; setting an analysis target word; retrieving from the storage unit a stored document containing the analysis target word and having location information that matches the location information of the newly collected document; extracting, from the retrieved document, words or a phrase and counting a frequency of the occurrences of the extracted words or phrase, and obtaining a broader concept word of an extracted word and counting a frequency of the occurrences of the broader concept word, based on the assigned weights; and extracting said words or phrase, and said broader concept word, having occurrence frequency and document frequency equal to or higher than an average and an expected value, as the hot topic.
8. A hot topic extracting method for extracting a trend, a fashion, a noticed word, or a concept as a hot topic from a plurality of documents which exist in a wide-ranging information source on a network, the method comprising:
collecting a plurality of documents, each collected document having corresponding location information indicating a location from which the respective, collected document was found; storing the collected documents in a storage unit with date and time information indicating a collection date and time added to the collected documents; assigning weights to stored documents according to the date and time information; collecting a new document based on information collection strategy rules describing a method of collecting the new document from an information source, the newly collected document having corresponding location information indicating a location from which the newly collected document was found; retrieving a document from the storage unit having corresponding location information that matches the location information of the newly collected document; extracting, from the retrieved document, words or a phrase and counting a frequency of the occurrences of the extracted words or phrase, and obtaining a broader concept word of an extracted word and counting a frequency of the occurrences of the broader concept word, based on the assigned weights; extracting said words or phrase, and said broader concept word, having occurrence frequency and document frequency equal to or higher than an average and an expected value, as the hot topic; and providing a notification indicating whether the extracted hot topic satisfies a condition of a desired topic.
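Claim 8 adds two elements: collection driven by strategy rules, and a notification about whether each hot topic meets a desired-topic condition. The sketch below assumes the rules are revisit intervals per source and the condition is membership in a watch list; neither form is specified by the claim, and the rule entries and function names are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical strategy rules: how often each source should be revisited.
RULES = [
    {"source": "https://news.example.com/", "every": timedelta(hours=1)},
    {"source": "https://forum.example.com/", "every": timedelta(hours=6)},
]


def due_sources(rules, last_collected, now):
    """Return the sources whose revisit interval has elapsed.

    Sources never collected before are always due.
    """
    return [
        r["source"] for r in rules
        if now - last_collected.get(r["source"], datetime.min) >= r["every"]
    ]


def notifications(hot_topics, desired_words):
    """Map each extracted hot topic to whether it meets the desired-topic condition."""
    return {topic: topic in desired_words for topic in hot_topics}
```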
9. A computer-implemented hot topic extraction apparatus for extracting a trend, a fashion, a noticed word, or a concept as a hot topic from a plurality of documents which exist in a wide-ranging information source on a network, the apparatus comprising:
a collector collecting a plurality of documents, each collected document having corresponding location information indicating a location from which the respective, collected document was found; a storage unit storing the collected documents with date and time information indicating a collection date and time added to the collected documents; an assigning device assigning weights to stored documents according to the date and time information; an information collection unit collecting a new document from an information source, the newly collected document having corresponding location information indicating a location from which the newly collected document was found; and a hot topic extraction unit retrieving from the storage unit a stored document having corresponding location information that matches the location information of the newly collected document, in parallel to a process of said information collection unit collecting the new document, and extracting a hot topic, wherein said extracting the hot topic includes: extracting, from the retrieved document, words or a phrase and counting a frequency of the occurrences of the extracted words or phrase, and obtaining a broader concept word of an extracted word and counting a frequency of the occurrences of the broader concept word, based on the assigned weights; and extracting said words or phrase, and said broader concept word, having occurrence frequency and document frequency equal to or higher than an average and an expected value, as the hot topic.
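The distinguishing feature of claim 9 is that retrieval runs in parallel with collection. One way to picture this is a producer and consumer joined by a queue, as in the sketch below. The thread-and-queue design, the dictionary document shape, and the function name are assumptions made here; the claim does not prescribe a concurrency mechanism, and the hot topic counting itself is omitted.

```python
import queue
import threading


def run_pipeline(incoming, store):
    """Collect new documents on one thread while another retrieves
    stored documents with a matching location.

    Returns a list of (location, number_of_matching_stored_documents).
    """
    q = queue.Queue()
    matches = []

    def collector():
        # Stands in for the information collection unit.
        for doc in incoming:
            q.put(doc)
        q.put(None)  # sentinel: nothing more to collect

    def extractor():
        # Stands in for the retrieval half of the hot topic extraction unit.
        while (doc := q.get()) is not None:
            found = [d for d in store if d["location"] == doc["location"]]
            matches.append((doc["location"], len(found)))

    threads = [threading.Thread(target=collector),
               threading.Thread(target=extractor)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return matches
```

The store is only read here, so no lock is needed; a version that also writes new documents into the store while extracting would need to synchronize access.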
10. A computer-readable storage medium storing a program used to direct a computer to perform a process for extracting a trend, a fashion, a noticed word, or a concept as a hot topic from a plurality of documents which exist in a wide-ranging information source on a network, the process comprising:
collecting a plurality of documents, each collected document having corresponding location information indicating a location from which the respective, collected document was found; storing the collected documents in a storage unit with date and time information indicating a collection date and time added to the collected documents; assigning weights to stored documents according to the date and time information; collecting a new document from an information source, the newly collected document having corresponding location information indicating a location from which the newly collected document was found; retrieving from the storage unit a stored document having corresponding location information that matches the location information of the newly collected document; extracting, from the retrieved document, words or a phrase and counting a frequency of the occurrences of the extracted words or phrase, and obtaining a broader concept word of an extracted word and counting a frequency of the occurrences of the broader concept word, based on the assigned weights; and extracting said words or phrase, and said broader concept word, having occurrence frequency and document frequency equal to or higher than an average and an expected value, as the hot topic.
11. A method for extracting a trend, a fashion, a noticed word, or a concept as a hot topic from a plurality of documents which exist in a wide-ranging information source on a network, the method comprising:
collecting a plurality of documents; storing the collected documents in a storage unit with date and time information indicating a collection date and time added to the collected documents; assigning weights to stored documents according to the date and time information; collecting a new document from an information source; retrieving from the storage unit a stored document that satisfies a predetermined condition with respect to the newly collected document; extracting, from the retrieved document, words or a phrase and counting a frequency of the occurrences of the extracted words or phrase, and obtaining a broader concept word of an extracted word and counting a frequency of the occurrences of the broader concept word, based on the assigned weights; and extracting said words or phrase, and said broader concept word, having occurrence frequency and document frequency equal to or higher than an average and an expected value, as the hot topic.
12. An apparatus for extracting a trend, a fashion, a noticed word, or a concept as a hot topic from a plurality of documents which exist in a wide-ranging information source on a network, the apparatus comprising:
means for collecting a plurality of documents; means for storing the collected documents with date and time information indicating a collection date and time added to the collected documents; means for assigning weights to the stored documents according to the date and time information; means for collecting a new document from an information source; means for retrieving a stored document that satisfies a predetermined condition with respect to the newly collected document; means for extracting, from the retrieved document, words or a phrase and counting a frequency of the occurrences of the extracted words or phrase, and obtaining a broader concept word of an extracted word and counting a frequency of the occurrences of the broader concept word, based on the assigned weights; and means for extracting said words or phrase, and said broader concept word, having occurrence frequency and document frequency equal to or higher than an average and an expected value, as the hot topic.
Specification