Methods and systems for analyzing reading logs and documents thereof
First Claim
1. A method for analyzing reading logs and documents corresponding thereto, comprising:
acquiring reading logs related to webpages and documents corresponding thereto, wherein the reading logs at least include reading-related information about the documents within a predetermined period of time and the reading-related information at least includes an interesting reading time and a number of interesting readings;
selecting a plurality of interesting document sets from the documents in each time segment of the predetermined period of time according to the interesting reading times and the number of interesting readings of the documents in the reading logs, each of the interesting document sets corresponding to one of the time segments of the predetermined period of time;
performing a document content pre-processing on the interesting document sets to determine keyword sets corresponding to the interesting document sets;
performing a cluster calculation on the keyword sets to obtain topics and calculating cohesion of each topic;
deleting topics with insufficient cohesion among the topics obtained to obtain a plurality of high-relevance topics and classifying each high-relevance topic into one of a plurality of predetermined topic classes by comparing the respective keyword sets of the high-relevance topics with a plurality of keyword sets of the predetermined topic classes;
obtaining reading statistics for documents of each predetermined topic class and calculating a plurality of degrees of interest for documents of each predetermined topic class during each time segment; and
determining a reading trend on each predetermined topic class according to changes in the degrees of interest,
wherein the document content pre-processing step further comprises performing the following steps on each document of the interesting document sets:
obtaining a plurality of keywords;
paragraphing the document and calculating a frequency at which the keywords appear in each paragraph to calculate a plurality of importance-weightings corresponding to all of the paragraphs and determining at least one key paragraph according to the importance-weightings; and
generating the set of keywords for the document based on the keywords within the at least one key paragraph.
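The per-document pre-processing sub-steps recited at the end of the claim (paragraphing, keyword frequency per paragraph, importance weighting, key-paragraph selection) can be sketched as follows. This is a minimal illustration, not the patented implementation: the claim does not give a weighting formula, so the weighting here is assumed to be the raw count of keyword occurrences in a paragraph. The names `split_paragraphs`, `paragraph_weight`, `document_keyword_set`, and the `top_n` parameter are hypothetical, and keywords are assumed to be lowercase single words.

```python
import re
from collections import Counter


def split_paragraphs(text):
    """Split a document into paragraphs on blank lines (assumed convention)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def paragraph_weight(paragraph, keywords):
    """Assumed importance weighting: total keyword occurrences in the paragraph."""
    counts = Counter(re.findall(r"[a-z0-9]+", paragraph.lower()))
    return sum(counts[k] for k in keywords)


def document_keyword_set(text, keywords, top_n=1):
    """Return the keywords found in the top_n highest-weighted (key) paragraphs."""
    paragraphs = split_paragraphs(text)
    weights = [paragraph_weight(p, keywords) for p in paragraphs]
    ranked = sorted(range(len(paragraphs)), key=lambda i: weights[i], reverse=True)
    # Paragraphs containing no keyword at all are never treated as key paragraphs.
    key_indices = [i for i in ranked[:top_n] if weights[i] > 0]
    result = set()
    for i in key_indices:
        tokens = set(re.findall(r"[a-z0-9]+", paragraphs[i].lower()))
        result |= tokens & set(keywords)
    return result


text = (
    "Solar panels and solar cells cut energy bills.\n\n"
    "The weather was nice. We had lunch.\n\n"
    "Wind energy is growing."
)
keywords = {"solar", "energy", "wind", "lunch"}
print(document_keyword_set(text, keywords, top_n=1))
```

With `top_n=1` only the first paragraph qualifies as the key paragraph (three keyword hits), so the document's keyword set is `solar` and `energy`; the incidental keyword `lunch` in the low-weight paragraph is dropped.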
Abstract
Methods for analyzing reading logs and documents corresponding thereto are provided, including: acquiring reading logs and documents corresponding thereto, wherein the reading logs at least include reading-related information about the documents within a predetermined period of time; selecting interesting document sets from the documents according to the reading logs in each time segment; performing document content pre-processing on the interesting document sets to determine the corresponding keyword sets for each time segment; performing a cluster calculation on the keyword sets to obtain topics and calculating the cohesion of each topic; deleting topics with insufficient cohesion to obtain multiple high-relevance topics and classifying each high-relevance topic into one of a plurality of predetermined topic classes according to its keyword set; and obtaining reading statistics for each topic class and calculating multiple degrees of interest for each topic class during each time segment.
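The cohesion filter and the topic classification step summarized in the abstract can be illustrated with a short sketch. The text does not define how cohesion is computed or how keyword sets are compared, so two assumptions are made here and should be read as such: cohesion is taken to be the mean pairwise Jaccard similarity of the keyword sets in a cluster, and a topic is assigned to the predetermined class whose keyword set has the highest Jaccard similarity with the topic's combined keywords. The clustering itself is assumed to have been done already; all function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cohesion(cluster):
    """Assumed cohesion measure: mean pairwise Jaccard similarity of keyword sets."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0  # a single-document cluster has no pairs to compare
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def high_relevance_topics(clusters, min_cohesion):
    """Drop clusters (topics) whose cohesion falls below the threshold."""
    return [c for c in clusters if cohesion(c) >= min_cohesion]


def classify_topic(cluster, class_keywords):
    """Pick the predetermined class whose keyword set best matches the topic."""
    topic_keywords = set().union(*cluster)
    return max(class_keywords, key=lambda name: jaccard(topic_keywords, class_keywords[name]))


clusters = [
    [{"solar", "energy"}, {"solar", "panel", "energy"}],  # tightly related documents
    [{"lunch"}, {"stock", "market"}],                      # unrelated documents
]
class_keywords = {
    "energy": {"solar", "wind", "energy"},
    "finance": {"stock", "market", "bank"},
}
kept = high_relevance_topics(clusters, min_cohesion=0.5)
print([classify_topic(c, class_keywords) for c in kept])
```

Only the first cluster survives the cohesion threshold, and it is classified as `energy`. Any other similarity measure could be substituted for Jaccard without changing the structure of the two steps.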
18 Claims
1. A method for analyzing reading logs and documents corresponding thereto, comprising:
acquiring reading logs related to webpages and documents corresponding thereto, wherein the reading logs at least include reading-related information about the documents within a predetermined period of time and the reading-related information at least includes an interesting reading time and a number of interesting readings;
selecting a plurality of interesting document sets from the documents in each time segment of the predetermined period of time according to the interesting reading times and the number of interesting readings of the documents in the reading logs, each of the interesting document sets corresponding to one of the time segments of the predetermined period of time;
performing a document content pre-processing on the interesting document sets to determine keyword sets corresponding to the interesting document sets;
performing a cluster calculation on the keyword sets to obtain topics and calculating cohesion of each topic;
deleting topics with insufficient cohesion among the topics obtained to obtain a plurality of high-relevance topics and classifying each high-relevance topic into one of a plurality of predetermined topic classes by comparing the respective keyword sets of the high-relevance topics with a plurality of keyword sets of the predetermined topic classes;
obtaining reading statistics for documents of each predetermined topic class and calculating a plurality of degrees of interest for documents of each predetermined topic class during each time segment; and
determining a reading trend on each predetermined topic class according to changes in the degrees of interest,
wherein the document content pre-processing step further comprises performing the following steps on each document of the interesting document sets:
obtaining a plurality of keywords;
paragraphing the document and calculating a frequency at which the keywords appear in each paragraph to calculate a plurality of importance-weightings corresponding to all of the paragraphs and determining at least one key paragraph according to the importance-weightings; and
generating the set of keywords for the document based on the keywords within the at least one key paragraph.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 17
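The selection of interesting document sets per time segment can be sketched as below. The claim names an "interesting reading time" and a "number of interesting readings" but does not define either, so this sketch assumes that a reading is interesting when its duration meets a minimum, and that a document is interesting in a segment when it accumulates enough such readings. The `LogEntry` record, its fields, and the thresholds `min_seconds` and `min_readings` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    doc_id: str
    segment: int    # index of the time segment the reading falls in
    seconds: float  # time spent on the document during this reading


def interesting_document_sets(entries, min_seconds=30.0, min_readings=2):
    """Return {segment: set of doc_ids} for documents with enough long readings."""
    counts = defaultdict(lambda: defaultdict(int))
    for entry in entries:
        # Assumed rule: only readings of at least min_seconds count as interesting.
        if entry.seconds >= min_seconds:
            counts[entry.segment][entry.doc_id] += 1
    return {
        segment: {doc for doc, n in docs.items() if n >= min_readings}
        for segment, docs in counts.items()
    }


entries = [
    LogEntry("a", 0, 40), LogEntry("a", 0, 60),
    LogEntry("b", 0, 10), LogEntry("b", 0, 45),
    LogEntry("a", 1, 5),
    LogEntry("c", 1, 90), LogEntry("c", 1, 31),
]
print(interesting_document_sets(entries))
```

Document `a` is selected for segment 0 and document `c` for segment 1; document `b` has only one sufficiently long reading and is filtered out.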
9. A system, implemented by a processor, for analyzing reading logs and documents corresponding thereto, comprising:
a reading log extractor, acquiring reading logs related to webpages and documents corresponding thereto, wherein the reading logs at least include reading-related information about the documents within a predetermined period of time and the reading-related information at least includes an interesting reading time and a number of interesting readings;
an interesting document filter coupled to the reading log extractor, selecting a plurality of interesting document sets from the documents in each time segment of the predetermined period of time according to the interesting reading times and the number of interesting readings of the documents in the reading logs, each of the interesting document sets corresponding to one of the time segments of the predetermined period of time;
a document pre-processor coupled to the interesting document filter, performing a document content pre-processing on the interesting document sets to determine keyword sets corresponding to the interesting document sets;
a topic cluster generator coupled to the document pre-processor, performing a cluster calculation on the keyword sets to obtain topics, calculating cohesion of each topic and deleting topics with insufficient cohesion among the topics obtained to obtain a plurality of high-relevance topics;
a topic classifier and combiner coupled to the topic cluster generator, classifying each high-relevance topic into one of a plurality of predetermined topic classes by comparing the respective keyword sets of the high-relevance topics with a plurality of keyword sets of the predetermined topic classes;
a degree of interest normalizer coupled to the topic classifier and combiner, obtaining reading statistics for documents of each predetermined topic class and calculating a plurality of degrees of interest for documents of each predetermined topic class during each time segment; and
a reading trend analyzer coupled to the degree of interest normalizer, determining a reading trend on each predetermined topic class according to changes in the degrees of interest,
wherein for each document of the interesting document sets, the document pre-processor further obtains a plurality of keywords, paragraphs the document and calculates a frequency at which the keywords appear in each paragraph to calculate a plurality of importance-weightings corresponding to all of the paragraphs and determines at least one key paragraph according to the importance-weightings, and generates the set of keywords for the document based on the keywords within the at least one key paragraph.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 18
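The last two components of the system, the degree of interest normalizer and the reading trend analyzer, can be illustrated with a small sketch. The claim does not state how a degree of interest is normalized or how a trend is derived from its changes. The sketch assumes that the degree of interest is a class's share of all readings in a time segment, and that the trend is read from the change between the first and last segments with a tolerance band. The function names and the `tolerance` parameter are hypothetical.

```python
def degrees_of_interest(reads_by_segment):
    """Normalize per-class read counts to a share of each segment's total reads.

    reads_by_segment: list of {class_name: read_count}, one dict per time segment.
    """
    result = []
    for segment in reads_by_segment:
        total = sum(segment.values())
        result.append(
            {name: (count / total if total else 0.0) for name, count in segment.items()}
        )
    return result


def reading_trend(degrees, topic_class, tolerance=0.05):
    """Label a class 'rising', 'falling', or 'stable' from first-to-last change."""
    series = [segment.get(topic_class, 0.0) for segment in degrees]
    if not series:
        return "stable"
    change = series[-1] - series[0]
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "falling"
    return "stable"


reads = [
    {"energy": 20, "finance": 80},
    {"energy": 50, "finance": 50},
    {"energy": 70, "finance": 30},
]
degrees = degrees_of_interest(reads)
print(reading_trend(degrees, "energy"), reading_trend(degrees, "finance"))
```

Here the share of `energy` readings grows from 0.2 to 0.7 across the three segments, so it is labeled rising, while `finance` is labeled falling. Normalizing by segment totals keeps a class from appearing to trend upward merely because overall traffic increased.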
Specification