System and method for online information analysis
First Claim
1. A computer-implemented method comprising:
accumulating multiple historical collections of blog entries, the blog entries each having been retrieved from a network and each satisfying predefined source and subject matter criteria;
scraping, then tokenizing, then stoplist filtering, then vectorizing, then weighting, then normalizing the historical collections of the blog entries;
receiving a first user input identifying a subject of which a user intends to analyze prevalence over time in the historical collections of blog entries, the subject being associated with a collection of query terms;
determining the query terms associated with the identified subject;
selecting a single historical collection of blog entries corresponding to the identified subject, from among the multiple, normalized historical collections of blog entries;
determining the prevalence of the query terms over time within the single selected historical collection;
generating a line chart illustrating a change in the determined prevalence of the query terms over time, a line of the line chart including points which define hyperlinks to one or more blog entries of the selected single historical collection;
receiving a second user input selecting one of the hyperlinks; and
presenting the blog entry associated with the selected hyperlink.
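The tokenizing, stoplist filtering, vectorizing, weighting, and normalizing steps recited above can be illustrated with a minimal Python sketch. The claim does not name a weighting or normalization scheme, so the smoothed TF-IDF weighting, the unit-length scaling, the `STOPLIST` contents, and the function names `tokenize` and `preprocess` are all assumptions made for illustration. Scraping is taken to have already produced plain text.

```python
import math
import re
from collections import Counter

# Hypothetical stoplist; a real one would be far longer.
STOPLIST = {"a", "an", "and", "in", "is", "of", "the", "to"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def preprocess(entries):
    """Turn already-scraped entry texts into unit-length weighted term vectors."""
    # Tokenize, then drop stoplist words.
    token_lists = [[t for t in tokenize(e) if t not in STOPLIST] for e in entries]
    # Vectorize: one sparse term-count vector per entry.
    counts = [Counter(tokens) for tokens in token_lists]
    # Document frequency of each term across the collection.
    doc_freq = Counter(term for c in counts for term in c)
    n = len(entries)
    vectors = []
    for c in counts:
        # Weight: smoothed TF-IDF (an assumed scheme, not taken from the claim).
        weighted = {
            t: tf * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0)
            for t, tf in c.items()
        }
        # Normalize: scale each vector to unit Euclidean length (also assumed).
        norm = math.sqrt(sum(w * w for w in weighted.values()))
        vectors.append({t: w / norm for t, w in weighted.items()} if norm else {})
    return vectors
```

Calling `preprocess(["The cat sat", "The dog sat in the sun"])` would return one sparse vector per entry, with stoplist words removed and rarer terms weighted more heavily than terms shared by both entries.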
Abstract
The present disclosure includes systems and techniques relating to online information analysis. In general, in one implementation, a system includes a collection engine configured to accumulate document information retrieved from publicly accessible network resources according to predefined subjects, and an analysis engine configured to analyze the accumulated document information to identify change over a time period in general discussion of a topic within a selected subject of the predefined subjects, the analysis engine further configured to normalize the identified change over the time period based on change in a total number of documents found for the selected subject during the time period.
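One way to read the normalization described in the abstract is to divide the number of entries that mention a topic in each time period by the total number of entries found for the subject in that period. The Python sketch below illustrates that reading only. The function name `normalized_prevalence`, the monthly bucketing, and the case-insensitive substring matching are assumptions, not details from the disclosure.

```python
from collections import defaultdict
from datetime import date

def normalized_prevalence(entries, query_terms):
    """Return {(year, month): share of that month's entries mentioning a query term}.

    entries: iterable of (date, text) pairs for one subject's collection.
    """
    totals = defaultdict(int)  # all entries per period
    hits = defaultdict(int)    # entries mentioning any query term per period
    terms = [t.lower() for t in query_terms]
    for day, text in entries:
        period = (day.year, day.month)  # assumed monthly granularity
        totals[period] += 1
        lowered = text.lower()
        if any(t in lowered for t in terms):  # simple substring match (assumed)
            hits[period] += 1
    # Dividing by the period total removes growth in overall posting volume.
    return {p: hits[p] / totals[p] for p in sorted(totals)}
```

With this normalization, a topic mentioned in 1 of 2 entries one month and 2 of 4 entries the next shows a flat share of 0.5, even though the raw count doubled.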
21 Claims
1. A computer-implemented method comprising:
accumulating multiple historical collections of blog entries, the blog entries each having been retrieved from a network and each satisfying predefined source and subject matter criteria;
scraping, then tokenizing, then stoplist filtering, then vectorizing, then weighting, then normalizing the historical collections of the blog entries;
receiving a first user input identifying a subject of which a user intends to analyze prevalence over time in the historical collections of blog entries, the subject being associated with a collection of query terms;
determining the query terms associated with the identified subject;
selecting a single historical collection of blog entries corresponding to the identified subject, from among the multiple, normalized historical collections of blog entries;
determining the prevalence of the query terms over time within the single selected historical collection;
generating a line chart illustrating a change in the determined prevalence of the query terms over time, a line of the line chart including points which define hyperlinks to one or more blog entries of the selected single historical collection;
receiving a second user input selecting one of the hyperlinks; and
presenting the blog entry associated with the selected hyperlink.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-readable medium encoded with a computer program comprising instructions that, when executed, operate to cause a computer to perform operations comprising:
accumulating multiple historical collections of blog entries, the blog entries each having been retrieved from a network and each satisfying predefined source and subject matter criteria;
scraping, then tokenizing, then stoplist filtering, then vectorizing, then weighting, then normalizing the historical collections of the blog entries;
receiving a first user input identifying a subject of which a user intends to analyze prevalence over time in the historical collections of blog entries, the subject being associated with a collection of query terms;
determining the query terms associated with the identified subject;
selecting a single historical collection of blog entries corresponding to the identified subject, from among the multiple, normalized historical collections of blog entries;
determining the prevalence of the query terms over time within the single selected historical collection;
generating a line chart illustrating a change in the determined prevalence of the query terms over time, a line of the line chart including points which define hyperlinks to one or more blog entries of the selected single historical collection;
receiving a second user input selecting one of the hyperlinks; and
presenting the blog entry associated with the selected hyperlink.
Dependent claims: 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21
14. A system comprising:
one or more computers; and
a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
accumulating multiple historical collections of blog entries, the blog entries each having been retrieved from a network and each satisfying predefined source and subject matter criteria;
scraping, then tokenizing, then stoplist filtering, then vectorizing, then weighting, then normalizing the historical collections of the blog entries;
receiving a first user input identifying a subject of which a user intends to analyze prevalence over time in the historical collections of blog entries, the subject being associated with a collection of query terms;
determining the query terms associated with the identified subject;
selecting a single historical collection of blog entries corresponding to the identified subject, from among the multiple, normalized historical collections of blog entries;
determining the prevalence of the query terms over time within the single selected historical collection;
generating a line chart illustrating a change in the determined prevalence of the query terms over time, a line of the line chart including points which define hyperlinks to one or more blog entries of the selected single historical collection;
receiving a second user input selecting one of the hyperlinks; and
presenting the blog entry associated with the selected hyperlink.
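The chart-generation step recited in the independent claims, a line whose points define hyperlinks to blog entries, could be rendered in many ways. The Python sketch below emits a small SVG document in which each data point is wrapped in a link. The function name `line_chart_svg`, the SVG output format, the chart dimensions, and the example URLs are hypothetical choices for illustration and are not taken from the claims.

```python
from xml.sax.saxutils import escape, quoteattr

def line_chart_svg(points, width=400, height=200):
    """Render (label, prevalence, url) tuples as an SVG line chart.

    points must be in time order, with prevalence values in [0, 1].
    Each point on the line is a circle wrapped in a hyperlink to its entry.
    """
    step = width / max(len(points) - 1, 1)
    # Map each point to pixel coordinates (SVG y grows downward).
    coords = [(i * step, height - p * height) for i, (_, p, _) in enumerate(points)]
    path = " ".join(f"{x:.1f},{y:.1f}" for x, y in coords)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<polyline fill="none" stroke="black" points="{path}"/>',
    ]
    for (x, y), (label, _, url) in zip(coords, points):
        # Plain href is SVG 2 syntax; older viewers may expect xlink:href.
        parts.append(
            f'<a href={quoteattr(url)}>'
            f'<circle cx="{x:.1f}" cy="{y:.1f}" r="4"><title>{escape(label)}</title></circle>'
            f'</a>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Selecting a point in a browser would then follow the link to the associated entry, which corresponds loosely to the second user input and the presenting step.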
Specification