Systems and methods for analyzing electronic text
First Claim
1. A computer-implemented method for systematically analyzing an electronic text, comprising:
receiving by a computer the electronic text from a plurality of sources;
determining at least one term of interest to be identified in the electronic text;
identifying by the computer a plurality of locations within the electronic text including the at least one term of interest;
for each location within the plurality of locations, creating by the computer a snippet from a text segment around the at least one term of interest at the location within the electronic text;
creating by the computer multiple taxonomies for the at least one term of interest from the snippets, wherein the taxonomies include at least one category, the at least one category including a sentiment based taxonomy; and
determining by the computer associations between categories of different taxonomies of the multiple taxonomies by determining:
co-occurrences between the multiple taxonomies; and
significance of co-occurrences between the multiple taxonomies, wherein the determining the co-occurrences further comprises:
determining co-occurrences between a category of a single taxonomy and the at least one term of interest to determine significance of the at least one term of interest; and
sorting the at least one term of interest by significance; and
wherein at least one of the taxonomies is a time based taxonomy that is based on the creation date of the electronic text, the time based taxonomy generated by:
crawling sources of electronic text to extract the creation dates;
attaching an extracted creation date to a respective snippet to generate a dated snippet; and
organizing the dated snippets into chronologically contiguous categories, wherein the sentiment based taxonomy is determined by:
creating a list of positive, negative and neutral terms indicative of different sentiments, respectively;
determining the level of sentiment corresponding to the at least one term generated from a respective snippet based on an assigned value;
normalizing the values to generate at least one term having a sentiment score corresponding thereto, the sentiment score including at least one of a positive sentiment score and a negative sentiment score; and
sorting snippets of the electronic text based on a calculated sentiment score differential between the at least one positive sentiment score and the at least one negative sentiment score.
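The sentiment based taxonomy steps recited above can be illustrated with a minimal Python sketch. It assumes a tiny hand-written lexicon, assigned values of +1 (positive), -1 (negative) and 0 (neutral), and normalization by snippet length in tokens; the claim specifies none of these, and the names `score_snippet`, `SentimentScore` and `sort_by_sentiment_differential` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: placeholder lexicon. The claim calls for lists of positive,
# negative and neutral terms but does not publish their contents.
POSITIVE = {"good", "great", "reliable", "love"}
NEGATIVE = {"bad", "poor", "broken", "hate"}
NEUTRAL = {"okay", "average"}

# Assumption: assigned values of +1 / -1 / 0 per lexicon term.
VALUES = {
    **{w: 1 for w in POSITIVE},
    **{w: -1 for w in NEGATIVE},
    **{w: 0 for w in NEUTRAL},
}


@dataclass
class SentimentScore:
    positive: float  # normalized positive score in [0, 1]
    negative: float  # normalized negative score in [0, 1]

    @property
    def differential(self) -> float:
        # The claimed "sentiment score differential": positive minus negative.
        return self.positive - self.negative


def score_snippet(snippet: str) -> SentimentScore:
    """Score one snippet by looking up each token's assigned value."""
    tokens = re.findall(r"[a-z']+", snippet.lower())
    if not tokens:
        return SentimentScore(0.0, 0.0)
    pos = sum(1 for t in tokens if VALUES.get(t) == 1)
    neg = sum(1 for t in tokens if VALUES.get(t) == -1)
    # Assumption: normalize raw counts by snippet length so that long and
    # short snippets are comparable.
    return SentimentScore(pos / len(tokens), neg / len(tokens))


def sort_by_sentiment_differential(snippets: list[str]) -> list[str]:
    """Sort snippets from most positive to most negative differential."""
    return sorted(snippets, key=lambda s: score_snippet(s).differential, reverse=True)
```

For example, `sort_by_sentiment_differential(["screen is bad", "the battery is great and reliable", "price is okay"])` places the battery snippet first and the screen snippet last. Normalizing by snippet length is only one option; dividing by the number of lexicon hits instead would make the positive and negative scores sum to 1 for any snippet containing a sentiment term.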
2 Assignments
0 Petitions
Abstract
Systems and methods for systematically analyzing an electronic text are described. In one embodiment, the method includes receiving the electronic text from a plurality of sources. The method also includes determining at least one term of interest to be identified in the electronic text. The method further includes identifying a plurality of locations within the electronic text that include the at least one term of interest. For each location within the plurality of locations, the method also includes creating a snippet from a text segment around the at least one term of interest at that location. The method further includes creating multiple taxonomies for the at least one term of interest from the snippets, wherein the taxonomies include at least one category. The method also includes determining co-occurrences between the multiple taxonomies to determine associations between categories of different taxonomies.
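The snippet and co-occurrence steps summarized in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: a snippet is taken to be a fixed window of words on each side of the term (the size of the text segment is not specified), tokenization is plain whitespace splitting, and each taxonomy is represented as a mapping from category name to the set of snippet indices assigned to that category. `extract_snippets` and `taxonomy_cooccurrence` are hypothetical names.

```python
import re
from collections import Counter
from itertools import product


def extract_snippets(text: str, term: str, window: int = 5) -> list[str]:
    """Return one snippet per occurrence of `term`.

    Assumption: a snippet is up to `window` words on each side of the term.
    """
    words = text.split()
    target = term.lower()
    snippets = []
    for i, w in enumerate(words):
        # Strip leading/trailing punctuation so "battery," matches "battery".
        if re.sub(r"^\W+|\W+$", "", w.lower()) == target:
            lo, hi = max(0, i - window), i + window + 1
            snippets.append(" ".join(words[lo:hi]))
    return snippets


def taxonomy_cooccurrence(
    tax_a: dict[str, set[int]], tax_b: dict[str, set[int]]
) -> Counter:
    """Count snippets shared by each (category in A, category in B) pair.

    Assumption: each taxonomy maps a category name to snippet indices.
    Pairs with no shared snippets are omitted.
    """
    counts = Counter()
    for (cat_a, ids_a), (cat_b, ids_b) in product(tax_a.items(), tax_b.items()):
        shared = len(ids_a & ids_b)
        if shared:
            counts[(cat_a, cat_b)] = shared
    return counts
```

A pair of categories with a high shared count, for example a product-feature category and a negative-sentiment category, is a candidate association; deciding whether that count is significant is a separate step.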
56 Citations
16 Claims
7. A system for systematically analyzing an electronic text, comprising:
a receiver to receive the electronic text from a plurality of sources;
a processor coupled to the receiver to:
determine at least one term of interest to be identified in the electronic text;
identify a plurality of locations within the electronic text including the at least one term of interest;
create, for each location within the plurality of locations, a snippet from a text segment around the at least one term of interest at the location within the electronic text;
create multiple taxonomies for the at least one term of interest from the snippets, wherein the taxonomies include at least one category, the at least one category including a sentiment based taxonomy; and
determine associations between categories of different taxonomies of the multiple taxonomies by determining:
co-occurrence between the multiple taxonomies;
wherein at least one of the taxonomies is a time based taxonomy that is based on the creation date of the electronic text, the time based taxonomy generated by:
crawling sources of electronic text to extract the creation dates;
attaching an extracted creation date to a respective snippet to generate a dated snippet; and
organizing the dated snippets into chronologically contiguous categories; and
a module in electrical communication with the processor, the module configured to determine co-occurrences for a single taxonomy against a term feature space to determine significance of the at least one term of interest; and
a module to sort the at least one term of interest by significance, wherein the sentiment based taxonomy is determined by:
creating a list of positive, negative and neutral terms indicative of different sentiments, respectively;
determining the level of sentiment corresponding to the at least one term generated from a respective snippet based on an assigned value;
normalizing the values to generate at least one term having a sentiment score corresponding thereto, the sentiment score including at least one of a positive sentiment score and a negative sentiment score; and
sorting snippets of the electronic text based on a calculated sentiment score differential between the at least one positive sentiment score and the at least one negative sentiment score.
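The module of claim 7 that determines co-occurrences for a single taxonomy against a term feature space, together with the module that sorts terms by significance, could be approximated as below. The significance measure is an assumption: a one-sided Pearson chi-squared statistic on a 2x2 table (snippet in the category or not, by term present or not) that scores only terms over-represented in the category. The claim does not name a statistic, and `chi_squared` and `rank_terms_for_category` are hypothetical names.

```python
from collections import Counter


def chi_squared(n11: int, n1_: int, n_1: int, n: int) -> float:
    """One-sided Pearson chi-squared for a 2x2 contingency table.

    n11: snippets in the category that contain the term
    n1_: snippets in the category
    n_1: snippets (in any category) that contain the term
    n:   total snippets
    """
    n10 = n1_ - n11          # in category, term absent
    n01 = n_1 - n11          # term present, outside category
    n00 = n - n1_ - n_1 + n11  # neither
    # Assumption: keep only positive associations (term over-represented).
    if n11 * n00 <= n10 * n01:
        return 0.0
    denom = n1_ * (n - n1_) * n_1 * (n - n_1)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def rank_terms_for_category(
    snippet_terms: list[set[str]], category_ids: set[int]
) -> list[tuple[str, float]]:
    """Score every term in the feature space against one category, then sort."""
    n = len(snippet_terms)
    term_total = Counter(t for terms in snippet_terms for t in terms)
    term_in_cat = Counter(t for i in category_ids for t in snippet_terms[i])
    scored = [
        (t, chi_squared(term_in_cat[t], len(category_ids), term_total[t], n))
        for t in term_total
    ]
    # Highest significance first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda ts: (-ts[1], ts[0]))
```

With a two-sided statistic, a term that almost never appears in the category would rank as highly as one that almost always does; the early return keeps the ranking focused on terms characteristic of the category.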
13. A computer program product comprising a non-transitory computer useable storage medium to store a computer readable program, wherein the non-transitory computer readable program, when executed on a computer, causes the computer to perform operations comprising:
receiving the electronic text from a plurality of sources;
determining at least one term of interest to be identified in the electronic text;
identifying a plurality of locations within the electronic text including the at least one term of interest;
for each location within the plurality of locations, creating a snippet from a text segment around the at least one term of interest at the location within the electronic text;
creating multiple taxonomies for the at least one term of interest from the snippets, wherein the taxonomies include at least one category, the at least one category including a sentiment based taxonomy; and
determining associations between categories of different taxonomies of the multiple taxonomies by determining:
co-occurrences between the multiple taxonomies; and
significance of co-occurrences between the multiple taxonomies;
determining co-occurrences between a category of a single taxonomy and the at least one term of interest to determine significance of the at least one term of interest;
sorting the at least one term of interest by a respective significance; and
outputting the sorted at least one term of interest, wherein at least one of the taxonomies is a time based taxonomy that is based on the creation date of the electronic text, the time based taxonomy generated by:
crawling sources of electronic text to extract the creation dates;
attaching an extracted creation date to a respective snippet to generate a dated snippet; and
organizing the dated snippets into chronologically contiguous categories, wherein the sentiment based taxonomy is determined by:
creating a list of positive, negative and neutral terms indicative of different sentiments, respectively;
determining the level of sentiment corresponding to the at least one term generated from a respective snippet based on an assigned value;
normalizing the values to generate at least one term having a sentiment score corresponding thereto, the sentiment score including at least one of a positive sentiment score and a negative sentiment score; and
sorting snippets of the electronic text based on a calculated sentiment score differential between the at least one positive sentiment score and the at least one negative sentiment score.
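The time based taxonomy recited in claim 13 (and in claims 1 and 7) can be sketched once creation dates have been extracted; the crawling step itself is not shown. Calendar months are an assumed granularity chosen for illustration, and empty months still receive a category so the categories stay chronologically contiguous. `DatedSnippet` and `time_taxonomy` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatedSnippet:
    text: str
    created: date  # creation date extracted from the source (crawling not shown)


def _month_key(d: date) -> tuple[int, int]:
    return (d.year, d.month)


def _next_month(ym: tuple[int, int]) -> tuple[int, int]:
    y, m = ym
    return (y + 1, 1) if m == 12 else (y, m + 1)


def time_taxonomy(snippets: list[DatedSnippet]) -> dict[str, list[DatedSnippet]]:
    """Group dated snippets into contiguous monthly categories, oldest first.

    Assumption: one category per calendar month, labelled "YYYY-MM".
    Months with no snippets get an empty category so there are no gaps.
    """
    if not snippets:
        return {}
    ordered = sorted(snippets, key=lambda s: s.created)
    start = _month_key(ordered[0].created)
    end = _month_key(ordered[-1].created)
    buckets: dict[str, list[DatedSnippet]] = {}
    ym = start
    while ym <= end:  # tuples compare year first, then month
        buckets[f"{ym[0]:04d}-{ym[1]:02d}"] = []
        ym = _next_month(ym)
    for s in ordered:
        y, m = _month_key(s.created)
        buckets[f"{y:04d}-{m:02d}"].append(s)
    return buckets
```

Because Python dictionaries preserve insertion order, iterating over the result visits the categories chronologically; a coarser or finer granularity (quarters, weeks) would only change the key and step functions.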
Specification