Method and system for extracting web data
Abstract
An apparatus for providing an analysis of attitudes expressed in web content, comprising: a collector for collecting attitude data in relation to a predetermined subject from one or more pre-selected web sites, the attitude data containing attitudes in relation to the predetermined subject; a processor, associated with the collector, for processing the attitude data so as to generate an attitude analysis; and an outputter, associated with the processor, for outputting the attitude analysis, thereby to provide an indication of attitudes being expressed in the web content in relation to the predetermined subject.
18 Claims
1. A method for analyzing attitudes expressed in web content, said attitudes being in relation to a subject, the method comprising:
collecting a collection of electronic communications from a web site, the collection of electronic communications originating from a person;
determining a first number of times that a word occurs in first electronic communications identifying the subject, the first electronic communications being included in the collection of electronic communications;
determining a second number of times that the word occurs in second electronic communications not identifying the subject, the second electronic communications being included in the collection of electronic communications;
determining a ratio of the first number and the second number;
comparing the ratio to a threshold;
identifying a phrase including the word as relevant when the ratio exceeds the threshold;
determining a subset of the electronic communications including the phrase;
processing the subset so as to generate attitude information indicative of a plurality of attitudes about the subject; and
outputting said attitude information, to provide an analysis of said attitudes in relation to said subject.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A tangible computer readable medium storing instructions to analyze attitudes expressed in web content, said attitudes being in relation to a subject, wherein the instructions, when executed, cause a machine to:
collect a collection of electronic communications from a web site, the collection of electronic communications originating from a person;
determine a first number of times that a word occurs in first electronic communications identifying the subject, the first electronic communications being included in the collection of electronic communications;
determine a second number of times that the word occurs in second electronic communications not identifying the subject, the second electronic communications being included in the collection of electronic communications;
determine a ratio of the first number and the second number;
compare the ratio to a threshold;
identify a phrase including the word as relevant when the ratio exceeds the threshold;
determine a subset of the electronic communications including the phrase;
process the subset so as to generate attitude information indicative of a plurality of attitudes about the subject; and
output said attitude information, to provide an analysis of said attitudes in relation to said subject.
Dependent claims: 17, 18
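The counting, ratio, threshold, and subset steps recited in claims 1 and 16 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several choices here are assumptions that the claims do not specify: a communication "identifies the subject" when the subject string appears in it, words are split with a simple regular expression, the denominator is floored at 1 to avoid division by zero, candidate phrases are supplied by the caller, and the attitude step uses a tiny hypothetical cue-word lexicon (`POSITIVE`, `NEGATIVE`). All function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical cue words; the claims do not define how attitudes are detected.
POSITIVE = {"love", "great"}
NEGATIVE = {"hate", "poor"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed scheme)."""
    return re.findall(r"[a-z']+", text.lower())


def word_ratio(word, messages, subject):
    """Ratio of a word's count in subject messages to its count in the rest."""
    first = second = 0
    for msg in messages:
        n = tokenize(msg).count(word)
        # Assumption: a message identifies the subject if it contains its name.
        if subject.lower() in msg.lower():
            first += n
        else:
            second += n
    # Assumption: floor the denominator at 1 so an unseen word does not divide by zero.
    return first / max(second, 1)


def relevant_phrases(candidates, messages, subject, threshold):
    """Keep phrases that include at least one word whose ratio exceeds the threshold."""
    return [
        phrase
        for phrase in candidates
        if any(word_ratio(w, messages, subject) > threshold for w in tokenize(phrase))
    ]


def subset_with_phrases(messages, phrases):
    """Select the messages that contain any of the relevant phrases."""
    return [m for m in messages if any(p.lower() in m.lower() for p in phrases)]


def attitude_counts(messages):
    """Tally positive and negative cue words as a stand-in for attitude information."""
    counts = Counter()
    for msg in messages:
        for token in tokenize(msg):
            if token in POSITIVE:
                counts["positive"] += 1
            elif token in NEGATIVE:
                counts["negative"] += 1
    return dict(counts)


# Made-up example data; "Acme" is a placeholder subject.
messages = [
    "The Acme phone has great battery life",
    "Battery life on my Acme is poor",
    "I love hiking on weekends",
    "My old laptop has poor battery",
]
phrases = relevant_phrases(["battery life", "on weekends"], messages, "Acme", 1.5)
subset = subset_with_phrases(messages, phrases)
print(phrases, attitude_counts(subset))
```

In this example, "battery" occurs twice in communications that mention the subject and once elsewhere, giving a ratio of 2.0, so the phrase "battery life" passes a threshold of 1.5 while "on weekends" does not. Only the two communications containing that phrase are passed to the attitude step.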
Specification