System and method for detecting personal experience event reports from user generated internet content
First Claim
1. A method, implementable on a computing device, for detecting personal experience event reports from user generated content on the Internet, the method comprising:
filtering a collection of Internet posts to include only said Internet posts containing personal experience terms;
further filtering said filtered Internet posts by removing said Internet posts with non-personal experience terms;
analyzing said Internet posts to define segments in said Internet posts, wherein said segments at least contain terms consistent with user generation of a personal experience event report associated with a pre-defined search subject;
scoring each of said segments, wherein said score indicates a likelihood that said Internet post associated with said segment represents a user generated said personal experience report associated with said pre-defined search subject; and
storing at least indications of said Internet posts with associated said scores above a pre-defined threshold in a searchable personal experience database;
and wherein said analyzing also comprises;
filtering said Internet posts to remove said Internet posts that do not at least contain said terms from each of a minimum number of term categories associated with said pre-defined subject;
detecting a pair of anchors from two anchor categories, wherein said anchor categories are also term categories and represent two essential components of said user generated personal experience reports;
defining a basic said segment as a shortest section of text between said pair of anchors;
when said shortest section of text does not include at least one said term from each of said minimum number of term categories, expanding said basic segment to extend beyond said shortest section of text to include at least one said term from each of said minimum number of term categories;
calculating a density value for said terms in said basic segment;
expanding said basic segment to include a nearest said term not included in said basic segment;
recalculating said density value for said expanded basic segment;
iteratively repeating said expanding and recalculating until said recalculated density value is less than a previously calculated said density value; and
defining said expanded basic segment as a final segment.
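The segment-definition steps recited above can be illustrated with a minimal Python sketch. It is not the patentee's implementation, and several details are assumptions because the claim leaves them open: posts are pre-tokenized word lists, `term_category` is a hypothetical term-to-category dictionary, density is taken as the number of matched terms divided by segment length in tokens, and the expansion that first lowers the density is discarded rather than kept.

```python
from itertools import product


def find_final_segment(tokens, term_category, anchor_cats, min_categories):
    """Return (start, end) token indices of the final segment, or None.

    Assumptions (not specified in the claim): density = matched terms per
    token, and the last expansion that lowers density is rolled back.
    """
    # Positions and categories of every known term in the post.
    hits = [(i, term_category[t]) for i, t in enumerate(tokens) if t in term_category]
    a_pos = [i for i, c in hits if c == anchor_cats[0]]
    b_pos = [i for i, c in hits if c == anchor_cats[1]]
    if not a_pos or not b_pos:
        return None  # no anchor pair, so no segment

    # Basic segment: shortest span between one anchor of each category.
    a, b = min(product(a_pos, b_pos), key=lambda p: abs(p[0] - p[1]))
    start, end = min(a, b), max(a, b)

    def categories(s, e):
        return {c for i, c in hits if s <= i <= e}

    def density(s, e):
        return sum(1 for i, _ in hits if s <= i <= e) / (e - s + 1)

    def grow(s, e):
        # Extend the span to the nearest term lying outside it.
        outside = [i for i, _ in hits if i < s or i > e]
        if not outside:
            return None
        n = min(outside, key=lambda i: s - i if i < s else i - e)
        return min(s, n), max(e, n)

    # Expand until the minimum number of term categories is covered.
    while len(categories(start, end)) < min_categories:
        grown = grow(start, end)
        if grown is None:
            return None
        start, end = grown

    # Keep expanding while density does not drop.
    best = density(start, end)
    while True:
        grown = grow(start, end)
        if grown is None:
            break
        d = density(*grown)
        if d < best:
            break
        (start, end), best = grown, d
    return start, end
```

With the hypothetical dictionary `{"took": "action", "aspirin": "drug", "headache": "symptom", "doctor": "person"}` and anchors `("drug", "symptom")`, the basic segment runs from "aspirin" to "headache"; adding the adjacent "took" raises the density, while reaching out to a distant "doctor" lowers it, so expansion stops there.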
Abstract
A method for retrieving Internet posts, implementable on a computing device, includes analyzing Internet posts to define segments in the Internet posts, where the segments at least contain terms consistent with user generation of a personal experience event report associated with a pre-defined search subject, and scoring each of the segments, where the score indicates a likelihood that the Internet post associated with the segment represents a user generated personal experience report associated with the pre-defined search subject. A method for detecting personal experience event reports from user generated content on the Internet includes filtering a collection of Internet posts to include only Internet posts containing personal experience terms, and further filtering the filtered Internet posts by removing the Internet posts with non-personal experience terms.
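The two-stage filtering described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the term lists `PERSONAL` and `NON_PERSONAL` are hypothetical placeholders, and matching is done on lowercase whole words, which the source does not specify.

```python
import re

# Hypothetical term lists; the source does not enumerate the actual terms.
PERSONAL = {"i", "my", "me"}
NON_PERSONAL = {"study", "researchers", "press"}


def words(post):
    """Lowercase word set of a post (simple regex tokenization, an assumption)."""
    return set(re.findall(r"[a-z']+", post.lower()))


def filter_posts(posts, personal=PERSONAL, non_personal=NON_PERSONAL):
    # Stage 1: keep only posts containing at least one personal experience term.
    kept = [p for p in posts if words(p) & personal]
    # Stage 2: drop posts that also contain a non-personal experience term.
    return [p for p in kept if not (words(p) & non_personal)]
```

A post such as "I took the pill and my head hurt" passes both stages, whereas "In my view the study was flawed" passes the first stage and is removed by the second.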
8 Claims
-
1. A method, implementable on a computing device, for detecting personal experience event reports from user generated content on the Internet, the method comprising:
-
filtering a collection of Internet posts to include only said Internet posts containing personal experience terms;
further filtering said filtered Internet posts by removing said Internet posts with non-personal experience terms;
analyzing said Internet posts to define segments in said Internet posts, wherein said segments at least contain terms consistent with user generation of a personal experience event report associated with a pre-defined search subject;
scoring each of said segments, wherein said score indicates a likelihood that said Internet post associated with said segment represents a user generated said personal experience report associated with said pre-defined search subject; and
storing at least indications of said Internet posts with associated said scores above a pre-defined threshold in a searchable personal experience database;
and wherein said analyzing also comprises;
filtering said Internet posts to remove said Internet posts that do not at least contain said terms from each of a minimum number of term categories associated with said pre-defined subject;
detecting a pair of anchors from two anchor categories, wherein said anchor categories are also term categories and represent two essential components of said user generated personal experience reports;
defining a basic said segment as a shortest section of text between said pair of anchors;
when said shortest section of text does not include at least one said term from each of said minimum number of term categories, expanding said basic segment to extend beyond said shortest section of text to include at least one said term from each of said minimum number of term categories;
calculating a density value for said terms in said basic segment;
expanding said basic segment to include a nearest said term not included in said basic segment;
recalculating said density value for said expanded basic segment;
iteratively repeating said expanding and recalculating until said recalculated density value is less than a previously calculated said density value; and
defining said expanded basic segment as a final segment.
-
-
2. A method, implementable on a computing device, for detecting personal experience event reports from user generated content on the Internet, the method comprising:
-
filtering a collection of Internet posts to include only said Internet posts containing personal experience terms; and
further filtering said filtered Internet posts by removing said Internet posts with non-personal experience terms;
analyzing said Internet posts to define segments in said Internet posts, wherein said segments at least contain terms consistent with user generation of a personal experience event report associated with a pre-defined search subject;
scoring each of said segments, wherein said score indicates a likelihood that said Internet post associated with said segment represents a user generated said personal experience report associated with said pre-defined search subject; and
storing at least indications of said Internet posts with associated said scores above a pre-defined threshold in a searchable personal experience database;
and wherein said scoring comprises;
defining a set of indicating factors, wherein each said indicating factor is associated with a possible feature in said segments, wherein said possible features affects said likelihood that said Internet post associated with said segment represents a user generated said personal experience event report associated with said pre-defined search subject; and
weighting said indicating factors in accordance with said likelihood, wherein each of said indicating factors is at least one of negative and positive;
and wherein said defining and weighting is at least in accordance with linear regression of a training set of said Internet posts known to represent said user generated said personal experience event reports;
and wherein said analyzing further comprises;
filtering said Internet posts to remove said Internet posts that do not at least contain terms from each of a minimum number of term categories that are associated with a user generated personal experience report associated with a pre-defined subject;
detecting a pair of anchors from two anchor categories, wherein said anchor categories are among said term categories and represent two essential components of said user generated personal experience event reports;
defining a basic said segment as a shortest section of text between said pair of anchors;
when said shortest section of text does not include at least one said term from each of said minimum number of term categories, expanding said basic segment to extend beyond said shortest section of text to include at least one said term from each of said minimum number of term categories;
calculating a density value for said terms in said basic segment;
expanding said basic segment to include a nearest said term not included in said basic segment;
recalculating said density value for said expanded basic segment;
iteratively repeating said expanding and recalculating until said recalculated density value is less than a previously calculated said density value; and
defining said expanded basic segment as a final segment.
-
-
3. A method, implementable on a computing device, for detecting personal experience event reports from user generated content on the Internet, the method comprising:
-
filtering a collection of Internet posts to include only said Internet posts containing personal experience terms;
further filtering said filtered Internet posts by removing said Internet posts with non-personal experience terms;
analyzing said Internet posts to define segments in said Internet posts, wherein said segments at least contain terms consistent with user generation of a personal experience event report associated with a pre-defined search subject;
scoring each of said segments, wherein said score indicates a likelihood that said Internet post associated with said segment represents a user generated said personal experience report associated with said pre-defined search subject; and
storing at least indications of said Internet posts with associated said scores above a pre-defined threshold in a searchable personal experience database;
and wherein said scoring comprises;
defining a set of indicating factors, wherein each said indicating factor is associated with a possible feature in said segments, wherein said possible features affects said likelihood that said Internet post associated with said segment represents a user generated said personal experience event report associated with said pre-defined search subject; and
weighting said indicating factors in accordance with said likelihood, wherein each of said indicating factors is at least one of negative and positive;
and wherein said defining and weighting is at least in accordance with linear regression of a training set of said Internet posts known not to represent said user generated said personal experience event reports;
and wherein said analyzing further comprises;
filtering said Internet posts to remove said Internet posts that do not at least contain terms from each of a minimum number of term categories that are associated with a user generated personal experience report associated with a pre-defined subject;
detecting a pair of anchors from two anchor categories, wherein said anchor categories are among said term categories and represent two essential components of said user generated personal experience event reports;
defining a basic said segment as a shortest section of text between said pair of anchors;
when said shortest section of text does not include at least one said term from each of said minimum number of term categories, expanding said basic segment to extend beyond said shortest section of text to include at least one said term from each of said minimum number of term categories;
calculating a density value for said terms in said basic segment;
expanding said basic segment to include a nearest said term not included in said basic segment;
recalculating said density value for said expanded basic segment;
iteratively repeating said expanding and recalculating until said recalculated density value is less than a previously calculated said density value; and
defining said expanded basic segment as a final segment.
-
-
4. An Internet post retrieval system, implementable on a computing device, comprising:
-
a segment analyzer module to define segments in said Internet posts, wherein said segments at least contain terms consistent with user generation of a personal experience event report associated with a pre-defined search subject;
a scoring engine to calculate a score for each said segment, wherein said score indicates a likelihood that said Internet post associated with said segment represents a user generated said personal experience report associated with said pre-defined search subject; and
a website selection utility to generate a collection list of websites, wherein said websites are determined by said website selection utility to contain user generated personal experience event reports;
and wherein said utility comprises;
a pattern recognizer to at least detect textual patterns associated with said user generated personal experience event reports in a training set of “good” said Internet posts;
a training set scoring engine to calculate weighted indicators based on at least said detected textual patterns; and
a candidate scoring engine to at least apply said weighted indicators to Internet posts from candidate websites to determine, for each of said candidate websites, if they contain said user generated personal experience event reports;
and wherein said segment analyzer;
filters said Internet posts to remove said Internet posts that do not at least contain terms from each of a minimum number of term categories that are associated with a user generated personal experience report associated with a pre-defined subject;
detects a pair of anchors from two anchor categories, wherein said anchor categories are among said term categories and represent two essential components of said user generated personal experience event reports;
defines a basic said segment as a shortest section of text between said pair of anchors;
when said shortest section of text does not include at least one said term from each of said minimum number of term categories, expanding said basic segment to extend beyond said shortest section of text to include at least one said term from each of said minimum number of term categories;
said analyzer further;
calculates a density value for said terms in said basic segment;
expands said basic segment to include a nearest said term not included in said basic segment;
recalculates said density value for said expanded basic segment;
iteratively repeats said expanding and recalculating until said recalculated density value is less than a previously calculated said density value; and
defines said expanded basic segment as a final segment.
-
-
5. A method for isolating segments from Internet posts, implementable on a computing device, comprising:
-
filtering said Internet posts to remove said Internet posts that do not at least contain terms from each of a minimum number of term categories that are associated with a user generated personal experience report associated with a pre-defined subject;
detecting a pair of anchors from two anchor categories, wherein said anchor categories are among said term categories and represent two essential components of said user generated personal experience event reports;
defining a basic said segment as a shortest section of text between said pair of anchors;
when said shortest section of text does not include at least one said term from each of said minimum number of term categories, expanding said basic segment to extend beyond said shortest section of text to include at least one said term from each of said minimum number of term categories;
calculating a density value for said terms in said basic segment;
expanding said basic segment to include a nearest said term not included in said basic segment;
recalculating said density value for said expanded basic segment;
iteratively repeating said expanding and recalculating until said recalculated density value is less than a previously calculated said density value; and
defining said expanded basic segment as a final segment.
-
-
7. A method for scoring segments of Internet posts, implementable on a computing device, comprises:
-
defining a set of indicating factors, wherein each said indicating factor is associated with a possible feature in said segments, wherein said possible features affect a likelihood that said Internet post associated with said segment represents a user generated said personal experience event report associated with a pre-defined search subject;
weighting said indicating factors in accordance with said likelihood, wherein each of said indicating factors is at least one of negative and positive;
and wherein said defining and weighting is at least in accordance with linear regression of a training set of said Internet posts known to represent said user generated said personal experience event reports;
and wherein the method further comprises;
filtering said Internet posts to remove said Internet posts that do not at least contain terms from each of a minimum number of term categories that are associated with a user generated personal experience report associated with a pre-defined subject;
detecting a pair of anchors from two anchor categories, wherein said anchor categories are among said term categories and represent two essential components of said user generated personal experience event reports;
defining a basic said segment as a shortest section of text between said pair of anchors;
when said shortest section of text does not include at least one said term from each of said minimum number of term categories, expanding said basic segment to extend beyond said shortest section of text to include at least one said term from each of said minimum number of term categories;
calculating a density value for said terms in said basic segment;
expanding said basic segment to include a nearest said term not included in said basic segment;
recalculating said density value for said expanded basic segment;
iteratively repeating said expanding and recalculating until said recalculated density value is less than a previously calculated said density value; and
defining said expanded basic segment as a final segment.
-
-
8. A method for scoring segments of Internet posts, implementable on a computing device, comprises:
-
defining a set of indicating factors, wherein each said indicating factor is associated with a possible feature in said segments, wherein said possible features affect a likelihood that said Internet post associated with said segment represents a user generated said personal experience event report associated with a pre-defined search subject;
weighting said indicating factors in accordance with said likelihood, wherein each of said indicating factors is at least one of negative and positive;
and wherein said defining and weighting is at least in accordance with linear regression of a training set of said Internet posts known not to represent said user generated said personal experience event reports;
and wherein the method further comprises;
filtering said Internet posts to remove said Internet posts that do not at least contain terms from each of a minimum number of term categories that are associated with a user generated personal experience report associated with a pre-defined subject;
detecting a pair of anchors from two anchor categories, wherein said anchor categories are among said term categories and represent two essential components of said user generated personal experience event reports;
defining a basic said segment as a shortest section of text between said pair of anchors;
when said shortest section of text does not include at least one said term from each of said minimum number of term categories, expanding said basic segment to extend beyond said shortest section of text to include at least one said term from each of said minimum number of term categories;
calculating a density value for said terms in said basic segment;
expanding said basic segment to include a nearest said term not included in said basic segment;
recalculating said density value for said expanded basic segment;
iteratively repeating said expanding and recalculating until said recalculated density value is less than a previously calculated said density value; and
defining said expanded basic segment as a final segment.
-
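The scoring claims recite indicating factors weighted, positively or negatively, in accordance with linear regression over a training set. A minimal sketch of that idea follows. It is illustrative only and rests on assumptions not found in the source: the three binary features in `FEATURES` are hypothetical, labels are 1 for posts known to be personal experience reports and 0 for posts known not to be, the two kinds of training post are combined in one fit, and the least-squares fit is done by plain gradient descent to keep the example dependency-free.

```python
# Hypothetical binary features of a segment; the source does not list them.
FEATURES = ["first_person", "past_tense", "cites_source"]


def fit_weights(X, y, lr=0.1, epochs=2000):
    """Least-squares linear regression by gradient descent.

    Returns [intercept, w_1, ..., w_k]; weights may be positive or negative.
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            err = w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) - target
            grad[0] += err
            for j, xi in enumerate(row):
                grad[j + 1] += err * xi
        # Gradient of the mean squared error is (2 / n) * grad.
        w = [wi - lr * 2 * g / n for wi, g in zip(w, grad)]
    return w


def score(w, row):
    """Weighted sum of indicating factors for one segment."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def above_threshold(w, segments, threshold=0.5):
    """Names of segments whose score exceeds the (assumed) threshold."""
    return [name for name, row in segments.items() if score(w, row) > threshold]
```

On a toy training set in which first-person wording appears only in known reports and source citations only in known non-reports, the fitted weight for `first_person` comes out positive and the weight for `cites_source` negative, so a first-person segment clears a 0.5 threshold and a citation-only segment does not.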
Specification