METHOD OF MONITORING ELECTRONIC MEDIA
1 Assignment
0 Petitions
Abstract
Consumer-generated media (CGM) and/or other media are monitored to allow an organization to become aware of, and respond to, issues that may affect how it is perceived by the public. An extract, transform, load (ETL) engine is used to process CGM and other media content, and an analytical engine utilizes a multi-step progressive filtering approach to identify those documents that are most relevant. The filtering approach includes executing broad queries to extract relevant content from different CGM and other sources, extracting text snippets from the relevant content and performing de-duplication, defining organizational identity (e.g., brand name, trade name, or company name) and hot-topic models using a rule-based and statistical-based approach, and using the models together in an orthogonal filtering approach to effectively generate alerts and reports. The methodology is found to be substantially more effective than a conventional keyword-based approach.
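The abstract mentions de-duplicating extracted snippets but does not say how duplicates are detected. The following is a minimal Python sketch, assuming (hypothetically) that two snippets count as duplicates when they are identical after lower-casing and collapsing whitespace. The function names `normalize` and `deduplicate` are illustrative and are not taken from the patent.

```python
import re


def normalize(snippet: str) -> str:
    # Assumed normalization: collapse runs of whitespace and lower-case, so
    # re-posted copies that differ only in spacing or case compare equal.
    return re.sub(r"\s+", " ", snippet).strip().lower()


def deduplicate(snippets: list[str]) -> list[str]:
    # Keep the first occurrence of each normalized snippet, preserving input order.
    seen: set[str] = set()
    unique: list[str] = []
    for snippet in snippets:
        key = normalize(snippet)
        if key not in seen:
            seen.add(key)
            unique.append(snippet)
    return unique


print(deduplicate(["Sunrise recall  announced", "sunrise recall announced", "Other news"]))
# ['Sunrise recall  announced', 'Other news']
```

Exact matching after normalization catches only verbatim or near-verbatim re-posts. Looser near-duplicate detection would need a fuzzier comparison than this sketch uses.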
64 Citations
19 Claims
1. A computer-implemented method, comprising:
applying at least one keyword-based query to a collection of documents to determine which ones of them are of potential interest, the documents of potential interest thereby forming a subset of the collection of documents;
extracting text snippets from documents in the subset, each of the extracted text snippets including at least one term of interest;
applying a name-based text pattern algorithm and a topic-based text pattern algorithm to text snippets extracted from documents in the subset, thereby identifying text snippets having references to both i) a particular entity of interest known by a given name and ii) a specific topic of interest; and
identifying documents corresponding to the identified text snippets.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
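The first two steps of claim 1 (a broad keyword query that narrows the collection to a subset, then extraction of snippets containing a term of interest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Documents are assumed to be held in an in-memory dict keyed by document id, and a snippet is assumed to be a fixed window of words on either side of each occurrence of a single-word term. The names `keyword_query`, `extract_snippets` and the `window` parameter are hypothetical.

```python
import re


def keyword_query(docs: dict[str, str], keywords: list[str]) -> dict[str, str]:
    # Broad first pass: keep any document containing at least one keyword as a
    # whole word (case-insensitive). The result plays the role of the "subset".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b", re.IGNORECASE
    )
    return {doc_id: text for doc_id, text in docs.items() if pattern.search(text)}


def extract_snippets(
    doc_id: str, text: str, terms: list[str], window: int = 5
) -> list[tuple[str, str]]:
    # Assumed snippet definition: `window` words on either side of each
    # single-word term of interest, tagged with the source document id.
    term_set = {t.lower() for t in terms}
    words = text.split()
    snippets = []
    for i, word in enumerate(words):
        if word.strip(".,;:!?\"'()").lower() in term_set:
            start, end = max(0, i - window), i + window + 1
            snippets.append((doc_id, " ".join(words[start:end])))
    return snippets


docs = {
    "a": "Sunrise kettle recall announced today.",
    "b": "Weather was fine.",
    "c": "The sunrise was lovely.",
}
subset = keyword_query(docs, ["sunrise"])
print(sorted(subset))  # ['a', 'c']
print(extract_snippets("a", docs["a"], ["recall"], window=1))
# [('a', 'kettle recall announced')]
```

Document "c" shows why the keyword pass alone is not enough: it matches "sunrise" without referring to any brand. Separating that case out is the job of the name-based text pattern step that follows in the claim.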
15. A computer-implemented method, comprising:
applying at least one keyword-based query to a collection of documents to determine which ones of them are of potential interest, the documents of potential interest thereby forming a subset of the collection of documents;
extracting text snippets from documents in the subset, each of the extracted text snippets including at least one term of interest;
applying a name-based text pattern algorithm and a topic-based text pattern algorithm to text snippets extracted from documents in the subset, thereby identifying text snippets having references to both i) a particular entity of interest known by a given name and ii) a specific topic of interest; and
identifying documents corresponding to the identified text snippets, wherein:
i) the name-based text pattern algorithm excludes from consideration text snippets in which the given name appears but which in context does not refer to the entity of interest,
ii) the topic-based text pattern algorithm specifies at least one unwanted text pattern that matches a topic other than the specific topic of interest, so that text snippets including said at least one unwanted text pattern are excluded from consideration,
iii) each of the algorithms uses at least one regular expression, and
iv) the given name is selected from the group consisting of trade names, brand names, and company names.
Dependent claims: 16, 17, 18.
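Claim 15 specifies that both pattern algorithms use regular expressions, that the name-based algorithm drops snippets where the name is used in some other sense, and that the topic-based algorithm carries unwanted patterns for competing topics. A minimal Python sketch of that combination follows. It assumes a hypothetical brand called "Sunrise" and a hypothetical topic of product recalls. The pattern lists, the `passes` and `identify_documents` functions, and the (doc_id, snippet) input format are illustrative assumptions, not details from the patent. A snippet is kept only if it passes both models, and the ids of its source documents are returned.

```python
import re

# Hypothetical name model for a brand called "Sunrise": one wanted pattern plus
# context patterns where "sunrise" means the time of day (clause i).
NAME_WANTED = [re.compile(r"\bsunrise\b", re.IGNORECASE)]
NAME_UNWANTED = [
    re.compile(r"\b(?:at|before|after|until)\s+sunrise\b", re.IGNORECASE),
    re.compile(r"\bsunrise\s+(?:and|to)\s+sunset\b", re.IGNORECASE),
]

# Hypothetical topic model for product recalls, with an unwanted pattern for the
# unrelated "remember" sense of the word (clause ii).
TOPIC_WANTED = [re.compile(r"\brecall(?:s|ed|ing)?\b", re.IGNORECASE)]
TOPIC_UNWANTED = [
    re.compile(r"\b(?:can't|cannot|don't|do not)\s+recall\b", re.IGNORECASE),
]


def passes(text: str, wanted: list[re.Pattern], unwanted: list[re.Pattern]) -> bool:
    # A snippet passes a model if some wanted pattern matches and no unwanted one does.
    return any(p.search(text) for p in wanted) and not any(
        p.search(text) for p in unwanted
    )


def identify_documents(snippets: list[tuple[str, str]]) -> list[str]:
    # Keep snippets that pass BOTH the name model and the topic model, then map
    # the surviving snippets back to the ids of their source documents.
    return sorted(
        {
            doc_id
            for doc_id, text in snippets
            if passes(text, NAME_WANTED, NAME_UNWANTED)
            and passes(text, TOPIC_WANTED, TOPIC_UNWANTED)
        }
    )


snippets = [
    ("d1", "Sunrise announces recall of its kettles"),
    ("d2", "We left before sunrise; I can't recall why"),
    ("d3", "My Sunrise kettle broke, but I can't recall the model"),
    ("d4", "Sunrise recalled 10,000 kettles"),
]
print(identify_documents(snippets))  # ['d1', 'd4']
```

Because the unwanted patterns are checked against the whole snippet, short snippets matter: in a long passage, an unrelated "before sunrise" could wrongly exclude a genuine brand mention.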
19. At least one tangible computer-useable medium, said at least one medium having a computer-readable program, wherein the program upon being processed on a computer causes the computer to implement the steps of:
applying at least one keyword-based query to a collection of documents to determine which ones of them are of potential interest, the documents of potential interest thereby forming a subset of the collection of documents;
extracting text snippets from documents in the subset, each of the extracted text snippets including at least one term of interest;
applying a name-based text pattern algorithm and a topic-based text pattern algorithm to text snippets extracted from documents in the subset, thereby identifying text snippets having references to both i) a particular entity of interest known by a given name and ii) a specific topic of interest; and
identifying documents corresponding to the identified text snippets.
Specification