METHOD OF MONITORING ELECTRONIC MEDIA

US 20090119275A1
Filed: 10/29/2007
Published: 05/07/2009
Est. Priority Date: 10/29/2007
Status: Active Grant

Abstract

Consumer-generated media (CGM) and/or other media are monitored to allow an organization to become aware of, and respond to, issues that may affect how it is perceived by the public. An extract, transform, load (ETL) engine is used to process CGM and other media content, and an analytical engine utilizes a multi-step progressive filtering approach to identify those documents that are most relevant. The filtering approach includes executing broad queries to extract relevant content from different CGM and other sources, extracting text snippets from the relevant content and performing de-duplication, defining organizational identity (e.g., brand name, trade name, or company name) and hot-topic models using a rule-based and statistical-based approach, and using the models together in an orthogonal filtering approach to effectively generate alerts and reports. The methodology is found to be substantially more effective compared to a conventional keyword based approach.
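The progressive filtering described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the sentence-based snippet splitter, the sample documents, and the "Acme" brand with its two regular expressions are all hypothetical stand-ins for the identity and hot-topic models, and the statistical side of the models is omitted.

```python
import re

def keyword_filter(docs, keywords):
    """Broad first pass: keep documents mentioning any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return {doc_id: text for doc_id, text in docs.items()
            if any(k in text.lower() for k in lowered)}

def extract_snippets(docs, terms):
    """Split each document into sentences; keep those containing a term of interest."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    snippets = []
    for doc_id, text in docs.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if pattern.search(sentence):
                snippets.append((doc_id, sentence.strip()))
    return snippets

def deduplicate(snippets):
    """Drop snippets whose normalized text has already been seen."""
    seen, unique = set(), []
    for doc_id, snippet in snippets:
        key = " ".join(snippet.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append((doc_id, snippet))
    return unique

def orthogonal_filter(snippets, name_model, topic_model):
    """Keep snippets matching BOTH models; return the ids of their source documents."""
    return sorted({doc_id for doc_id, s in snippets
                   if name_model.search(s) and topic_model.search(s)})

# Hypothetical sample data and models (assumptions for illustration only).
docs = {
    "d1": "I bought an Acme blender. The Acme blender caught fire on day two!",
    "d2": "Acme announced a new store. Parking was easy.",
    "d3": "My toaster caught fire. It was not an Acme product.",
    "d4": "The weather was nice today.",
}
name_model = re.compile(r"\bacme\b", re.IGNORECASE)
topic_model = re.compile(r"\b(caught fire|recall|injur\w+)\b", re.IGNORECASE)

candidates = keyword_filter(docs, ["acme", "fire"])
snippets = deduplicate(extract_snippets(candidates, ["acme", "fire"]))
alerts = orthogonal_filter(snippets, name_model, topic_model)
print(alerts)
```

In this sample, document d3 mentions both the brand and a fire, so a document-level keyword search would flag it, but no single snippet refers to both, so only d1 is reported. That is one way to read the abstract's comparison with a conventional keyword-based approach.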

64 Citations

19 Claims

1. A computer-implemented method, comprising:
- applying at least one keyword-based query to a collection of documents to determine which ones of them are of potential interest, the documents of potential interest thereby forming a subset of the collection of documents;
- extracting text snippets from documents in the subset, each of the extracted text snippets including at least one term of interest;
- applying a name-based text pattern algorithm and a topic-based text pattern algorithm to text snippets extracted from documents in the subset, thereby identifying text snippets having references to both i) a particular entity of interest known by a given name and ii) a specific topic of interest; and
- identifying documents corresponding to the identified text snippets.
- Dependent claims (2 to 14):
  - 2. The method of claim 1, wherein the name-based text pattern algorithm excludes from consideration text snippets in which the given name appears but which in context does not refer to the entity of interest.
  - 3. The method of claim 1, wherein the topic-based text pattern algorithm specifies at least one unwanted text pattern that matches a topic other than the specific topic of interest, so that text snippets including said at least one unwanted text pattern are excluded from consideration.
  - 4. The method of claim 1, further comprising:
    - reading at least portions of said identified documents; and
    - identifying potential newsworthy trends in view of said reading.
  - 5. The method of claim 1, wherein the given name is a trade name.
  - 6. The method of claim 1, wherein the given name is a name of a company.
  - 7. The method of claim 1, wherein the given name is a brand name.
  - 8. The method of claim 1, wherein each of the algorithms uses at least one regular expression.
  - 9. The method of claim 1, comprising:
    - applying the name-based text pattern algorithm to extracted text snippets to determine which ones of them have a reference to the particular entity of interest, thereby forming a subset of the extracted text snippets; and
    - applying the topic-based text pattern algorithm to text snippets in the subset of the extracted text snippets, thereby identifying text snippets having references to both i) the particular entity of interest and ii) the specific topic of interest.
  - 10. The method of claim 1, wherein each of the text snippets is no longer than three sentences.
  - 11. The method of claim 1, wherein the extracted text snippets are examined for duplicates, which are then eliminated from further consideration.
  - 12. The method of claim 1, further comprising repeating the method of claim 1 on a periodic basis, with output of the method of claim 1 being provided to an interested party.
  - 13. The method of claim 1, further comprising sorting the identified documents according to a measure of their relevance.
  - 14. The method of claim 1, wherein the collection of documents includes documents taken from consumer-generated media.
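Claims 2, 3, 8 and 9 together describe regular-expression models that carry both wanted and unwanted patterns and are applied one after the other. A minimal sketch of that idea follows, assuming a hypothetical brand "Falcon" and a hypothetical overheating topic; the `PatternModel` class and every pattern in it are illustrative assumptions, not taken from the patent.

```python
import re
from dataclasses import dataclass

@dataclass
class PatternModel:
    """A text pattern model: one wanted pattern plus one unwanted (veto) pattern."""
    include: re.Pattern
    exclude: re.Pattern

    def matches(self, snippet: str) -> bool:
        # A snippet matches only if the wanted pattern is present
        # and the unwanted pattern is absent.
        return bool(self.include.search(snippet)) and not self.exclude.search(snippet)

# Name-based model (hypothetical): the brand "Falcon", but not the bird.
name_model = PatternModel(
    include=re.compile(r"\bfalcon\b", re.IGNORECASE),
    exclude=re.compile(r"\b(peregrine|nest\w*|wingspan)\b", re.IGNORECASE),
)

# Topic-based model (hypothetical): overheating, but not "burn calories".
topic_model = PatternModel(
    include=re.compile(r"\b(overheat\w*|burn\w*|caught fire)\b", re.IGNORECASE),
    exclude=re.compile(r"\bburn\w*\s+calories\b", re.IGNORECASE),
)

def cascade(snippets):
    """Apply the name model first, then the topic model to the survivors."""
    name_hits = [s for s in snippets if name_model.matches(s)]
    return [s for s in name_hits if topic_model.matches(s)]

snippets = [
    "My Falcon treadmill overheated after ten minutes.",
    "A peregrine falcon nested near the overheated transformer.",
    "The Falcon treadmill helps me burn calories fast.",
    "The Falcon treadmill arrived on time.",
]
print(cascade(snippets))
```

Only the first snippet survives: the second is vetoed by the name model's unwanted pattern, the third by the topic model's unwanted pattern, and the fourth never mentions the topic. Running the name model first means the topic patterns are evaluated only on the smaller subset, which mirrors the ordering in claim 9.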

15. A computer-implemented method, comprising:
- applying at least one keyword-based query to a collection of documents to determine which ones of them are of potential interest, the documents of potential interest thereby forming a subset of the collection of documents;
- extracting text snippets from documents in the subset, each of the extracted text snippets including at least one term of interest;
- applying a name-based text pattern algorithm and a topic-based text pattern algorithm to text snippets extracted from documents in the subset, thereby identifying text snippets having references to both i) a particular entity of interest known by a given name and ii) a specific topic of interest; and
- identifying documents corresponding to the identified text snippets, wherein:
  - i) the name-based text pattern algorithm excludes from consideration text snippets in which the given name appears but which in context does not refer to the entity of interest,
  - ii) the topic-based text pattern algorithm specifies at least one unwanted text pattern that matches a topic other than the specific topic of interest, so that text snippets including said at least one unwanted text pattern are excluded from consideration,
  - iii) each of the algorithms uses at least one regular expression, and
  - iv) the given name is selected from the group consisting of trade names, brand names, and company names.
- Dependent claims (16 to 18):
  - 16. The method of claim 15, further comprising repeating the method of claim 15 on a periodic basis, with output of the method of claim 15 being provided to an interested party.
  - 17. The method of claim 16, comprising:
    - applying the name-based text pattern algorithm to extracted text snippets to determine which ones of them have a reference to the particular entity of interest, thereby forming a subset of the extracted text snippets; and
    - applying the topic-based text pattern algorithm to text snippets in the subset of the extracted text snippets, thereby identifying text snippets having references to both i) the particular entity of interest and ii) the specific topic of interest, wherein each of the text snippets is no longer than 50 words.
  - 18. The method of claim 17, comprising identifying duplicates among the extracted text snippets, thereby forming a set of remaining text snippets to which the algorithms are applied.
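The 50-word limit of claim 17 and the duplicate handling of claim 18 could be realized in many ways; the claims do not say how snippets are cut or how duplicates are detected. The sketch below assumes a fixed word window around each occurrence of a single-word term and treats snippets as duplicates when they are equal after case and whitespace normalization. Both choices, and all names used, are assumptions for illustration.

```python
def extract_windows(text, term, max_words=50):
    """Return one snippet of at most max_words words around each occurrence of term.

    Assumes term is a single word; matching is a case-insensitive substring test.
    """
    words = text.split()
    half = max_words // 2
    target = term.lower()
    snippets = []
    for i, word in enumerate(words):
        if target in word.lower():
            start = max(0, i - half)
            snippets.append(" ".join(words[start:start + max_words]))
    return snippets

def remove_duplicates(snippets):
    """Keep the first copy of each snippet, comparing case- and whitespace-normalized text."""
    seen, remaining = set(), []
    for snippet in snippets:
        key = " ".join(snippet.lower().split())
        if key not in seen:
            seen.add(key)
            remaining.append(snippet)
    return remaining

# Hypothetical data: the same post syndicated twice, once in upper case.
post = " ".join(["filler"] * 60 + ["Acme"] + ["filler"] * 60)
docs = {"a": post, "b": post.upper()}

all_snippets = [s for text in docs.values() for s in extract_windows(text, "acme")]
remaining = remove_duplicates(all_snippets)
print(len(all_snippets), len(remaining))
```

Removing duplicates before the name-based and topic-based patterns run, as claim 18 orders it, means each distinct snippet is matched only once even when the same text is syndicated across several sources.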

19. At least one tangible computer-useable medium, said at least one medium having a computer-readable program, wherein the program upon being processed on a computer causes the computer to implement the steps of:
- applying at least one keyword-based query to a collection of documents to determine which ones of them are of potential interest, the documents of potential interest thereby forming a subset of the collection of documents;
- extracting text snippets from documents in the subset, each of the extracted text snippets including at least one term of interest;
- applying a name-based text pattern algorithm and a topic-based text pattern algorithm to text snippets extracted from documents in the subset, thereby identifying text snippets having references to both i) a particular entity of interest known by a given name and ii) a specific topic of interest; and
- identifying documents corresponding to the identified text snippets.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Chen, Ying; Griffin, Thomas D.; Proctor, Larry L.; Spangler, W. Scott; Behal, Amit
Granted Patent: US 8,010,524 B2
US Class Current: 1/1
CPC Class Codes: G06Q 30/02 Marketing; Price estimation...