Detection and handling of aggregated online content using decision criteria to compare similar or identical content items

US 9,697,287 B2
Filed: 10/09/2015
Issued: 07/04/2017
Est. Priority Date: 09/14/2012
Status: Active Grant

Abstract

A computer-implemented method is presented herein. The method obtains a first content item from an online source, and then generates a characterizing signature of the first content item. The method continues by finding a previously-saved instance of the characterizing signature and retrieving data associated with a second content item (the second content item is characterized by the characterizing signature). The method continues by analyzing the data associated with the second content item, corresponding data associated with the first content item, and decision criteria. Thereafter, either the first content item or the second content item is identified as an original content item, based on the analyzing. The other content item can be flagged as an aggregated content item.
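The signature-and-lookup flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the listing does not say how the characterizing signature is computed, so the SHA-256 hash over normalized text, the `FeedData` fields, and the `SignatureCache` class are all hypothetical names chosen for the example.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FeedData:
    """Hypothetical subset of the data saved alongside a signature."""
    source_url: str
    published: str  # ISO 8601 date string, e.g. "2012-09-01"


def characterizing_signature(text: str) -> str:
    """Hash case- and whitespace-normalized text (an assumed scheme)."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SignatureCache:
    """In-memory stand-in for the previously saved signature instances."""

    def __init__(self) -> None:
        self._entries: Dict[str, FeedData] = {}

    def lookup_or_store(self, text: str, feed: FeedData) -> Optional[FeedData]:
        """Return the earlier item's data on a hit; otherwise save this item."""
        signature = characterizing_signature(text)
        earlier = self._entries.get(signature)
        if earlier is None:
            self._entries[signature] = feed
        return earlier


# Usage: the second item differs only in case and spacing, so it hits the cache.
cache = SignatureCache()
first_hit = cache.lookup_or_store(
    "Breaking  News: example", FeedData("https://a.example/feed", "2012-09-01")
)
second_hit = cache.lookup_or_store(
    "breaking news: example", FeedData("https://b.example/feed", "2012-09-03")
)
```

On a hit, the returned data for the earlier item and the data for the new item would then be passed to whatever decision criteria are in use to decide which item is the original.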

157 Citations

20 Claims

1. A method for evaluating online content items, the method comprising:
- acquiring a first online content item from an online source, by a computer system using conventional webcrawling techniques, wherein the first content item is obtained via network connection;
  
  generating, by the computer system, a characterizing signature for the first content item;
  
  searching a cache memory architecture of the computer system for an instance of the characterizing signature;
  
  identifying, by the computer system, first RSS feed data associated with the first online content item and second RSS feed data associated with a second online content item, wherein the second RSS feed data is identified from the cache memory architecture, and wherein the second online content item corresponds to the instance of the characterizing signature saved in the cache memory architecture;
  
  evaluating, by the computer system, the first RSS feed data and the second RSS feed data; and
  
  determining, by the computer system, whether the first online content item or the second online content item comprises a content aggregator, based on the evaluating, wherein the content aggregator comprises a website presenting a duplicate version of original online content obtained from legitimate sources of the original online content.
- - 2. The method of claim 1, wherein evaluating the first RSS feed data and the second RSS feed data further comprises:
    - identifying a first volume of updates to the first RSS feed data and a second volume of updates to the second RSS feed data; and
      
      comparing the first volume and the second volume to a predetermined threshold;
      
      wherein determining whether the first online content item or the second online content item is a content aggregator further comprises;
      
      when one of the first volume and the second volume is higher than a threshold, designating an associated one of the first online content item and the second online content item as a content aggregator.
  - 3. The method of claim 1, wherein evaluating the first RSS feed data and the second RSS feed data further comprises:
    - identifying a first update frequency to the first RSS feed data and a second update frequency to the second RSS feed data; and
      
      comparing the first update frequency and the second update frequency to a predetermined threshold;
      
      wherein determining whether the first online content item or the second online content item is a content aggregator further comprises;
      
      when one of the first update frequency and the second update frequency is higher than the predetermined threshold, designating an associated one of the first online content item and the second online content item as a content aggregator.
  - 4. The method of claim 1, wherein evaluating the first RSS feed data and the second RSS feed data further comprises:
    - identifying a first group of outbound links associated with the first RSS feed data and a second group of outbound links associated with the second RSS feed data, the first group and the second group comprising links to revenue-generating webpages; and
      
      comparing a first count of the first group and a second count of the second group to a predetermined threshold;
      
      wherein determining whether the first online content item or the second online content item is a content aggregator further comprises;
      
      when one of the first count and the second count is higher than the predetermined threshold, designating an associated one of the first online content item and the second online content item as a content aggregator.
  - 5. The method of claim 1, wherein evaluating the first RSS feed data and the second RSS feed data further comprises:
    - obtaining a first publication date for the first online content item and a second publication date for the second online content item, wherein the first publication date is obtained from the first RSS feed data and the second publication date is obtained from the second RSS feed data; and
      
      comparing the first publication date and the second publication date to a predetermined threshold;
      
      wherein determining whether the first online content item or the second online content item is a content aggregator further comprises;
      
      when one of the first publication date and the second publication date is more recent than the predetermined threshold, designating an associated one of the first online content item and the second online content item as a content aggregator.
  - 6. The method of claim 1, wherein evaluating the first RSS feed data and the second RSS feed data further comprises:
    - obtaining a first publication date for the first online content item and a second publication date for the second online content item, wherein the first publication date is obtained from the first RSS feed data and the second publication date is obtained from the second RSS feed data; and
      
      comparing the first publication date to the second publication date;
      
      wherein determining whether the first online content item or the second online content item is a content aggregator further comprises;
      
      when the first publication date is more recent than the second publication date, designating the first online content item as a content aggregator.
  - 7. The method of claim 1, wherein evaluating the first RSS feed data and the second RSS feed data further comprises:
    - obtaining first authorship data for the first online content item and second authorship data for the second online content item, wherein the first authorship data is obtained from the first RSS feed data and the second authorship data is obtained from the second RSS feed data; and
      
      comparing the first authorship data and the second authorship data to a predefined list of authorship terms associated with aggregated content, to locate a match;
      
      wherein determining whether the first online content item or the second online content item is a content aggregator further comprises;
      
      when one of the first authorship data and the second authorship data matches a subset of the predefined list of authorship terms, designating an associated one of the first online content item and the second online content item as a content aggregator.
  - 8. The method of claim 1, further comprising:
    - obtaining the first online content item from an online source;
      
      generating a characterizing signature for the first online content item;
      
      finding a previously-saved instance of the characterizing signature;
      
      retrieving the second RSS feed data associated with the second online content item, in response to finding the previously-saved instance of the characterizing signature, wherein the second online content item is characterized by the characterizing signature;
      
      analyzing the second RSS feed data associated with the second online content item, the first RSS feed data associated with the first online content item, and decision criteria; and
      
      identifying either the first online content item or the second online content item as an original content item, based on the analyzing.
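
The decision criteria recited in claims 2 through 7 could be sketched as simple checks like the ones below. The `FeedStats` fields, the threshold values, and the list of authorship terms are hypothetical; the claims only call them "predetermined" or "predefined". The date-versus-threshold test of claim 5 is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class FeedStats:
    """Hypothetical per-item summary derived from RSS feed data."""
    update_volume: int        # claim 2: volume of updates
    update_frequency: float   # claim 3: updates per day (assumed unit)
    revenue_link_count: int   # claim 4: links to revenue-generating pages
    published: date           # claim 6: publication date
    author: str               # claim 7: authorship data


# Illustrative values only; none of these numbers or terms come from the patent.
MAX_VOLUME = 500
MAX_FREQUENCY = 50.0
MAX_REVENUE_LINKS = 10
AGGREGATOR_AUTHOR_TERMS = ("admin", "staff", "newswire")


def aggregator_reasons(stats: FeedStats) -> List[str]:
    """List which threshold-style criteria (claims 2-4 and 7) flag this item."""
    reasons = []
    if stats.update_volume > MAX_VOLUME:
        reasons.append("update volume")
    if stats.update_frequency > MAX_FREQUENCY:
        reasons.append("update frequency")
    if stats.revenue_link_count > MAX_REVENUE_LINKS:
        reasons.append("revenue links")
    if any(term in stats.author.lower() for term in AGGREGATOR_AUTHOR_TERMS):
        reasons.append("authorship term")
    return reasons


def first_is_aggregator_by_date(first: FeedStats, second: FeedStats) -> bool:
    """Claim 6: the first item is designated when it is the more recent one."""
    return first.published > second.published


# Usage: compare a newly crawled item against the cached item it duplicates.
crawled = FeedStats(900, 2.0, 25, date(2012, 9, 3), "Admin")
cached = FeedStats(12, 0.5, 1, date(2012, 9, 1), "Jane Doe")
crawled_reasons = aggregator_reasons(crawled)
cached_reasons = aggregator_reasons(cached)
```

Returning the list of triggered criteria, rather than a single boolean, is a design choice made here so that a caller can combine or weight the criteria; the claims themselves treat each criterion as sufficient on its own.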

9. A computing system for evaluating online content items, the computing system comprising:
- system memory comprising a cache memory architecture configured to store instances of characterizing signatures associated with online content items; and
  
  at least one processor, communicatively coupled to the system memory, the at least one processor configured to;
  
  acquire a first online content item from an online source, using conventional webcrawling techniques, wherein the first content item is obtained via network connection;
  
  generate a characterizing signature for the first content item;
  
  search a cache memory architecture of the computer system for an instance of the characterizing signature;
  
  identify first RSS feed data associated with the first online content item and second RSS feed data associated with a second online content item, wherein the second RSS feed data is identified from the cache memory architecture, and wherein the second online content item corresponds to the instance of the characterizing signature saved in the cache memory architecture;
  
  assess first RSS feed data associated with a first online content item and second RSS feed data associated with a second online content item; and
  
  determine whether the first online content item or the second online content item comprises a content aggregator, based on the assessment, wherein the content aggregator comprises a website presenting a duplicate version of original online content obtained from legitimate sources of the original online content.
- - 10. The computing system of claim 9, wherein the at least one processor is further configured to:
    - identify a first volume of updates to the first RSS feed data and a second volume of updates to the second RSS feed data; and
      
      compare the first volume and the second volume to a predetermined threshold;
      
      wherein determining whether the first online content item or the second online content item comprises a content aggregator further comprises;
      
      when one of the first volume and the second volume is higher than a threshold, designate an associated one of the first online content item and the second online content item as a content aggregator.
  - 11. The computing system of claim 9, wherein the at least one processor is further configured to:
    - identify a first update frequency to the first RSS feed data and a second update frequency to the second RSS feed data; and
      
      compare the first update frequency and the second update frequency to a predetermined threshold;
      
      wherein determining whether the first online content item or the second online content item comprises a content aggregator further comprises;
      
      when one of the first update frequency and the second update frequency is higher than the predetermined threshold, designate an associated one of the first online content item and the second online content item as a content aggregator.
  - 12. The computing system of claim 9, wherein the at least one processor is further configured to:
    - identify a first group of outbound links associated with the first RSS feed data and a second group of outbound links associated with the second RSS feed data, the first group and the second group comprising links to revenue-generating webpages; and
      
      compare a first count of the first group and a second count of the second group to a predetermined threshold;
      
      wherein determining whether the first online content item or the second online content item comprises a content aggregator further comprises;
      
      when one of the first count and the second count is higher than the predetermined threshold, designate an associated one of the first online content item and the second online content item as a content aggregator.
  - 13. The computing system of claim 9, wherein the at least one processor is further configured to:
    - obtain a first publication date for the first online content item and a second publication date for the second online content item, wherein the first publication date is obtained from the first RSS feed data and the second publication date is obtained from the second RSS feed data; and
      
      compare the first publication date and the second publication date to a predetermined threshold;
      
      wherein determining whether the first online content item or the second online content item is a content aggregator further comprises;
      
      when one of the first publication date and the second publication date is more recent than the predetermined threshold, designate an associated one of the first online content item and the second online content item as a content aggregator.
  - 14. The computing system of claim 9, wherein the at least one processor is further configured to:
    - obtain a first publication date for the first online content item and a second publication date for the second online content item, wherein the first publication date is obtained from the first RSS feed data and the second publication date is obtained from the second RSS feed data; and
      
      compare the first publication date to the second publication date;
      
      wherein determining whether the first online content item or the second online content item is a content aggregator further comprises;
      
      when the first publication date is more recent than the second publication date, designate the first online content item as a content aggregator.
  - 15. The computing system of claim 9, wherein the at least one processor is further configured to:
    - obtain first authorship data for the first online content item and second authorship data for the second online content item, wherein the first authorship data is obtained from the first RSS feed data and the second authorship data is obtained from the second RSS feed data; and
      
      compare the first authorship data and the second authorship data to a predefined list of authorship terms associated with aggregated content, to locate a match;
      
      wherein determining whether the first online content item or the second online content item is a content aggregator further comprises;
      
      when one of the first authorship data and the second authorship data matches a subset of the predefined list of authorship terms, designate an associated one of the first online content item and the second online content item as a content aggregator.

16. A non-transitory, computer-readable medium containing instructions thereon, which, when executed by a processor, are capable of performing a method comprising:
- acquiring a first online content item from an online source, by the processor using conventional webcrawling techniques, wherein the first content item is obtained via network connection;
  
  generating, by the processor, a characterizing signature for the first content item;
  
  searching a cache memory architecture communicatively coupled to the processor for an instance of the characterizing signature;
  
  identifying, by the processor, first RSS feed data associated with the first online content item and second RSS feed data associated with a second online content item, wherein the second RSS feed data is identified from the cache memory architecture, and wherein the second online content item corresponds to the instance of the characterizing signature saved in the cache memory architecture;
  
  evaluating RSS feed data associated with a plurality of online content items; and
  
  identifying, by the processor, at least one of the plurality of online content items as a content aggregator, based on the evaluating, wherein the content aggregator comprises a website presenting a duplicate version of original online content obtained from legitimate sources of the original online content.
- - 17. The non-transitory, computer-readable medium of claim 16, wherein evaluating RSS feed data associated with a plurality of online content items further comprises:
    - identifying a first update frequency to first RSS feed data associated with a first online content item, the plurality of online content items comprising the first online content item; and
      
      comparing the first update frequency to a predetermined threshold;
      
      wherein identifying at least one of the plurality of online content items as a content aggregator further comprises;
      
      when the first update frequency is higher than the predetermined threshold, designating the first online content item as a content aggregator.
  - 18. The non-transitory, computer-readable medium of claim 16, wherein evaluating RSS feed data associated with a plurality of online content items further comprises:
    - identifying a first group of outbound links associated with first RSS feed data associated with a first online content item, wherein the plurality of online content items comprises the first online content item, and wherein the first group comprises links to revenue-generating webpages; and
      
      comparing a count of the first group to a predetermined threshold;
      
      wherein identifying at least one of the plurality of online content items as a content aggregator further comprises;
      
      when the first count is higher than the predetermined threshold, designating the first online content item as a content aggregator.
  - 19. The non-transitory, computer-readable medium of claim 16, wherein evaluating RSS feed data associated with a plurality of online content items further comprises:
    - obtaining a first publication date for a first online content item, wherein the first publication date is obtained from first RSS feed data associated with the first online content item, and wherein the plurality of online content items comprises the first online content item; and
      
      comparing the first publication date to a predetermined threshold;
      
      wherein identifying at least one of the plurality of online content items as a content aggregator further comprises;
      
      when the first publication date is more recent than the predetermined threshold, designating the first online content item as a content aggregator.
  - 20. The non-transitory, computer-readable medium of claim 16, wherein evaluating RSS feed data associated with a plurality of online content items further comprises:
    - obtaining first authorship data for a first online content item, wherein the first authorship data is obtained from first RSS feed data associated with the first online content item, and wherein the plurality of online content items comprises the first online content item; and
      
      comparing the first authorship data to a predefined list of authorship terms associated with aggregated content, to locate a match;
      
      wherein identifying at least one of the plurality of online content items as a content aggregator further comprises;
      
      when the first authorship data matches a subset of the predefined list of authorship terms, designating the first online content item as a content aggregator.

Current Assignee: Salesforce.com, Inc.
Original Assignee: Salesforce.com, Inc.
Inventors: Doan, Dai Duong
Primary Examiner(s): Moorthy, Aravind
Application Number: US14/879,676
Publication Number: US 20160034581A1
Time in Patent Office: 634 Days
Field of Search: 726/32, 726/23, 713/154, 713/168, 713/180
CPC Class Codes

G06F 16/24568   Data stream processing; Con...

G06F 16/951   Indexing; Web crawling tech...

G06F 21/10   Protecting distributed prog...

G06F 21/60   Protecting data

G06F 21/64   Protecting data integrity, ...

G11B 20/0021   involving encryption or dec...

H04L 43/08   Monitoring or testing based...

H04L 67/02   based on web technology, e....

H04L 67/568   Storing data temporarily at...

H04L 69/28   Timers or timing mechanisms...

H04L 9/08   Key distribution or managem...

H04L 9/3247   involving digital signatures
