Detection and handling of aggregated online content using decision criteria to compare similar or identical content items
Abstract
A computer-implemented method is presented herein. The method obtains a first content item from an online source, and then generates a characterizing signature of the first content item. The method continues by finding a previously-saved instance of the characterizing signature and retrieving data associated with a second content item (the second content item is characterized by the characterizing signature). The method continues by analyzing the data associated with the second content item, corresponding data associated with the first content item, and decision criteria. Thereafter, either the first content item or the second content item is identified as an original content item, based on the analyzing. The other content item can be flagged as an aggregated content item.
20 Claims
1. A computer-implemented method comprising:
obtaining, at a computer system, a first content item from an online source, wherein the first content item is obtained via network connection;
generating a characterizing signature of the first content item, by:
  selecting a quantity of text for analysis, the first content item comprising the quantity of text;
  eliminating filler words from the quantity of text to identify a plurality of significant words;
  arranging a predetermined number of the plurality of significant words from the quantity of text to create a document key;
  applying a hash function to the document key to obtain a hashed document key; and
  appending a language identifier to the hashed document key to create the characterizing signature;
finding a previously-saved instance of the characterizing signature in a cache memory architecture of the computer system;
retrieving, from the cache memory architecture, data associated with a second content item, in response to finding the previously-saved instance of the characterizing signature, wherein the second content item is characterized by the characterizing signature;
analyzing the data associated with the second content item, corresponding data associated with the first content item, and decision criteria; and
identifying either the first content item or the second content item as an original content item, based on the analyzing.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
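The signature-generation steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation; the claim does not specify the filler-word list, the amount of text sampled, how many significant words are used, the hash function, or how the language identifier is appended. The names `FILLER_WORDS`, `max_chars`, `key_words`, the choice of SHA-256, and the `-` separator are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical filler-word list; the claim does not enumerate one.
FILLER_WORDS = frozenset({
    "a", "an", "the", "and", "or", "of", "to", "in", "on",
    "is", "are", "was", "for", "with", "by", "at",
})


def characterizing_signature(text, language="en", max_chars=2000, key_words=10):
    """Build a signature from a text sample (illustrative assumptions only)."""
    # Select a quantity of text for analysis (assumed: a leading slice).
    sample = text[:max_chars]
    # Eliminate filler words to identify the significant words.
    words = re.findall(r"\w+", sample.lower())
    significant = [w for w in words if w not in FILLER_WORDS]
    # Arrange a predetermined number of significant words into a document key
    # (assumed: the first `key_words` words, in original order).
    document_key = " ".join(significant[:key_words])
    # Apply a hash function to the document key (assumed: SHA-256).
    hashed_key = hashlib.sha256(document_key.encode("utf-8")).hexdigest()
    # Append a language identifier to the hashed document key.
    return f"{hashed_key}-{language}"
```

Under these assumptions, two items whose sampled text differs only in filler words yield the same signature, while the same text tagged with a different language identifier yields a different one.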
10. A computer-implemented method comprising:
generating, at a computer system, a first characterizing signature of a first content item, by:
  selecting a quantity of text for analysis, the first content item comprising the quantity of text;
  eliminating filler words from the quantity of text to identify a plurality of significant words;
  arranging a predetermined number of the plurality of significant words from the quantity of text to create a document key;
  applying a hash function to the document key to obtain a hashed document key; and
  appending a language identifier to the hashed document key to create the first characterizing signature;
determining, at the computer system, that the first characterizing signature of a first content item matches a second characterizing signature of a second content item, wherein the second characterizing signature has been previously saved in a memory element of the computer system;
comparing an update frequency associated with a source of the first content item against predetermined update frequency criteria to determine whether the first content item is aggregated content;
when the first content item is not determined to be aggregated content based on the predetermined update frequency criteria, analyzing one or more outbound links associated with a source of the first content item in view of predetermined outbound link criteria, to determine whether the first content item is aggregated content;
when the first content item is not determined to be aggregated content based on the predetermined outbound link criteria, checking a publication date of the first content item relative to predetermined publication date criteria, to determine whether the first content item is aggregated content; and
when the first content item is not determined to be aggregated content based on the predetermined publication date criteria, reviewing an identified author of the first content item in view of predetermined authorship criteria, to determine whether the first content item is aggregated content.
Dependent claims: 11, 12
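Claim 10 applies its four criteria in a fixed order, moving to the next criterion only when the previous one did not mark the item as aggregated. The Python sketch below illustrates that ordering only. The claim leaves every criterion as "predetermined" without defining it, so the `SourceInfo` fields, both thresholds, the "published later than the matched item" rule, and the set of known aggregator authors are hypothetical stand-ins.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Set

# Hypothetical thresholds; the claim does not give concrete values.
MAX_UPDATES_PER_DAY = 50.0
MAX_OUTBOUND_LINKS = 100


@dataclass
class SourceInfo:
    """Assumed metadata for the first content item and its source."""
    updates_per_day: float
    outbound_links: int
    published: date
    author: str


def aggregation_reason(item: SourceInfo, matched_published: date,
                       aggregator_authors: Set[str]) -> Optional[str]:
    """Return the first criterion that marks the item as aggregated, else None."""
    checks = [
        ("update_frequency", lambda: item.updates_per_day > MAX_UPDATES_PER_DAY),
        ("outbound_links", lambda: item.outbound_links > MAX_OUTBOUND_LINKS),
        # Assumed rule: appearing later than the matched item suggests a copy.
        ("publication_date", lambda: item.published > matched_published),
        ("authorship", lambda: item.author in aggregator_authors),
    ]
    # Each check runs only if every earlier check came back negative.
    for name, check in checks:
        if check():
            return name
    return None
```

Returning the name of the triggering criterion, rather than a bare boolean, is a design choice of this sketch; it makes the order of evaluation visible to the caller.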
13. A computing system comprising a processor and a memory having computer-executable instructions stored thereon that, when executed by the processor, cause the computing system to:
obtain a first content item from an online source, wherein the first content item is obtained via network connection;
generate a characterizing signature of the first content item, by:
  selecting a quantity of text for analysis, the first content item comprising the quantity of text;
  eliminating filler words from the quantity of text to identify a plurality of significant words;
  arranging a predetermined number of the plurality of significant words from the quantity of text to create a document key;
  applying a hash function to the document key to obtain a hashed document key; and
  appending a language identifier to the hashed document key to create the characterizing signature;
find a previously-saved instance of the characterizing signature in a cache memory architecture of the computing system;
retrieve, from the cache memory architecture, data associated with a second content item, in response to finding the previously-saved instance of the characterizing signature, wherein the second content item is characterized by the characterizing signature;
analyze the data associated with the second content item, corresponding data associated with the first content item, and decision criteria; and
identify either the first content item or the second content item as an original content item, based on the analyzing.
Dependent claims: 14, 15, 16, 17, 18
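The find, retrieve, analyze, and identify steps of claim 13 can be sketched as a lookup keyed by the characterizing signature. In the Python sketch below, a plain dictionary stands in for the cache memory architecture, and a single hypothetical decision criterion (the earlier publication date wins) stands in for the unspecified decision criteria. The `ItemData` fields, the `process_item` name, and the choice to store the winning item back in the cache are assumptions, not details taken from the claim.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class ItemData:
    """Assumed per-item data kept alongside a signature."""
    url: str
    published: date


def process_item(cache: Dict[str, ItemData], signature: str,
                 item: ItemData) -> Optional[ItemData]:
    """Return the item judged original on a signature match, else None."""
    # Find a previously saved instance of the signature.
    cached = cache.get(signature)
    if cached is None:
        # No match: remember this item for later comparisons.
        cache[signature] = item
        return None
    # Analyze both items against the (assumed) decision criterion.
    original = item if item.published < cached.published else cached
    # Assumed behavior: keep the original as the cached representative.
    cache[signature] = original
    return original
```

A caller would treat whichever item is not returned as the candidate aggregated copy, in line with the abstract's note that the other content item can be flagged as aggregated.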
19. A tangible and non-transitory computer readable medium having computer-executable instructions stored thereon that, when executed by a processor, perform a method comprising:
obtaining, at a computer system, a first content item from an online source, wherein the first content item is obtained via network connection;
generating a characterizing signature of the first content item, by:
  selecting a quantity of text for analysis, the first content item comprising the quantity of text;
  eliminating filler words from the quantity of text to identify a plurality of significant words;
  arranging a predetermined number of the plurality of significant words from the quantity of text to create a document key;
  applying a hash function to the document key to obtain a hashed document key; and
  appending a language identifier to the hashed document key to create the characterizing signature;
finding a previously-saved instance of the characterizing signature in a cache memory architecture of the computer system;
retrieving data associated with a second content item from the cache memory architecture, in response to finding the previously-saved instance of the characterizing signature, wherein the second content item is characterized by the characterizing signature;
analyzing the data associated with the second content item, and corresponding data associated with the first content item; and
identifying either the first content item or the second content item as an original content item, based on the analyzing.
Dependent claims: 20
Specification