Eliminating noise in periodicals
First Claim
1. A method, comprising:
preprocessing, by a server computer system, each item in a set of items using one or more rules, wherein preprocessing an item in the set of items comprises:
determining that the item includes a print option; and
using a version of the item associated with the print option instead of an alternate version;
removing, by the server computer system, global noise from the set of items using semantic similarities across items in the set of items; and
removing, by the server computer system, local noise in the item of the set of items, wherein removing the local noise in the item of the set of items comprises:
determining an amount of content for a node associated with the item;
calculating a content score for the node based on the amount of content;
calculating a link density for the node based on a number of links in the node as a percentage of the content;
calculating a local noise score for the node based on the content score and the link density; and
removing the node responsive to a determination that the local noise score is above a threshold.
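The local-noise steps recited above (amount of content, content score, link density, local noise score, threshold test) can be illustrated with a minimal sketch. The claim does not give formulas, so everything concrete here is an assumption made for illustration: the `Node` type, counting words as the "amount of content", the 50-word saturation point, averaging the two signals into one score, and the 0.5 threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a node of an item (e.g. one HTML block)."""
    text: str        # visible text inside the node
    link_count: int  # number of hyperlinks inside the node


FULL_CONTENT_WORDS = 50  # assumed: word count that earns the maximum content score
NOISE_THRESHOLD = 0.5    # assumed: nodes scoring above this are removed


def content_amount(node: Node) -> int:
    # Assumption: the "amount of content" is the number of words in the node.
    return len(node.text.split())


def content_score(amount: int) -> float:
    # Scale the amount of content into [0, 1].
    return min(amount / FULL_CONTENT_WORDS, 1.0)


def link_density(node: Node, amount: int) -> float:
    # Number of links relative to the amount of content, capped at 1.
    if amount == 0:
        return 1.0 if node.link_count else 0.0
    return min(node.link_count / amount, 1.0)


def local_noise_score(node: Node) -> float:
    # Assumed combination: little content and many links both raise the score.
    amount = content_amount(node)
    return (link_density(node, amount) + (1.0 - content_score(amount))) / 2


def remove_local_noise(nodes: List[Node]) -> List[Node]:
    # Keep only the nodes whose local noise score is not above the threshold.
    return [n for n in nodes if local_noise_score(n) <= NOISE_THRESHOLD]
```

Under these assumptions, a five-word navigation bar made entirely of links scores close to 1 and is dropped, while a long paragraph with a couple of links scores close to 0 and is kept.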
Abstract
A method and system for eliminating global and local noise in periodical items is described. An exemplary method may include preprocessing each item in a set of items using one or more rules, removing global noise from the set of items using semantic similarities across items in the set of items, and removing local noise in each item in the set of items based on text content in each item.
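As a rough illustration of the global-noise step, the sketch below drops text blocks that recur across most items in a set, such as a masthead or copyright line. It deliberately swaps in normalized exact matching as a crude stand-in for the semantic similarity the abstract refers to; the function names, the list-of-text-blocks representation of an item, and the 80% share cut-off are all assumptions, not details from the patent.

```python
from collections import Counter
from typing import List


def normalize(block: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies match.
    return " ".join(block.lower().split())


def remove_global_noise(items: List[List[str]], min_share: float = 0.8) -> List[List[str]]:
    """Remove blocks that appear in at least `min_share` of the items.

    Each item is a list of text blocks. Exact matching after normalization
    is an assumed simplification of "semantic similarity".
    """
    if len(items) < 2:
        # With a single item there is nothing to compare against.
        return [list(blocks) for blocks in items]

    counts = Counter()
    for blocks in items:
        # Count each distinct block once per item.
        counts.update({normalize(b) for b in blocks})

    cutoff = min_share * len(items)
    noisy = {key for key, n in counts.items() if n >= cutoff}
    return [[b for b in blocks if normalize(b) not in noisy] for blocks in items]
```

A real system would likely need a softer similarity measure, since boilerplate often varies slightly from one item to the next.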
23 Claims
1. A method, comprising:
preprocessing, by a server computer system, each item in a set of items using one or more rules, wherein preprocessing an item in the set of items comprises:
determining that the item includes a print option; and
using a version of the item associated with the print option instead of an alternate version;
removing, by the server computer system, global noise from the set of items using semantic similarities across items in the set of items; and
removing, by the server computer system, local noise in the item of the set of items, wherein removing the local noise in the item of the set of items comprises:
determining an amount of content for a node associated with the item;
calculating a content score for the node based on the amount of content;
calculating a link density for the node based on a number of links in the node as a percentage of the content;
calculating a local noise score for the node based on the content score and the link density; and
removing the node responsive to a determination that the local noise score is above a threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 21, 22, 23
11. A non-transitory computer readable storage medium including instructions that, when executed by a processor, cause the processor to perform operations comprising:
preprocessing, by the processor, each item in a set of items using one or more rules, wherein preprocessing an item in the set of items comprises:
determining that the item includes a print option; and
using a version of the item associated with the print option instead of an alternate version;
removing, by the processor, global noise from the set of items using semantic similarities across items in the set of items; and
removing, by the processor, local noise in the item of the set of items, wherein removing the local noise in the item of the set of items comprises:
determining an amount of content for a node associated with the item;
calculating a content score for the node based on the amount of content;
calculating a link density for the node based on a number of links in the node as a percentage of the content;
calculating a local noise score for the node based on the content score and the link density; and
removing the node responsive to a determination that the local noise score is above a threshold.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. A server computer system comprising:
a memory; and
a processor, coupled to the memory, the processor to:
preprocess each item in a set of items using one or more rules, wherein to preprocess an item in the set of items the processor is to:
determine that the item includes a print option; and
use a version of the item associated with the print option instead of an alternate version;
remove global noise from the set of items using semantic similarities across items in the set of items; and
remove local noise in the item of the set of items, wherein to remove the local noise in an item of the set of items the processor is to:
determine an amount of content for a node associated with the item;
calculate a content score for the node based on the amount of content;
calculate a link density for the node based on a number of links in the node as a percentage of the content;
calculate a local noise score for the node based on the content score and the link density; and
remove the node responsive to a determination that the local noise score is above a threshold.
Specification