SELECTIVE CONTENT EXTRACTION
First Claim
1. A method for extracting web content, comprising:
detecting, within a web page, a hierarchical structure that includes a plurality of nodes;
identifying potential article nodes from the plurality of nodes;
selecting as an article node one of the identified potential article nodes with a highest rank in the hierarchical structure; and
producing content extracted from the article node.
Abstract
A method for extracting web content includes detecting, within a web page, a hierarchical structure that includes a plurality of nodes. Potential article nodes are identified from the plurality of nodes. The identified potential article node with the highest rank in the hierarchical structure is selected as the article node. Content is then extracted from the article node.
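The four steps of the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the text above does not state: the hierarchical structure is a simple tree built with the standard-library `html.parser`; a node counts as a "potential article node" under a hypothetical heuristic (at least `min_paragraphs` direct `<p>` children); and "highest rank" is read as "closest to the root" (smallest depth). The names `Node`, `TreeBuilder`, `is_potential_article`, and `extract_article` are illustrative only and are not taken from the patent.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Optional

# Elements that never have an end tag, so they are not pushed onto the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


@dataclass
class Node:
    tag: str
    depth: int  # distance from the synthetic root; smaller = higher in the tree
    children: list = field(default_factory=list)  # Node or str, in document order


class TreeBuilder(HTMLParser):
    """Step 1: detect the page's hierarchical structure as a tree of Nodes."""

    def __init__(self):
        super().__init__()
        self.root = Node("root", 0)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, len(self.stack))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating unclosed elements.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(data.strip())


def iter_nodes(node):
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from iter_nodes(child)


def text_of(node):
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def is_potential_article(node, min_paragraphs):
    # Hypothetical heuristic: enough direct <p> children to look like body text.
    count = sum(1 for c in node.children if isinstance(c, Node) and c.tag == "p")
    return count >= min_paragraphs


def extract_article(html: str, min_paragraphs: int = 2) -> Optional[str]:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    # Step 2: identify potential article nodes.
    candidates = [n for n in iter_nodes(builder.root) if is_potential_article(n, min_paragraphs)]
    if not candidates:
        return None
    # Step 3: select the highest-ranked candidate (assumed: smallest depth);
    # min() returns the first such node in document order on ties.
    article = min(candidates, key=lambda n: n.depth)
    # Step 4: produce the content extracted from the article node.
    return text_of(article)
```

With a page whose `<article>` holds two paragraphs and a nested quote `<div>` that also holds two paragraphs, both are candidates, but the `<article>` sits higher in the tree and is selected, so the nested quote's text is included as part of its content rather than extracted on its own.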
15 Claims
1. A method for extracting web content, comprising:
detecting, within a web page, a hierarchical structure that includes a plurality of nodes;
identifying potential article nodes from the plurality of nodes;
selecting as an article node one of the identified potential article nodes with a highest rank in the hierarchical structure; and
producing content extracted from the article node.
6. A computer readable medium having computer executable instructions recorded thereon, the instructions when executed cause a processing system to implement a method that includes:
detecting, within a web page, a hierarchical structure that includes a plurality of nodes;
identifying potential article nodes from the plurality of nodes;
selecting as an article node the identified potential article node with a highest rank in the hierarchical structure; and
producing content extracted from the article node.
11. A system for extracting web content, comprising:
a structure engine operable to detect, within a web page, a hierarchical structure that includes a plurality of nodes;
an article engine operable to identify potential article nodes from the plurality of nodes and to select as an article node the identified potential article node with a highest rank in the hierarchical structure; and
a production engine operable to produce content extracted from the article node.
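The three engines of the system claim can be sketched as separate Python components wired into one pipeline. This is an illustrative decomposition under stated assumptions, not the patented implementation: the structure engine assumes well-formed XHTML so that the standard-library `xml.etree.ElementTree` can parse it; the article engine uses a hypothetical candidate test (at least `min_paragraphs` direct `<p>` children) and treats "highest rank" as "smallest depth from the root"; all class names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


class StructureEngine:
    """Detects the page's hierarchical structure (a tree of element nodes).
    Assumption: the page is well-formed XHTML, so ElementTree can parse it."""

    def detect(self, page: str) -> ET.Element:
        return ET.fromstring(page)


class ArticleEngine:
    """Identifies potential article nodes and selects the highest-ranked one.
    Hypothetical heuristic: >= min_paragraphs direct <p> children; rank = depth."""

    def __init__(self, min_paragraphs: int = 2):
        self.min_paragraphs = min_paragraphs

    def _with_depth(self, node: ET.Element, depth: int = 0) -> Iterator[Tuple[ET.Element, int]]:
        yield node, depth
        for child in node:
            yield from self._with_depth(child, depth + 1)

    def select(self, root: ET.Element) -> Optional[ET.Element]:
        candidates = [
            (depth, node)
            for node, depth in self._with_depth(root)
            if len(node.findall("p")) >= self.min_paragraphs  # direct children only
        ]
        if not candidates:
            return None
        # Smallest depth wins; min() keeps the first candidate in document order on ties.
        return min(candidates, key=lambda c: c[0])[1]


class ProductionEngine:
    """Produces the extracted content: the node's text with whitespace normalized.
    Text fragments are joined with spaces, so inline markup may gain extra spaces."""

    def produce(self, node: ET.Element) -> str:
        return " ".join(" ".join(node.itertext()).split())


class ContentExtractionSystem:
    def __init__(self, structure=None, article=None, production=None):
        self.structure = structure or StructureEngine()
        self.article = article or ArticleEngine()
        self.production = production or ProductionEngine()

    def extract(self, page: str) -> Optional[str]:
        root = self.structure.detect(page)
        node = self.article.select(root)
        return None if node is None else self.production.produce(node)
```

Keeping the engines as separate objects mirrors the claim's division of labor: any one of them (for example, a more lenient HTML parser in place of `StructureEngine`) can be swapped without touching the other two.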
Specification