Extraction of anchor explanatory text by mining repeated patterns
First Claim
1. A computing device with a computer-readable storage medium with instructions for identifying explanatory text for a display page, the computer-readable storage medium comprising:
- a find repeated patterns component that identifies repeated patterns of elements within a display page by comparing elements of the display page to other elements of the display page, a repeated pattern having an anchor along with text associated, the anchor being an element that includes a reference to a referenced display page, wherein patterns of elements are considered to be repeated when the patterns have the same number of elements and the patterns have an edit distance that is within a threshold;
- a find dominant anchor component that finds a dominant anchor within a repeated pattern by, when the repeated pattern includes multiple anchors:
  - when only one anchor of the repeated pattern contains a block element and has text associated with the anchor, designating that anchor as the dominant anchor; and
  - when more than one anchor of the repeated pattern contains a block element and has text associated with the anchor, designating no anchor as the dominant anchor;
- an extract text component that extracts from a repeated pattern the text associated with the dominant anchor, wherein the extracted text represents explanatory text for the referenced display page; and
- wherein a summarization component generates a summary of a display page from the explanatory text extracted by the extract text component.
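The repetition test recited in the claim (same number of elements, edit distance within a threshold) can be sketched as follows. This is only an illustration under stated assumptions: each pattern is modeled as a sequence of element labels such as tag names, the distance is the standard Levenshtein distance, and the names `edit_distance`, `is_repeated`, and the default `threshold` of 1 are hypothetical rather than taken from the patent.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of element labels."""
    # prev[j] holds the distance between the first i-1 items of a and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if x == y else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def is_repeated(a: Sequence[str], b: Sequence[str], threshold: int = 1) -> bool:
    """Claimed rule: equal element count AND edit distance within a threshold.

    The threshold value is an assumption; the claim does not state one.
    """
    return len(a) == len(b) and edit_distance(a, b) <= threshold


# Two hypothetical list items that differ in one element.
item_one = ["a", "img", "p"]
item_two = ["a", "img", "span"]
print(is_repeated(item_one, item_two))
```

Requiring equal length first is a cheap filter that avoids computing the distance for patterns that cannot match under the claimed rule.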
Abstract
A method and system for identifying explanatory text for a referenced web page based on a reference to the referenced web page contained in a repeated pattern of a referencing web page is provided. An anchor explanatory text (“AET”) system uses the hierarchical organization of the web page to identify a repeated pattern of hierarchical elements that contain references to other display pages. After the AET system identifies a repeated pattern, it identifies the dominant reference or anchor within each occurrence of the pattern. The AET system uses the explanatory text surrounding a dominant anchor as a description of the referenced web page.
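The dominant-anchor step can be illustrated with the rule stated in the claims: among multiple anchors, exactly one anchor that contains a block element and has associated text is dominant, and more than one such anchor means no anchor is dominant. The sketch below is a hedged illustration, not the patented implementation. The `Anchor` type and its fields are hypothetical, and two cases the claim text does not address are handled by assumption: a sole anchor is treated as dominant, and zero qualifying anchors yields no dominant anchor.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Anchor:
    """Hypothetical view of one anchor inside a pattern occurrence."""
    href: str
    contains_block_element: bool
    associated_text: str


def find_dominant_anchor(anchors: Sequence[Anchor]) -> Optional[Anchor]:
    """Return the dominant anchor of one pattern occurrence, or None."""
    if len(anchors) == 1:
        # Assumption: the claim only covers multiple anchors; a sole anchor is taken as dominant.
        return anchors[0]
    qualifying = [
        a for a in anchors
        if a.contains_block_element and a.associated_text.strip()
    ]
    # Exactly one qualifying anchor is dominant; several qualifying anchors means none is.
    # Assumption: zero qualifying anchors is also treated as "no dominant anchor".
    return qualifying[0] if len(qualifying) == 1 else None


headline = Anchor("/story", True, "Full coverage of the event")
comments = Anchor("/story#comments", False, "12 comments")
dominant = find_dominant_anchor([headline, comments])
print(dominant.href if dominant else None)
```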
20 Claims
1. A computing device with a computer-readable storage medium with instructions for identifying explanatory text for a display page, the computer-readable storage medium comprising:
- a find repeated patterns component that identifies repeated patterns of elements within a display page by comparing elements of the display page to other elements of the display page, a repeated pattern having an anchor along with text associated, the anchor being an element that includes a reference to a referenced display page, wherein patterns of elements are considered to be repeated when the patterns have the same number of elements and the patterns have an edit distance that is within a threshold;
- a find dominant anchor component that finds a dominant anchor within a repeated pattern by, when the repeated pattern includes multiple anchors:
  - when only one anchor of the repeated pattern contains a block element and has text associated with the anchor, designating that anchor as the dominant anchor; and
  - when more than one anchor of the repeated pattern contains a block element and has text associated with the anchor, designating no anchor as the dominant anchor;
- an extract text component that extracts from a repeated pattern the text associated with the dominant anchor, wherein the extracted text represents explanatory text for the referenced display page; and
- wherein a summarization component generates a summary of a display page from the explanatory text extracted by the extract text component.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method performed by a computer executing instructions of a computer program for identifying explanatory text for a display page, the method comprising:
- identifying repeated patterns of elements within a display page by comparing elements of the display page to other elements of the display page, a repeated pattern having an anchor along with text associated, the anchor being an element that includes a reference to a referenced display page, wherein patterns of elements are considered to be repeated when the patterns have the same number of elements and the patterns have an edit distance that is within a threshold;
- finding a dominant anchor within a repeated pattern by, when the repeated pattern includes multiple anchors:
  - when only one anchor of the repeated pattern contains a block element and has text associated with the anchor, designating that anchor as the dominant anchor; and
  - when more than one anchor of the repeated pattern contains a block element and has text associated with the anchor, designating no anchor as the dominant anchor;
- extracting from a repeated pattern the text associated with the dominant anchor, wherein the extracted text represents explanatory text for the referenced display page; and
- generating a summary of a display page based on the extracted explanatory text.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
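The final two steps of the method (extracting the text associated with each dominant anchor, then generating a summary from it) might look like the sketch below. The claims do not say how the summary is produced, so the approach here is purely an assumption for illustration: explanatory texts are grouped by referenced page, de-duplicated, and joined up to a length limit. The function names, the `(href, text)` input shape, and `max_chars` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def collect_explanatory_text(
    occurrences: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Group extracted text by referenced page.

    Each occurrence is a (dominant anchor href, associated text) pair,
    an assumed shape for the output of the extraction step.
    """
    by_page: Dict[str, List[str]] = defaultdict(list)
    for href, text in occurrences:
        cleaned = " ".join(text.split())  # collapse runs of whitespace
        if cleaned and cleaned not in by_page[href]:
            by_page[href].append(cleaned)
    return dict(by_page)


def summarize(texts: List[str], max_chars: int = 200) -> str:
    """Naive summary (assumption): join the texts and truncate to max_chars."""
    summary = " ".join(texts)
    if len(summary) <= max_chars:
        return summary
    return summary[:max_chars].rstrip() + "..."


occurrences = [
    ("/a", " First  story. "),
    ("/b", "Second."),
    ("/a", "First story."),
    ("/a", "More detail."),
]
grouped = collect_explanatory_text(occurrences)
print(summarize(grouped["/a"]))
```

Grouping by referenced page lets text from several referencing pages accumulate for the same target, which is one plausible reading of how extracted explanatory text could feed a summary.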
Specification