Signal-processing based approach to translation of web pages into wireless pages
Abstract
A method and apparatus for transforming a web page that contains main content and auxiliary data. The web page is converted into a string containing multiple first values and multiple second values. The first values correspond to formatting code segments within the web page and the second values correspond to text segments within the web page. Further, a low-pass filter is applied to the string containing multiple first values and multiple second values, and the output of the low-pass filter is used to determine the location of the main content within the web page.
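The pipeline summarized in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented implementation: the first value is taken to be 0 and the second value 1, one value is emitted per character so that positions map back to document offsets, tags are matched with a simple regular expression, and a clipped centered moving average stands in for the low-pass filter. The names `to_signal`, `moving_average`, `locate_main_content`, and the default window size are hypothetical.

```python
import re

# Assumption: anything between '<' and '>' counts as a formatting code segment.
TAG_RE = re.compile(r"<[^>]*>")


def to_signal(html):
    """Map each character to 0 (inside a tag) or 1 (text outside tags)."""
    signal = [1] * len(html)
    for match in TAG_RE.finditer(html):
        for i in range(match.start(), match.end()):
            signal[i] = 0
    return signal


def moving_average(signal, window):
    """Centered moving average; the window is clipped at both ends."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def locate_main_content(html, window=41):
    """Return the offset where the smoothed text density is highest."""
    density = moving_average(to_signal(html), window)
    return max(range(len(density)), key=lambda i: density[i])
```

On a page with short navigation links around one long paragraph, the returned offset falls inside the paragraph, because only there does the averaging window cover text alone. An empty input raises `ValueError` in this sketch.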
39 Claims
1. A method for transforming a hypermedia document containing main content and auxiliary data, the method comprising:
converting the hypermedia document into a string containing a plurality of first values and a plurality of second values, the plurality of first values replacing a plurality of formatting code segments within the hypermedia document and the plurality of second values replacing a plurality of text segments within the hypermedia document;
applying a low-pass filter to the string containing the plurality of first values and the plurality of second values; and
determining a location of the main content within the hypermedia document using an output of the low-pass filter.
Dependent claims: 2-15
16. A computer-implemented apparatus for transforming a hypermedia document containing main content and auxiliary data, the apparatus comprising:
a converter to convert the hypermedia document into a string containing a plurality of first values and a plurality of second values, the plurality of first values replacing a plurality of formatting code segments within the hypermedia document and the plurality of second values replacing a plurality of text segments within the hypermedia document;
a low-pass filter to apply to the string containing the plurality of first values and the plurality of second values; and
a location calculator to determine a location of the main content within the hypermedia document using an output of the low-pass filter.
Dependent claims: 17-30
31. A medium readable by a machine, the medium having stored thereon a sequence of instructions which, when executed by the machine, cause the machine to:
convert the hypermedia document into a string containing a plurality of first values and a plurality of second values, the plurality of first values replacing a plurality of formatting code segments within the hypermedia document and the plurality of second values replacing a plurality of text segments within the hypermedia document;
apply a low-pass filter to the string containing the plurality of first values and the plurality of second values; and
determine a location of the main content within the hypermedia document using a low-pass filter output.
32. A method for transforming a web page containing main content and auxiliary data, the method comprising:
converting the web page into a string containing a plurality of first values and a plurality of second values, the plurality of first values corresponding to a plurality of formatting code segments within the web page and the plurality of second values corresponding to a plurality of text segments within the web page;
applying a moving average filter to the string containing the plurality of first values and the plurality of second values to generate an output representing a distribution of text density over the web page;
searching the output of the moving average filter to find a position of a central peak corresponding to the highest text density within the web page;
determining a starting position of a high text density area and an ending position of the high text density area using the position of the central peak and a threshold text density value to determine a location of the main content within the web page; and
coding the main content in a mobile device language for display on a mobile device.
Dependent claims: 33-35
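The peak search and threshold step of claim 32 could look roughly like the following. It is a hedged sketch only: it assumes the moving average output is already available as a list of densities in the range 0 to 1, and it assumes the high-density area is found by walking outward from the peak until the density drops below the threshold. The function name `find_dense_region` and the sample values are hypothetical.

```python
def find_dense_region(density, threshold):
    """Return (peak, start, end) for the high text density area.

    density:   output of a moving average filter, one value per position
    threshold: minimum density still counted as part of the area
    start and end are inclusive positions (assumption of this sketch).
    """
    # Position of the central peak: the highest text density.
    peak = max(range(len(density)), key=lambda i: density[i])

    # Walk left from the peak while the density stays at or above threshold.
    start = peak
    while start > 0 and density[start - 1] >= threshold:
        start -= 1

    # Walk right from the peak in the same way.
    end = peak
    while end < len(density) - 1 and density[end + 1] >= threshold:
        end += 1

    return peak, start, end


# Example: the isolated value 0.8 at position 6 is not contiguous with the peak.
sample = [0.1, 0.2, 0.6, 0.9, 0.7, 0.3, 0.8, 0.1]
print(find_dense_region(sample, 0.5))  # (3, 2, 4)
```

Expanding from the peak, rather than collecting every position above the threshold, keeps a separate dense block elsewhere on the page out of the result.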
36. A method for transforming a web page containing main content and auxiliary data, the method comprising:
converting the web page into a string containing a plurality of first values and a plurality of second values, the plurality of first values corresponding to a plurality of formatting code segments within the web page and the plurality of second values corresponding to a plurality of text segments within the web page;
applying a median filter to the string containing the plurality of first values and the plurality of second values to suppress high frequency signal oscillations associated with the string;
applying a moving average filter to an output of the median filter to combine a plurality of closely spaced text segments contained in the output of the median filter into a set of larger text segments;
applying a rising and falling edge detector to an output of the median filter to identify the largest reasonably contiguous text segment within the set of larger segments using a threshold text value, the largest reasonably contiguous text segment corresponding to the main content of the web page; and
coding the main content in a mobile device language for display on a mobile device.
Dependent claims: 37-39
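The filter chain of claim 36 might be sketched as below. Several things here are assumptions rather than anything stated in the claims: the input is a list of 0 and 1 values, both windows are odd, edges are handled by replicating the end values (median) or clipping the window (average), and segment ends are exclusive. The claim text applies the edge detector to an output of the median filter; this sketch instead runs it on the averaged signal so that merged segments are what gets measured. All function names and default parameters are hypothetical.

```python
from statistics import median


def median_filter(signal, window):
    """Median filter with an odd window; end values are replicated as padding."""
    half = window // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(signal))]


def moving_average(signal, window):
    """Centered moving average; the window is clipped at both ends."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def find_segments(smoothed, threshold):
    """Rising and falling edge detector; returns (start, end) with end exclusive."""
    segments = []
    start = None
    for i, value in enumerate(smoothed):
        above = value >= threshold
        if above and start is None:
            start = i                      # rising edge
        elif not above and start is not None:
            segments.append((start, i))    # falling edge
            start = None
    if start is not None:                  # signal ends while still above threshold
        segments.append((start, len(smoothed)))
    return segments


def largest_segment(signal, median_window=3, average_window=5, threshold=0.5):
    """Return the longest segment above threshold, or None if there is none."""
    if not signal:
        return None
    cleaned = median_filter(signal, median_window)
    smoothed = moving_average(cleaned, average_window)
    segments = find_segments(smoothed, threshold)
    return max(segments, key=lambda s: s[1] - s[0]) if segments else None
```

In this arrangement the median filter removes isolated spikes and fills single-position gaps, and the moving average then bridges slightly wider gaps before the threshold is applied.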
Specification