Apparatus and method for abstracting markup language documents
6 Assignments
0 Petitions
Abstract
An apparatus and a method to generate a hyperlinked abstract from a markup language document by parsing the document to create a syntax tree, statistically analyzing the syntax tree based on at least one rule, classifying the information at each node of the syntax tree, adapting the information at each node of the classified tree for output, and summarizing the adapted tree to create a hyperlinked abstract of the document to be presented at an output device. The abstract is a summarized version of the document. Because it requires less bandwidth than the full document, it can be transmitted to a user much faster, even if the user's computing system and connection are not very sophisticated. From the abstract, the user can quickly see what the document covers; if more detail is wanted, the user can follow hyperlinks to the corresponding material in the document. In one embodiment, the summarization step includes grouping, in which a predetermined number of nodes are grouped together. In another embodiment, after summarization, the tree can be modified by an output-specific filter and then sent to an output device.
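The output-specific filter mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical `SummaryNode` structure (a title plus the fragment id of the full material in the document) and two hypothetical filters: one emits nested hyperlink lists for a full browser, the other emits indented, truncated plain text for a limited text-only device. The source does not specify what an actual filter produces; this only shows the idea of choosing a rendering per output device after summarization.

```python
from __future__ import annotations

import html
from dataclasses import dataclass, field


@dataclass
class SummaryNode:
    """Hypothetical entry of a summarized tree."""
    title: str   # short text shown in the abstract
    anchor: str  # fragment id of the full material in the document
    children: list[SummaryNode] = field(default_factory=list)


def html_filter(node: SummaryNode) -> str:
    """Hypothetical filter for a full browser: nested lists of hyperlinks."""
    item = f'<li><a href="#{html.escape(node.anchor)}">{html.escape(node.title)}</a>'
    if node.children:
        item += "<ul>" + "".join(html_filter(c) for c in node.children) + "</ul>"
    return item + "</li>"


def text_filter(node: SummaryNode, depth: int = 0, max_chars: int = 20) -> str:
    """Hypothetical filter for a small text-only device: indented, truncated, no links."""
    title = node.title
    if len(title) > max_chars:
        title = title[: max_chars - 3] + "..."
    lines = ["  " * depth + title]
    lines += [text_filter(c, depth + 1, max_chars) for c in node.children]
    return "\n".join(lines)


# Hypothetical registry mapping an output device type to its filter.
FILTERS = {"browser": html_filter, "text": text_filter}


def render(root: SummaryNode, device: str) -> str:
    """Apply the output-specific filter for the target device."""
    return FILTERS[device](root)


root = SummaryNode("Guide", "top", [SummaryNode("Installation instructions", "install")])
browser_view = render(root, "browser")
text_view = render(root, "text")
```

Keeping the filters in a registry keyed by device type means the summarized tree is computed once and only the final rendering step varies per device.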
133 Citations
22 Claims
1. A computer-aided method to generate a hyperlinked abstract from a markup language document comprising the steps of:
parsing the markup language document to generate a syntax tree with a number of nodes;
analyzing statistically the syntax tree;
generating an annotated syntax tree including collected statistical information responsive to the step of analyzing;
classifying each node into a predefined category based on the collected statistical information of the annotated syntax tree; and
summarizing the classified nodes to create a hyperlinked abstract of the document to be presented at an output device.
Dependent claims: 2, 3, 4, 5, 6, 7.
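The steps of claim 1 up to classification can be sketched in Python with the standard-library `html.parser` module. This is a minimal illustration, not the patented implementation: the statistics collected (word count, link-word count, link density), the category names, and the thresholds (`min_words=8`, link density above 0.5) are assumptions chosen for the example, and `TreeBuilder`, `analyze`, `classify`, and `build_annotated_tree` are hypothetical names.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class Node:
    """Syntax-tree node; `stats` and `category` are filled in by later steps."""

    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], ""
        self.stats, self.category = {}, None


class TreeBuilder(HTMLParser):
    """Parsing step: builds a syntax tree, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = self.current = Node("#root")

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text += " " + data.strip()


def analyze(node):
    """Statistical analysis: annotate every node bottom-up (hypothetical statistics)."""
    for child in node.children:
        analyze(child)
    words = len(node.text.split()) + sum(c.stats["words"] for c in node.children)
    if node.tag == "a":
        link_words = words
    else:
        link_words = sum(c.stats["link_words"] for c in node.children)
    node.stats = {
        "words": words,
        "link_words": link_words,
        "link_density": link_words / words if words else 0.0,
    }


def classify(node, min_words=8):
    """Classification: one predefined category per node (assumed rules and thresholds)."""
    if node.tag in HEADINGS:
        node.category = "heading"
    elif node.stats["link_density"] > 0.5:
        node.category = "navigation"
    elif node.stats["words"] >= min_words:
        node.category = "content"
    else:
        node.category = "minor"
    for child in node.children:
        classify(child, min_words)


def build_annotated_tree(markup, min_words=8):
    """Parse, annotate with statistics, and classify in one pass."""
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    analyze(builder.root)
    classify(builder.root, min_words)
    return builder.root


root = build_annotated_tree(
    "<body><ul><li><a href='/'>Home</a></li><li><a href='/about'>About</a></li></ul>"
    "<h1>Results</h1><p>The trial enrolled forty patients across three clinical sites.</p></body>"
)
body = root.children[0]
categories = [(c.tag, c.category) for c in body.children]
```

Here the link-heavy list is classified as navigation, the heading as a heading, and the nine-word paragraph as content, purely from the annotated statistics.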
8. An apparatus for generating a hyperlinked abstract from a markup language document comprising:
a parser configured to parse the markup document to generate a syntax tree of the document with a number of nodes;
a statistical analyzer configured to analyze statistically the syntax tree based on at least one rule and to generate an annotated syntax tree including collected statistical information;
a classifier configured to classify each node into a pre-defined category based on the collected statistical information of the annotated syntax tree; and
a summarizer configured to summarize the classified nodes to create a hyperlinked abstract of the document to be presented at an output device.
Dependent claims: 9, 10, 11, 12, 13, 14.
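The apparatus of claim 8 can be sketched as four pluggable components. In this hypothetical Python sketch the parser is the standard-library `xml.etree.ElementTree.fromstring` (so the input must be well-formed XHTML), the statistical analyzer applies a configurable set of rules bottom-up, the classifier assigns the first matching category, and the summarizer is a toy callable. The class names, rule names, predicates, and thresholds are illustrative assumptions, not the patent's components.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

Stats = Dict[str, float]


# Hypothetical rules: each computes one statistic from an element and its children's stats.
def words_rule(elem: ET.Element, child_stats: List[Stats]) -> float:
    # Words directly in elem: its text plus the tail text that follows each child.
    own = len((elem.text or "").split()) + sum(len((c.tail or "").split()) for c in elem)
    return own + sum(s["words"] for s in child_stats)  # assumes the rule is registered as "words"


def height_rule(elem: ET.Element, child_stats: List[Stats]) -> float:
    return 1 + max((s["height"] for s in child_stats), default=0)


class StatisticalAnalyzer:
    """Applies every configured rule bottom-up; returns {element: stats}."""

    def __init__(self, rules: Dict[str, Callable[[ET.Element, List[Stats]], float]]):
        self.rules = rules

    def analyze(self, root: ET.Element) -> Dict[ET.Element, Stats]:
        annotated: Dict[ET.Element, Stats] = {}

        def visit(elem: ET.Element) -> Stats:
            child_stats = [visit(child) for child in elem]
            annotated[elem] = {name: rule(elem, child_stats) for name, rule in self.rules.items()}
            return annotated[elem]

        visit(root)
        return annotated


class Classifier:
    """Gives each element the first category whose predicate matches, else a default."""

    def __init__(self, rules: List[Tuple[str, Callable[[ET.Element, Stats], bool]]], default: str = "other"):
        self.rules, self.default = rules, default

    def classify(self, annotated: Dict[ET.Element, Stats]) -> Dict[ET.Element, str]:
        return {
            elem: next((cat for cat, pred in self.rules if pred(elem, stats)), self.default)
            for elem, stats in annotated.items()
        }


class AbstractGenerator:
    """Hypothetical wiring: parser -> analyzer -> classifier -> summarizer."""

    def __init__(self, analyzer, classifier, summarizer):
        self.analyzer, self.classifier, self.summarizer = analyzer, classifier, summarizer

    def run(self, markup: str):
        root = ET.fromstring(markup)  # parser: assumes well-formed XHTML
        annotated = self.analyzer.analyze(root)
        categories = self.classifier.classify(annotated)
        return self.summarizer(root, annotated, categories)


generator = AbstractGenerator(
    StatisticalAnalyzer({"words": words_rule, "height": height_rule}),
    Classifier([
        ("heading", lambda e, s: e.tag in {"h1", "h2", "h3"}),
        ("content", lambda e, s: e.tag == "p" and s["words"] >= 5),
    ]),
    # Toy summarizer: (category, word count) for every non-"other" element, in document order.
    lambda root, stats, cats: [(cats[e], int(stats[e]["words"])) for e in root.iter() if cats[e] != "other"],
)
result = generator.run(
    "<body><h1>Intro</h1><p>Short note.</p><p>This paragraph has <b>enough</b> words to count.</p></body>"
)
```

Because each component is passed in, a different rule set or summarizer can be swapped in without touching the other three components.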
15. A computer-aided method to generate a hyperlinked abstract from a markup language document comprising the steps of:
parsing the markup language document to generate a syntax tree with a number of nodes arranged in a number of levels;
analyzing statistically the syntax tree;
classifying each node into a predefined category; and
summarizing the classified nodes to create a hyperlinked abstract of the document to be presented at an output device;
wherein the step of summarizing includes reducing the number of levels of the syntax tree by grouping a plurality of nodes together.
Dependent claims: 16, 17, 18, 19.
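One possible reading of reducing the number of levels by grouping, sketched in Python under assumptions the claim does not state: every node at a chosen cut-off level absorbs its entire subtree, so the grouped descendants collapse into a single node carrying their concatenated text. `Node`, `levels`, `gather_text`, `group_levels`, and the `max_levels` parameter are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical syntax-tree node: a label, its own text, and children."""
    label: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def levels(node: Node) -> int:
    """Number of levels in the tree rooted at node."""
    return 1 + max((levels(c) for c in node.children), default=0)


def gather_text(node: Node) -> str:
    """All text in a subtree, in document order."""
    parts = [node.text] + [gather_text(c) for c in node.children]
    return " ".join(p for p in parts if p)


def group_levels(node: Node, max_levels: int) -> Node:
    """Return a copy of the tree with at most max_levels levels (max_levels >= 1).

    A node at the last allowed level groups its whole subtree into itself,
    so its descendants become one node instead of several deeper levels.
    """
    if max_levels <= 1:
        return Node(node.label, gather_text(node))
    return Node(node.label, node.text, [group_levels(c, max_levels - 1) for c in node.children])


doc = Node("body", children=[
    Node("section", children=[
        Node("h2", "Methods"),
        Node("div", children=[Node("p", "Samples were"), Node("p", "frozen overnight.")]),
    ]),
])
flat = group_levels(doc, 3)
```

In this example the original tree has four levels (body, section, div, p); capping it at three groups the two paragraphs into their `div`, which then holds the text "Samples were frozen overnight." as a single node.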
20. An apparatus for generating a hyperlinked abstract from a markup language document comprising:
a parser configured to parse the markup document to generate a syntax tree of the document with a number of nodes arranged in a number of levels;
a statistical analyzer configured to analyze statistically the syntax tree based on at least one rule;
a classifier configured to classify each node into a pre-defined category; and
a summarizer configured to summarize the classified nodes to create a hyperlinked abstract of the document to be presented at an output device;
wherein the summarizer reduces the number of levels of the syntax tree by grouping a plurality of nodes together.
Dependent claims: 21, 22.
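How a summarizer might turn classified nodes into a hyperlinked abstract, as a minimal Python sketch: each kept node is shortened to its first few words and wrapped in a link to its full version in the original document, so the reader can follow it for detail. The `ClassifiedNode` shape, the kept categories, the `max_words` cut-off, and the `report.html#id` link form are assumptions for illustration only.

```python
import html
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ClassifiedNode:
    """Hypothetical node after classification: its id, category, and text."""
    node_id: str
    category: str  # e.g. "heading", "content", "navigation", "minor"
    text: str


class Summarizer:
    """Builds a hyperlinked abstract: every kept entry links back to the full node."""

    def __init__(self, doc_url: str, max_words: int = 6, keep=("heading", "content")):
        self.doc_url, self.max_words, self.keep = doc_url, max_words, set(keep)

    def entry(self, node: ClassifiedNode) -> str:
        words = node.text.split()
        teaser = " ".join(words[: self.max_words])
        if len(words) > self.max_words:
            teaser += " ..."  # marks that the full text is behind the link
        href = f"{self.doc_url}#{node.node_id}"
        tag = "h3" if node.category == "heading" else "p"
        return f'<{tag}><a href="{html.escape(href)}">{html.escape(teaser)}</a></{tag}>'

    def summarize(self, nodes: Iterable[ClassifiedNode]) -> str:
        kept: List[str] = [self.entry(n) for n in nodes if n.category in self.keep]
        return '<div class="abstract">' + "".join(kept) + "</div>"


nodes = [
    ClassifiedNode("nav", "navigation", "Home About Contact"),
    ClassifiedNode("results", "heading", "Results"),
    ClassifiedNode("p1", "content", "The trial enrolled forty patients across three clinical sites."),
]
abstract = Summarizer("report.html").summarize(nodes)
```

Dropping navigation entries and truncating content is what keeps the abstract smaller than the source document, while the links preserve access to the omitted material.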
Specification