Probabilistic learning method for XML annotation of documents
Abstract
A document processor includes a parser that parses a document using a grammar having a set of terminal elements for labeling leaves, a set of non-terminal elements for labeling nodes, and a set of transformation rules. The parsing generates a parsed document structure including terminal element labels for fragments of the document and a nodes tree linking the terminal element labels and conforming with the transformation rules. An annotator annotates the document with structural information based on the parsed document structure.
20 Claims
1. A document processor comprising:
a classifier that classifies fragments of an input document respective to a set of terminal elements;
a probabilistic grammar defining transformation rules operating on elements selected from the set of terminal elements and a set of non-terminal elements; and
a parser that defines a parsed document structure associating the input document fragments with terminal elements connected by links of non-terminal elements conforming with the probabilistic grammar, the parsed document structure being used to organize the input document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
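The interplay of classifier, probabilistic grammar, and parser in claim 1 can be illustrated with a small sketch. It is not the patented implementation: it assumes the grammar is given as binary rules with probabilities, that the classifier output is a per-fragment dictionary of terminal probabilities, and it uses a standard Viterbi-style CYK search to pick the most likely tree. The function name `viterbi_parse`, the element names (`Doc`, `Head`, `Body`, `title`, `author`, `para`), and all probabilities are hypothetical.

```python
from collections import defaultdict


def viterbi_parse(frag_probs, binary_rules, start):
    """Return (probability, tree) of the most likely parse, or (0.0, None).

    frag_probs:   one dict per fragment, terminal element -> probability
                  (stands in for the classifier output; an assumption).
    binary_rules: dict mapping (parent, left, right) -> rule probability.
    """
    n = len(frag_probs)
    # best[(i, j)][symbol] = (probability, tree) covering fragments i..j-1
    best = defaultdict(dict)
    for i, probs in enumerate(frag_probs):
        for term, p in probs.items():
            best[(i, i + 1)][term] = (p, (term, i))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, left, right), rule_p in binary_rules.items():
                    if left in best[(i, k)] and right in best[(k, j)]:
                        lp, lt = best[(i, k)][left]
                        rp, rt = best[(k, j)][right]
                        p = rule_p * lp * rp
                        if p > best[(i, j)].get(parent, (0.0, None))[0]:
                            best[(i, j)][parent] = (p, (parent, lt, rt))
    return best[(0, n)].get(start, (0.0, None))


# Hypothetical grammar and classifier scores for a four-fragment document.
rules = {
    ("Doc", "Head", "Body"): 1.0,
    ("Head", "title", "author"): 1.0,
    ("Body", "para", "para"): 0.6,
    ("Body", "para", "Body"): 0.4,
}
scores = [
    {"title": 0.9, "para": 0.1},
    {"author": 0.8, "para": 0.2},
    {"para": 0.7, "title": 0.3},
    {"para": 1.0},
]
prob, tree = viterbi_parse(scores, rules, "Doc")
print(prob, tree)
```

In this sketch the leaves of the returned tree pair each fragment index with a terminal element, and the inner tuples are the non-terminal links that the grammar allows.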
10. A document processing method comprising:
classifying fragments of an input document respective to a set of terminal elements;
parsing the classified fragments to determine a parsed document structure associating the fragments with terminal elements connected by links of non-terminal elements conforming with a probabilistic grammar; and
organizing the input document as an XML document with an XML structure conforming with the parsed document structure.
Dependent claims: 11, 12, 13, 14
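The final step of claim 10, organizing the document as XML that follows the parsed structure, might look like the following sketch. It assumes the parse is available as nested tuples, with leaves of the form `(terminal, fragment_index)` and inner nodes of the form `(label, child, ...)`; that representation, the helper name `tree_to_xml`, and the sample fragments are illustrative assumptions. Only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET


def tree_to_xml(tree, fragments):
    """Convert a nested-tuple parse tree into an ElementTree element.

    Leaves are (terminal, fragment_index); inner nodes are
    (label, child, child, ...). Both shapes are assumptions of this sketch.
    """
    label, *rest = tree
    elem = ET.Element(label)
    if len(rest) == 1 and isinstance(rest[0], int):
        # Leaf: the element wraps the text of the classified fragment.
        elem.text = fragments[rest[0]]
    else:
        # Inner node: one child element per sub-tree, in document order.
        for child in rest:
            elem.append(tree_to_xml(child, fragments))
    return elem


# Hypothetical fragments and parse tree.
fragments = ["A Title", "J. Doe", "First.", "Second."]
parse = ("Doc",
         ("Head", ("title", 0), ("author", 1)),
         ("Body", ("para", 2), ("para", 3)))
xml_text = ET.tostring(tree_to_xml(parse, fragments), encoding="unicode")
print(xml_text)
```

Each terminal element becomes a leaf tag around its fragment text and each non-terminal becomes an enclosing tag, so the XML nesting mirrors the parse tree directly.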
15. A document processor comprising:
a parser that parses a document using a grammar having a set of terminal elements for labeling leaves, a set of non-terminal elements for labeling nodes, and a set of transformation rules, the parsing generating a parsed document structure including terminal element labels for fragments of the document and a nodes tree linking the terminal element labels and conforming with the transformation rules; and
an annotator that annotates the document with structural information based on the parsed document structure.
Dependent claims: 16, 17, 18, 19, 20
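What it means for a nodes tree to conform with the transformation rules of claim 15 can be made concrete with a small check. The sketch below assumes each rule is a pair of a left-hand label and a tuple of child labels, and that a tree is a nested tuple whose leaves have no children; the function name `conforms` and the sample grammar are hypothetical.

```python
def conforms(tree, terminals, rules):
    """Check that a labeled tree uses only the given transformation rules.

    tree:      (label, child, child, ...); a leaf is a one-element tuple.
    terminals: set of labels allowed on leaves.
    rules:     set of (parent_label, (child_label, ...)) pairs.
    """
    label, *children = tree
    if not children:
        # Leaves must carry a terminal element label.
        return label in terminals
    child_labels = tuple(child[0] for child in children)
    # The node's expansion must be a listed rule, and so must every subtree's.
    return (label, child_labels) in rules and all(
        conforms(child, terminals, rules) for child in children
    )


# Hypothetical grammar: terminals label leaves, non-terminals label nodes.
terminals = {"title", "author", "para"}
rules = {
    ("Doc", ("Head", "Body")),
    ("Head", ("title", "author")),
    ("Body", ("para",)),
    ("Body", ("para", "Body")),
}
good = ("Doc",
        ("Head", ("title",), ("author",)),
        ("Body", ("para",), ("Body", ("para",))))
bad = ("Doc", ("Body", ("para",)), ("Head", ("title",), ("author",)))
print(conforms(good, terminals, rules), conforms(bad, terminals, rules))
```

The second tree is rejected because no rule expands `Doc` into `Body` followed by `Head`, even though both subtrees are individually valid.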
Specification