Method and apparatus for generating structured document
Abstract
A structured document generating method and apparatus capable of easily generating a structured document that matches the document structure of each non-structured document, by using a rule generated directly from a preset document structure definition to convert the non-structured document into the structured document. A keyword extracting module uses a keyword extracting rule to extract keywords representative of the document structure from a non-structured document, and a keyword/text model is generated, described by two kinds of elements: keywords and other strings. A parsing module, generated by a process that automatically parses the document structure by referring to a parsing rule obtained by modifying and converting a DTD, performs a parsing process on the keyword/text model to generate an interim SGML document. An SGML document correcting module modifies the interim SGML document and generates an SGML document as the final output by referring to DTD difference information generated when the parsing rule was generated.
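As a rough illustration of the keyword/text model described in the abstract, the following Python sketch flattens a plain-text document into a sequence of keyword elements and text elements. The rule table `KEYWORD_RULES`, the labels `TITLE`, `AUTHOR`, and `SECTION`, the tuple representation, and the function name are all assumptions made for this example; the source does not specify the format of the keyword extracting rule.

```python
import re

# Hypothetical keyword extracting rule (not from the source): each label maps
# to a pattern that recognizes a structural keyword at the start of a line.
KEYWORD_RULES = {
    "TITLE": re.compile(r"^Title:\s*"),
    "AUTHOR": re.compile(r"^Author:\s*"),
    "SECTION": re.compile(r"^\d+\.\s+"),
}


def build_keyword_text_model(document):
    """Return a flat list of ("KW", label) and ("TEXT", string) elements."""
    model = []
    for line in document.splitlines():
        if not line.strip():
            continue  # blank lines carry no structure in this sketch
        for label, pattern in KEYWORD_RULES.items():
            match = pattern.match(line)
            if match:
                model.append(("KW", label))
                rest = line[match.end():].strip()
                if rest:
                    model.append(("TEXT", rest))
                break
        else:
            # No keyword matched: the whole line is an "other string".
            model.append(("TEXT", line.strip()))
    return model


sample = "Title: Quarterly Report\nAuthor: A. Writer\n1. Overview\nSales rose."
model = build_keyword_text_model(sample)
for element in model:
    print(element)
```

A parser driven by a rule derived from the DTD would then consume this flat sequence; that step is not shown here because the source does not describe the rule format in this section.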
9 Claims
1. A method of generating a structured document for a structured document generating apparatus having at least an input/output device, a control unit, and a repository wherein a non-structured document not explicitly given the document structure and input from said input/output device is converted into a structured document explicitly given the document structure, in accordance with a document structure definition defining the document structure, said method comprising the steps of:
modifying a given first document structure definition so as to match the document structure of said input non-structured document and generate a second document structure definition;
by said control unit, generating a parsing rule used for performing a parsing process suitable for the document structure of said second document structure definition, by modifying marks constituting said second document structure definition and modifying said second document structure definition so as to make the positional order of said marks in one-to-one correspondence;
in accordance with said generated parsing rule, generating a first structured document from said non-structured document; and
in accordance with difference data between said first document structure definition and said second document structure definition, converting said generated first structured document into a format matching said first document structure definition to thereby generate a second structured document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A storage device storing a program realizing a process executable by a computer, the process comprising the steps of:
modifying a given first document structure definition so as to match the document structure of an input non-structured document and generate a second document structure definition;
a control unit generating a parsing rule used for performing a parsing process suitable for the document structure of said second document structure definition, by modifying marks constituting said second document structure definition and modifying said second document structure definition so as to make the positional order of said marks in one-to-one correspondence;
in accordance with said generated parsing rule, generating a first structured document from said input non-structured document; and
in accordance with difference data between said first document structure definition and said second document structure definition, converting said generated first structured document into a format matching said first document structure definition to thereby generate a second structured document.
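The final claimed step, converting the first structured document back into a format matching the first document structure definition, can be pictured with a small Python sketch. It assumes, purely for illustration, that the difference data is a set of element names present only in the second definition, and that such elements are pure wrappers with no text of their own. The names `DIFFERENCE_DATA`, `secbody`, and `unwrap_added` are hypothetical, and XML via `xml.etree.ElementTree` stands in for SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical difference data (an assumption, not the patent's format):
# element names that exist in the second definition but not in the first.
DIFFERENCE_DATA = {"added_elements": {"secbody"}}


def unwrap_added(parent, added):
    """Splice out elements found only in the second definition.

    Children of a removed wrapper are promoted to the wrapper's position.
    Assumes wrappers hold only child elements (no text or tail content).
    """
    new_children = []
    for child in list(parent):
        unwrap_added(child, added)  # clean the subtree first
        if child.tag in added:
            new_children.extend(list(child))
        else:
            new_children.append(child)
    for child in list(parent):
        parent.remove(child)
    parent.extend(new_children)


# First structured document, produced under the second (modified) definition.
interim = ET.fromstring(
    "<doc><title>T</title>"
    "<secbody><para>A</para><para>B</para></secbody></doc>"
)
unwrap_added(interim, DIFFERENCE_DATA["added_elements"])
final = ET.tostring(interim, encoding="unicode")
print(final)
```

Recording only the added element names keeps the back-conversion a single tree walk; a fuller treatment would also need to handle renamed or reordered elements, which this sketch does not attempt.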
Specification