Document information processing apparatus
First Claim
1. A document information processing apparatus comprising:
a plain document input unit for inputting a plain document;
a dictionary storage unit for storing a dictionary used for form element analysis and syntactic analysis;
a form element analyzer for performing a form element analysis on the plain document inputted from said plain document input unit by using the dictionary stored in said dictionary storage unit so as to decompose the plain document into tokens;
a syntax analyzer for analyzing a part of speech of each of the tokens obtained by said form element analyzer based on a syntax of said plain document so as to generate a structured document containing meaningful words;
a data storage unit for storing data used for a markup process;
an element refinement processing unit for performing the markup process of reading and adding data associated with each of the meaningful words included in the structured document generated by said syntax analyzer and stored in said data storage unit to each of the meaningful words so as to generate a markup document; and
a markup document output unit for outputting the markup document generated by said element refinement processing unit.
Abstract
A document information processing apparatus includes a form element analyzer (12) for performing a form element analysis on a plain document inputted from a plain document input unit (10) by using a dictionary stored in a dictionary storage unit so as to decompose the plain document into tokens, a syntax analyzer (13) for analyzing the part of speech of each of the tokens obtained by the form element analyzer so as to generate a structured document containing meaningful words, an element refinement processing unit (15) for performing a markup process of adding data associated with each of the meaningful words included in the structured document generated by the syntax analyzer and stored in a data storage unit (14) to each of the meaningful words so as to generate a markup document, and a markup document output unit (17) for outputting the markup document generated by the element refinement processing unit.
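The flow described above (tokens, then parts of speech, then markup data attached to meaningful words) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy `DICTIONARY` and `DATA_STORE` contents, the function names, the choice of nouns, verbs, and adjectives as "meaningful" words, and the `<w>` tag used in the output. Greedy longest-match lookup stands in for the form element analysis, and a plain dictionary lookup stands in for the syntax analysis; a real analyzer would be considerably more involved.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical dictionary (dictionary storage unit): surface form -> part of speech.
DICTIONARY = {"tokyo": "noun", "tower": "noun", "is": "verb", "tall": "adjective"}

# Hypothetical markup data (data storage unit): word -> attributes to attach.
DATA_STORE = {"tokyo": {"type": "place"}, "tower": {"type": "structure"}}

# Assumption: these parts of speech count as "meaningful words".
MEANINGFUL_POS = {"noun", "verb", "adjective"}


def analyze_form_elements(text, dictionary):
    """Split text into tokens by greedy longest match against the dictionary."""
    lowered = text.lower()
    max_len = max(map(len, dictionary))
    tokens = []
    i = 0
    while i < len(lowered):
        if lowered[i].isspace():
            i += 1
            continue
        # Try the longest candidate first, shrinking until a dictionary hit.
        for length in range(min(max_len, len(lowered) - i), 0, -1):
            candidate = lowered[i:i + length]
            if candidate in dictionary:
                break
        else:
            candidate = lowered[i]  # unknown character becomes its own token
        tokens.append(candidate)
        i += len(candidate)
    return tokens


def analyze_syntax(tokens, dictionary):
    """Assign a part of speech to each token and flag meaningful words."""
    structured = []
    for surface in tokens:
        pos = dictionary.get(surface, "unknown")
        structured.append(
            {"surface": surface, "pos": pos, "meaningful": pos in MEANINGFUL_POS}
        )
    return structured


def refine_elements(structured, data_store):
    """Attach stored data to each meaningful word and render a markup string."""
    parts = []
    for word in structured:
        text = escape(word["surface"])
        if not word["meaningful"]:
            parts.append(text)
            continue
        attrs = {"pos": word["pos"], **data_store.get(word["surface"], {})}
        attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        parts.append(f"<w{attr_text}>{text}</w>")
    return " ".join(parts)


def process(text):
    """Run the three stages in order: tokens, structured document, markup."""
    tokens = analyze_form_elements(text, DICTIONARY)
    structured = analyze_syntax(tokens, DICTIONARY)
    return refine_elements(structured, DATA_STORE)


if __name__ == "__main__":
    print(process("Tokyo Tower is tall."))
```

Keeping the dictionary and the markup data in separate stores mirrors the separation between the dictionary storage unit and the data storage unit in the abstract, so either can be swapped without touching the other stage.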
16 Claims
Claim 1 is independent; claims 2 through 16 depend on it.
Specification