Document information processing apparatus

  • US 20040205670A1
  • Filed: 03/23/2004
  • Published: 10/14/2004
  • Est. Priority Date: 04/10/2003
  • Status: Active Grant
First Claim

1. A document information processing apparatus comprising:

    a plain document input unit for inputting a plain document;

    a dictionary storage unit for storing a dictionary used for form element analysis and syntactic analysis;

    a form element analyzer for performing a form element analysis on the plain document inputted from said plain document input unit by using the dictionary stored in said dictionary storage unit so as to decompose the plain document into tokens;

    a syntax analyzer for analyzing a part of speech of each of the tokens obtained by said form element analyzer based on a syntax of said plain document so as to generate a structured document containing meaningful words;

    a data storage unit for storing data used for a markup process;

    an element refinement processing unit for performing the markup process of reading and adding data associated with each of the meaningful words included in the structured document generated by said syntax analyzer and stored in said data storage unit to each of the meaningful words so as to generate a markup document; and

    a markup document output unit for outputting the markup document generated by said element refinement processing unit.
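
The data flow recited in claim 1 can be illustrated with a minimal sketch. This is only one possible reading under stated assumptions, not the implementation disclosed in the application. The dictionary contents, the data store contents, the `<word>` tag, the rule that nouns are the "meaningful words", and all function names are hypothetical. The form element analysis (which appears to correspond to what is commonly called morphological analysis) is simplified here to whitespace splitting plus a dictionary lookup, and trailing punctuation is dropped.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr

# Hypothetical dictionary (dictionary storage unit): surface form -> part of speech.
DICTIONARY = {
    "tokyo": "noun", "is": "verb", "the": "article",
    "capital": "noun", "of": "preposition", "japan": "noun",
}

# Hypothetical data used for the markup process (data storage unit).
DATA_STORE = {"tokyo": {"type": "city"}, "japan": {"type": "country"}}

# Assumption: only nouns count as "meaningful words".
MEANINGFUL_POS = {"noun"}


@dataclass
class Token:
    surface: str
    known: bool            # whether the dictionary contains this token
    pos: str = "unknown"
    meaningful: bool = False


def analyze_form_elements(plain_text, dictionary):
    """Form element analyzer: decompose the plain document into tokens."""
    tokens = []
    for raw in plain_text.split():
        surface = raw.strip(".,;:!?")  # simplification: punctuation is discarded
        if surface:
            tokens.append(Token(surface, known=surface.lower() in dictionary))
    return tokens


def analyze_syntax(tokens, dictionary):
    """Syntax analyzer: assign a part of speech and flag meaningful words."""
    for token in tokens:
        token.pos = dictionary.get(token.surface.lower(), "unknown")
        token.meaningful = token.pos in MEANINGFUL_POS
    return tokens


def refine_elements(tokens, data_store):
    """Element refinement unit: attach stored data to each meaningful word."""
    parts = []
    for token in tokens:
        data = data_store.get(token.surface.lower()) if token.meaningful else None
        if data:
            attrs = "".join(f" {k}={quoteattr(v)}" for k, v in sorted(data.items()))
            parts.append(f"<word{attrs}>{escape(token.surface)}</word>")
        else:
            parts.append(escape(token.surface))
    return " ".join(parts)


def process(plain_text):
    """Input unit -> analyzers -> refinement -> markup document output."""
    tokens = analyze_form_elements(plain_text, DICTIONARY)
    structured = analyze_syntax(tokens, DICTIONARY)
    return refine_elements(structured, DATA_STORE)


print(process("Tokyo is the capital of Japan."))
```

In this sketch "capital" is tagged as a noun but is left unmarked because the hypothetical data store holds no entry for it; the claim does not say how such a case is handled, so that behavior is a design choice of the example.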
