
Document information processing apparatus

  • US 7,269,789 B2
  • Filed: 03/23/2004
  • Issued: 09/11/2007
  • Est. Priority Date: 04/10/2003
  • Status: Expired due to Fees
First Claim

1. A document information processing apparatus comprising:

    a plain document input unit for inputting a plain document;

    a dictionary storage unit for storing a dictionary used for form element analysis and syntactic analysis;

    a form element analyzer for performing a form element analysis on the plain document inputted from said plain document input unit by using the dictionary stored in said dictionary storage unit so as to decompose the plain document into tokens;

    a syntax analyzer for analyzing a part of speech of each of the tokens obtained by said form element analyzer based on a syntax of said plain document so as to generate a structured document containing meaningful words;

    a data storage unit for storing data used for a markup process;

    an element refinement processing unit for performing the markup process of reading each of the meaningful words in the structured document and automatically adding content to the structured document in association with at least one of the meaningful words in order to generate a markup document; and

    a markup document output unit for outputting the markup document generated by said element refinement processing unit,

    wherein the added content is different from the markup tags in the markup document, and the added content includes at least one of;

    data, which is related to at least one of the meaningful words and which is read from the data storage unit, and

    data, which is related to at least one of the meaningful words and which is generated according to a determined attribute of the at least one of the meaningful words.
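
The processing flow recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation. Everything in it is an assumption made for illustration: the toy DICTIONARY and DATA_STORE contents, the tag names, the regex tokenizer standing in for the form element analysis, the dictionary lookup standing in for the syntax analysis, and the choice of "weekday of a date" as an example of data generated from a determined attribute of a word.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical toy dictionary (surface form -> part of speech); stands in
# for the claim's "dictionary storage unit".
DICTIONARY = {
    "alice": "proper_noun",
    "tokyo": "proper_noun",
    "visited": "verb",
    "on": "preposition",
}

# Hypothetical data consulted during the markup step ("data storage unit").
DATA_STORE = {"tokyo": "capital of Japan"}

# Assumption: only these parts of speech count as "meaningful words".
MEANINGFUL = {"proper_noun", "date"}

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


@dataclass
class Token:
    text: str
    pos: str


def tokenize(plain):
    """Decompose the plain document into tokens (ISO dates stay whole)."""
    return re.findall(r"\d{4}-\d{2}-\d{2}|\w+", plain)


def tag(words):
    """Assign a part of speech to each token via pattern or dictionary."""
    tokens = []
    for word in words:
        if DATE_RE.fullmatch(word):
            pos = "date"
        else:
            pos = DICTIONARY.get(word.lower(), "unknown")
        tokens.append(Token(word, pos))
    return tokens


def added_content(token):
    """Return extra content for a meaningful word, or None."""
    stored = DATA_STORE.get(token.text.lower())
    if stored is not None:
        return stored  # data read from the data store
    if token.pos == "date":
        try:
            # data generated from the word's determined attribute (a date)
            return WEEKDAYS[date.fromisoformat(token.text).weekday()]
        except ValueError:
            return None  # looks like a date but is not a valid one
    return None


def mark_up(tokens):
    """Wrap meaningful words in tags and append added content as text."""
    parts = []
    for token in tokens:
        if token.pos not in MEANINGFUL:
            parts.append(token.text)
            continue
        piece = f"<{token.pos}>{token.text}</{token.pos}>"
        extra = added_content(token)
        if extra:
            piece += f" ({extra})"  # added content is text, not a tag
        parts.append(piece)
    return "<doc>" + " ".join(parts) + "</doc>"


def process(plain):
    return mark_up(tag(tokenize(plain)))


print(process("Alice visited Tokyo on 2004-03-23"))
```

In this sketch the added content is emitted as ordinary text next to the tagged word, which mirrors the claim's requirement that the added content differ from the markup tags. The two branches of added_content correspond to the two alternatives at the end of the claim: data read from storage, and data generated from an attribute of the word.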
