Document processing apparatus, document processing method, document processing program and recording medium
First Claim
1. A document processing apparatus comprising:
block dividing means for dividing input document data into blocks in a predetermined manner according to the structure of the document data;
document structuring means for structuring the document data by parsing a block into which the document data is divided by said block dividing means according to the document structure of the block, and by adding tag information to text data constituting the block, said tag information indicating an attribute of the text data;
storing means for storing an extraction mode for determining whether or not a sentence is extracted with respect to a tag information added to text data; and
sentence extraction means for controlling an extraction of the text data according to the tag information added to the text data by said document structuring means and said extraction mode stored in said storing means, wherein said sentence extraction means further includes tag determining means for taking an action according to the tag information, and said tag determining means determines a type of tag information for the text data tagged with tag information according to the tag information and tag action data including action data indicating an action associated with the tag information, and takes the action set for the tag action data.
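The interplay between tag information, the stored extraction mode, and the tag action data can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the tag names, the two actions, the mode names, and the functions are assumptions chosen only to show one way a lookup of this kind might work.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    EXTRACT = "extract"  # emit the text as an extracted sentence
    SKIP = "skip"        # drop the text


@dataclass
class TaggedText:
    tag: str   # attribute added by the structuring step (hypothetical names)
    text: str


# Hypothetical tag action data: each tag type is associated with an action.
TAG_ACTIONS = {
    "heading": Action.EXTRACT,
    "paragraph": Action.EXTRACT,
    "signature": Action.SKIP,
    "quote": Action.SKIP,
}

# Hypothetical stored extraction modes: for each mode, the tags whose
# text may be extracted at all.
EXTRACTION_MODES = {
    "all": {"heading", "paragraph", "signature", "quote"},
    "body_only": {"paragraph"},
}


def extract_sentences(items, mode):
    """Return the texts whose tag action is EXTRACT and whose tag the mode allows."""
    allowed = EXTRACTION_MODES[mode]
    extracted = []
    for item in items:
        # Determine the tag type and look up its action; unknown tags are skipped.
        action = TAG_ACTIONS.get(item.tag, Action.SKIP)
        if action is Action.EXTRACT and item.tag in allowed:
            extracted.append(item.text)
    return extracted


items = [
    TaggedText("heading", "Report"),
    TaggedText("paragraph", "Body text."),
    TaggedText("signature", "-- Bob"),
]
print(extract_sentences(items, "body_only"))
```

In this sketch the mode acts as a filter on top of the per-tag action, so changing the stored mode changes what is extracted without touching the tag action table.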
Abstract
The text format of input data is checked, and is converted into a system-manipulated format. It is further determined if the input data is in an HTML or e-mail format using tags, heading information, and the like. The converted data is divided into blocks in a simple manner such that elements in the blocks can be checked based on repetition of predetermined character patterns. Each block section is tagged with a tag indicating a block. The data divided into blocks is parsed based on tags, character patterns, etc., and is structured. A table in text is also parsed, and is segmented into cells. Finally, tree-structured data having a hierarchical structure is generated based on the sentence-structured data. A sentence-extraction template paired with the tree-structured data is used to extract sentences.
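As a rough illustration of the simple block division described above, the following sketch splits plain text at blank lines and at separator lines made of one repeated character, then wraps each block in a tag. The separator rule, the `<block>` tag name, and both function names are assumptions made for this example; the abstract does not specify them.

```python
import re

# Assumed separator: a line consisting of one punctuation character
# repeated three or more times, e.g. "-----" or "=====".
SEPARATOR = re.compile(r"^\s*([-=*_#~])\1{2,}\s*$")


def divide_into_blocks(text):
    """Split text into blocks at blank lines and repeated-character separator lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if not line.strip() or SEPARATOR.match(line):
            # A boundary closes the block collected so far, if any.
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def tag_blocks(blocks):
    """Wrap each block in a hypothetical <block> tag marking the block section."""
    return [f"<block>{b}</block>" for b in blocks]


sample = "Title\n=====\nfirst line\nsecond line\n\n-----\nfooter"
for tagged in tag_blocks(divide_into_blocks(sample)):
    print(tagged)
```

A later structuring pass could then parse each tagged block on its own, which is the reason for keeping this first division step deliberately simple.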
50 Claims
1. A document processing apparatus comprising:
block dividing means for dividing input document data into blocks in a predetermined manner according to the structure of the document data;
document structuring means for structuring the document data by parsing a block into which the document data is divided by said block dividing means according to the document structure of the block, and by adding tag information to text data constituting the block, said tag information indicating an attribute of the text data;
storing means for storing an extraction mode for determining whether or not a sentence is extracted with respect to a tag information added to text data; and
sentence extraction means for controlling an extraction of the text data according to the tag information added to the text data by said document structuring means and said extraction mode stored in said storing means, wherein said sentence extraction means further includes tag determining means for taking an action according to the tag information, and said tag determining means determines a type of tag information for the text data tagged with tag information according to the tag information and tag action data including action data indicating an action associated with the tag information, and takes the action set for the tag action data.
Dependent claims: 2-46, 50
47. A computer-implemented document processing method comprising the steps of:
dividing input document data into blocks in a predetermined manner according to a structure of the document data;
structuring the document data, thereby generating structured data, by parsing a block into which the document data is divided in said block dividing step according to a document structure of the block, and by adding tag information to text data constituting the block, said tag information indicating an attribute of the text data;
controlling an extraction of the text data according to the added tag information and a predetermined condition, wherein the predetermined condition provides an indication of a method to be utilized to perform the extraction of the text data;
determining a type of tag information for the text data tagged with tag information and tag action data including action data indicating an action associated with the tag information; and
taking the action associated with the tag information.
48. A document processing program for causing a computer apparatus to execute a document processing method comprising the steps of:
dividing input document data into blocks in a predetermined manner according to a structure of the document data;
structuring the document data, thereby generating structured data, by parsing a block into which the document data is divided in said block dividing step according to a document structure of the block, and by adding tag information to text data constituting the block, said tag information indicating an attribute of the text data;
controlling an extraction of the text data according to the added tag information and a predetermined condition, wherein the predetermined condition provides an indication of a method to be utilized to perform the extraction of the text data;
determining a type of tag information for the text data tagged with tag information and tag action data including action data indicating an action associated with the tag information; and
taking the action associated with the tag information.
49. A non-transitory computer-readable recording storage medium having a document processing program recorded thereon, the document processing program, when executed, causing a computer apparatus to execute a document processing method comprising the steps of:
dividing input document data into blocks in a predetermined manner according to a structure of the document data;
structuring the document data, thereby generating structured data, by parsing a block into which the document data is divided in said block dividing step according to a document structure of the block, and by adding tag information to text data constituting the block, said tag information indicating an attribute of the text data;
controlling an extraction of the text data according to the added tag information and a predetermined condition, wherein the predetermined condition provides an indication of a method to be utilized to perform the extraction of the text data;
determining a type of tag information for the text data tagged with tag information and tag action data including action data indicating an action associated with the tag information; and
taking the action associated with the tag information.
Specification