Document data processing device
Abstract
A technique is provided for automatically acquiring metadata from the documents of various organizations, which significantly reduces the man-hours required to prepare models for metadata extraction. The input is a pair consisting of a document and the metadata appearing in it. Layout features, proximate text string features, and partial text string features are examined both for the metadata and for text strings that are not metadata, and the use of each of these features in the automatic acquisition of metadata is configured automatically (see FIG. 1).
10 Claims
1. A document data processing device that manages documents using metadata within the documents, the document data processing device comprising:
a memory which stores document data to be processed;
a processor which acquires the document data to be processed, from the memory, for which a type of metadata included in the documents is specified; and
an output device which outputs a first determination result by the processor,
wherein the processor determines whether or not a layout feature that metadata to be processed within the document data to be processed has is effective in extracting the metadata to be processed, to generate the first determination result, by checking whether or not the layout feature that the metadata to be processed has is manifested in a text string other than the metadata in the document data to be processed, and, based on a result of the check, determines whether or not the layout feature is effective in extracting the metadata.
(Dependent claims: 2, 3, 4, 5, 6)
7. A document data processing device that manages documents using metadata within the documents, the document data processing device comprising:
a memory which stores document data to be processed;
a processor which acquires the document data to be processed, from the memory, for which a type of metadata included in the documents is specified; and
an output device which outputs a first determination result by the processor,
wherein the processor determines whether or not at least two features from among a layout feature that metadata to be processed within the document data to be processed has, a proximate text string feature that is in proximity to the metadata to be processed, and a partial text string feature included in the metadata to be processed are effective in extracting the metadata to be processed, to generate the first determination result, by checking whether or not the layout feature that the metadata to be processed has is manifested in a text string other than the metadata in the document data to be processed, and, based on a result of the check, determines whether or not the layout feature is effective in extracting the metadata.
(Dependent claims: 8, 9, 10)
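The check recited in claims 1 and 7 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `TextString` record, the layout labels, and the `max_other_ratio` threshold are all assumptions introduced here for illustration, since the claims state only that the processor checks whether the metadata's layout feature is also manifested in text strings other than the metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextString:
    """A text string in a document with one layout label (hypothetical model)."""
    text: str
    layout: str        # illustrative labels such as "centered" or "left"
    is_metadata: bool  # True if this string is the metadata to be processed


def layout_feature_is_effective(strings, max_other_ratio=0.2):
    """Return True if the metadata's layout feature rarely appears elsewhere.

    max_other_ratio is an assumed threshold; the source does not give one.
    """
    metadata = [s for s in strings if s.is_metadata]
    others = [s for s in strings if not s.is_metadata]
    if not metadata:
        raise ValueError("no metadata string given")
    if not others:
        return True  # no other text string can share the feature
    target_layouts = {s.layout for s in metadata}
    # Count non-metadata strings that manifest the same layout feature.
    sharing = sum(1 for s in others if s.layout in target_layouts)
    return sharing / len(others) <= max_other_ratio


# The title is the only centered string, so the layout feature is distinctive.
distinct_doc = [
    TextString("Quarterly Report", "centered", True),
    TextString("Prepared by the finance team", "left", False),
    TextString("Section 1", "left", False),
]
# Every string is centered, so the layout feature cannot single out the title.
uniform_doc = [
    TextString("Quarterly Report", "centered", True),
    TextString("Prepared by the finance team", "centered", False),
    TextString("Section 1", "centered", False),
]

effective = layout_feature_is_effective(distinct_doc)
ineffective = layout_feature_is_effective(uniform_doc)
```

In this sketch the boolean returned by `layout_feature_is_effective` plays the role of the first determination result that the output device would present. A ratio threshold is only one way to turn the check into a decision; a strict rule (any shared layout makes the feature ineffective) corresponds to `max_other_ratio=0`.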
Specification