DOCUMENT PROCESSING DEVICE AND DOCUMENT PROCESSING METHOD
Abstract
A structured document file that is similar to a given structured document file is identified based on the tag structure of the files.
A node-pair detection unit detects from a structured file a tag pair having a predetermined positional relation as a node pair. An attribute-value acquisition unit indexes as an attribute value the appearance mode of a node pair in a structured document file. An index-information creation unit creates index information associating a node pair and an attribute value thereof. A common-pair detection unit detects as a common pair a node pair that is common in a query document, which is a structured document file, and in a document to be examined, which is a structured document file to be compared. A node-similarity-value calculation unit indexes as a node similarity value, by referring to the index information of the query document and the index information of the document to be examined, the similarity between the attribute value of the common pair in the query document and the attribute value of the common pair in the document to be examined.
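The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation. It assumes that the "predetermined positional relation" is a parent-child relation, that the "appearance mode" is the number of times a pair occurs, and that the node similarity value is the ratio of the smaller count to the larger count. The function names `build_index` and `node_similarities` and the sample documents are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def build_index(xml_text):
    """Create index information: node pair -> attribute value.

    Assumptions (not stated in the source): a node pair is a
    (parent tag, child tag) pair, and its attribute value is the
    number of times that pair occurs in the file.
    """
    root = ET.fromstring(xml_text)
    index = Counter()
    for parent in root.iter():      # every element, in document order
        for child in parent:        # direct children only
            index[(parent.tag, child.tag)] += 1
    return index


def node_similarities(query_index, target_index):
    """Score each common pair by smaller count / larger count (assumed rule)."""
    common_pairs = query_index.keys() & target_index.keys()
    return {
        pair: min(query_index[pair], target_index[pair])
        / max(query_index[pair], target_index[pair])
        for pair in common_pairs
    }


# Hypothetical query document and document to be examined.
query = "<doc><sec><p/><p/></sec><sec><p/></sec></doc>"
target = "<doc><sec><p/></sec><note/></doc>"

query_index = build_index(query)
target_index = build_index(target)
scores = node_similarities(query_index, target_index)
```

Here the pair `("doc", "note")` appears only in the document to be examined, so it is not a common pair and receives no node similarity value. Any other positional relation or scoring rule could be substituted without changing the overall structure.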
10 Claims
1. A document processing apparatus comprising:
a node-pair detection unit operative to detect from a structured file described using a predetermined tag set a tag pair having a predetermined positional relation as a node pair;
an attribute-value acquisition unit operative to index as an attribute value according to a predetermined rule an appearance mode of a node pair in a structured document file;
an index creation unit operative to create index information associating a node pair and an attribute value thereof;
a common-pair detection unit operative to detect as a common pair a node pair that is common in a node pair group detected from a first structured document file and a node pair group detected from a second structured document file; and
a node-similarity-value calculation unit operative to index as a node similarity value, by referring to the index information of the first structured document file and the index information of the second structured document file, the similarity between the attribute value of the common pair in the first structured document file and the attribute value of the common pair in the second structured document file.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A document processing method comprising:
detecting in a structured file described using a predetermined tag set a tag pair having a predetermined positional relation as a node pair;
indexing as an attribute value according to a predetermined rule an appearance mode of a node pair in a structured document file;
creating index information associating a node pair and an attribute value thereof;
detecting as a common pair a node pair that is common in a node pair group detected from a first structured document file and a node pair group detected from a second structured document file; and
indexing as a node similarity value, by referring to the index information of the first structured document file and the index information of the second structured document file, the similarity between the attribute value of the common pair in the first structured document file and the attribute value of the common pair in the second structured document file.
10. A document processing computer program product comprising:
a module that detects from a structured file described using a predetermined tag set a tag pair having a predetermined positional relation as a node pair;
a module that indexes as an attribute value according to a predetermined rule an appearance mode of a node pair in a structured document file;
a module that creates index information associating a node pair and an attribute value thereof;
a module that detects as a common pair a node pair that is common in a node pair group detected from a first structured document file and a node pair group detected from a second structured document file; and
a module that indexes as a node similarity value, by referring to the index information of the first structured document file and the index information of the second structured document file, the similarity between the attribute value of the common pair in the first structured document file and the attribute value of the common pair in the second structured document file.
Specification