Apparatus, method, and computer program product for extracting structured document
Abstract
An apparatus for retrieving a structured document includes: a first specifying unit that specifies a plurality of object documents from a plurality of structured documents accessible via a network, each object document being a structured document that satisfies a retrieval condition; a first extracting unit that extracts text included in the object document; a second extracting unit that extracts metadata appended to the object document, the metadata being first data concerning the text of the object document and second data indicating a link relation between the object document and the structured documents; and a first calculating unit that calculates the importance of each of the object documents based on the text and the metadata of each of the object documents.
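The extraction steps named in the abstract can be illustrated with a minimal sketch. It assumes the structured document is well-formed, namespace-free XHTML-like markup, that the title stands in for the "first data" and outgoing `href` links stand in for the "second data", and that the retrieval condition is a simple keyword match. None of these choices comes from the patent text, and the names `ObjectDocument`, `extract`, and `specify_object_documents` are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObjectDocument:
    url: str
    text: str                      # text extracted from the document body
    title: Optional[str]           # assumed "first data": metadata concerning the text
    links: List[str] = field(default_factory=list)  # assumed "second data": link relations


def extract(url: str, markup: str) -> ObjectDocument:
    """Extract text and metadata from one well-formed, namespace-free document."""
    root = ET.fromstring(markup)
    body = root.find(".//body")
    source = body if body is not None else root
    # Join all text nodes and normalise whitespace.
    text = " ".join(" ".join(source.itertext()).split())
    title_el = root.find(".//title")
    title = title_el.text if title_el is not None else None
    # Outgoing links serve as the link relation to related documents.
    links = [a.get("href") for a in root.iter("a") if a.get("href")]
    return ObjectDocument(url=url, text=text, title=title, links=links)


def specify_object_documents(docs: List[ObjectDocument], keyword: str) -> List[ObjectDocument]:
    """Assumed retrieval condition: keep documents whose text contains the keyword."""
    needle = keyword.lower()
    return [d for d in docs if needle in d.text.lower()]
```

A real crawler would need a tolerant HTML parser and namespace handling; the sketch only shows how text, metadata, and link relations are separated before any importance is calculated.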
19 Claims
1. An apparatus for retrieving a structured document comprising:
a processor that is programmed to retrieve the structured document, wherein the processor causes:
- a first specifying unit to specify a plurality of object documents from a plurality of structured documents being accessible via a network, the object document being the structured document according to retrieval condition,
- a first extracting unit to extract a text included in the object document,
- a second extracting unit to extract a metadata appended to the object document, the metadata being first data indicating the text of the object document and second data indicating a link relation between the object document and related documents, each of the related documents being the structured document associated with the object document,
- a second specifying unit to specify whether a description supporting the object document is included in the text of each related document,
- an analyzing unit to analyze whether author information of a related document is included in a metadata appended to the related document based on the metadata appended to the object document, and
- a first calculating unit to calculate higher importance for the object document related to the related document having the author information thereof included in the metadata compared with importance of the object document related to the related document not having the author information thereof included in the metadata; and further to calculate higher importance for the object document corresponding to the related document including the description supporting the object document compared with importance of the object document corresponding to the related document not including a description supporting the object document.
16. A method of retrieving a structured document that is accessible via a network, the method comprising:
- specifying a plurality of object documents from a plurality of structured documents, the object document being the structured document according to retrieval condition,
- extracting a text included in the object document,
- extracting a metadata appended to the object document, the metadata being first data indicating the text of the object document and second data indicating a link relation between the object document and related documents, each of the related documents being the structured document associated with the object document,
- specifying whether a description supporting the object document is included in the text of each related document,
- analyzing whether author information of a related document is included in a metadata appended to the related document based on the metadata appended to the object document, and
- calculating higher importance for the object document related to the related document having the author information thereof included in the metadata compared with importance of the object document related to the related document not having the author information thereof included in the metadata, and further calculating higher importance for the object document corresponding to the related document including the description supporting the object document compared with importance of the object document corresponding to the related document not including a description supporting the object document.
17. A computer program product that is executable by a computer and has a computer-readable recording medium including a plurality of commands for retrieving a structured document, wherein the commands cause the computer to execute:
- specifying a plurality of object documents from a plurality of structured documents, the object document being the structured document according to retrieval condition,
- extracting a text included in the object document,
- extracting a metadata appended to the object document, the metadata being first data indicating the text of the object document and second data indicating a link relation between the object document and related documents, each of the related documents being the structured document associated with the object document,
- specifying whether a description supporting the object document is included in the text of each related document,
- analyzing whether author information of a related document is included in a metadata appended to the related document based on the metadata appended to the object document, and
- calculating higher importance for the object document related to the related document having the author information thereof included in the metadata compared with importance of the object document related to the related document not having the author information thereof included in the metadata, and further calculating higher importance for the object document corresponding to the related document including the description supporting the object document compared with importance of the object document corresponding to the related document not including a description supporting the object document.
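The two comparative rules in the calculating step (higher importance when a related document carries author information, and higher importance when a related document contains a supporting description) can be sketched as an additive score. This is only one illustrative way to satisfy those comparisons: the additive form, the weight values, and the names `RelatedDocument` and `importance` are assumptions, and how a "description supporting the object document" is detected is left as an input flag because the claims do not specify it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RelatedDocument:
    has_author_info: bool   # author information found in the related document's metadata
    supports_object: bool   # text contains a description supporting the object document


# Hypothetical weights; the claims only require that each bonus be positive.
BASE_WEIGHT = 1.0
AUTHOR_BONUS = 0.5
SUPPORT_BONUS = 0.5


def importance(related: List[RelatedDocument]) -> float:
    """Sum a per-related-document weight, boosted by author info and support."""
    score = 0.0
    for doc in related:
        weight = BASE_WEIGHT
        if doc.has_author_info:
            weight += AUTHOR_BONUS
        if doc.supports_object:
            weight += SUPPORT_BONUS
        score += weight
    return score


# Usage: the same object document scores higher when its related document
# has author information or a supporting description.
with_author = importance([RelatedDocument(has_author_info=True, supports_object=False)])
without_author = importance([RelatedDocument(has_author_info=False, supports_object=False)])
assert with_author > without_author
```

Because both bonuses are strictly positive, an object document linked from a related document with author information or a supporting description always scores above an otherwise identical one without it, which is the only ordering the claims require.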
Specification