Information processing apparatus, program, and recording medium
Abstract
An information processing apparatus for classifying documents. The apparatus begins with an input document and detects related documents. It then converts the document components into element identifying information that indicates the type or role of each document component. Next, an intra-document sequence converting section converts each set of repetitive element identifying information into element identifying information indicating the sequence of that element identifying information. An inter-document sequence converting section then identifies sets of element identifying information that appear in both a target document and a related document, and detects the identified sets that appear repeatedly in the target document. Finally, a sequence converting section converts the identified sets into element identifying information indicating the sequence of the elements, and structures the target document.
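The first two stages described in the abstract can be illustrated with a minimal sketch. It assumes document components arrive as (kind, content) pairs and that a repeated sequence is a fixed-size pattern occurring consecutively; the function names, the `SEQ(...)` token format, and the `size` and `threshold` parameters are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence, Tuple


def to_element_ids(components: Sequence[Tuple[str, str]]) -> List[str]:
    """Map (kind, content) pairs to type tokens; the content is discarded."""
    return [kind.upper() for kind, _content in components]


def collapse_repeats(ids: Sequence[str], size: int, threshold: int) -> List[str]:
    """Replace each run of `threshold` or more consecutive copies of a
    `size`-token pattern with one token naming that pattern."""
    if size < 1:
        raise ValueError("size must be at least 1")
    out: List[str] = []
    i = 0
    while i < len(ids):
        pattern = tuple(ids[i:i + size])
        count = 0
        j = i
        # Count how many times the pattern repeats back to back from position i.
        while len(pattern) == size and tuple(ids[j:j + size]) == pattern:
            count += 1
            j += size
        if count >= threshold:
            # Hypothetical token format standing for "a sequence of this pattern".
            out.append("SEQ(" + " ".join(pattern) + ")")
            i = j
        else:
            out.append(ids[i])
            i += 1
    return out
```

With the tokens TITLE, LINK, TEXT, LINK, TEXT, LINK, TEXT, IMAGE, a pattern size of 2, and a threshold of 2, this sketch yields TITLE, SEQ(LINK TEXT), IMAGE.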
20 Claims
1. An information processing apparatus stored on a recording medium that classifies a plurality of document components into a plurality of groups, said information processing apparatus comprising:
- a conversion instruction input section for receiving the conversion instructions;
- a related document detecting section for related document information having a predetermined relationship with target document information;
- a component converting section for converting said target document information and said related document information into element identifying information indicating a document component type;
- an inter-document pattern converting section for identifying patterns in said element identifying information that:
  (a) appear in both said target document information and said related document information; and
  (b) appear at a first threshold frequency in said element identifying information for indicating;
- an intra-document pattern of sequence converting section that selects sets of element identifying information that appear repeatedly at a second threshold frequency or higher as candidates for conversion;
- a group classifying section for grouping said identified patterns wherein each group comprises patterns that are identical to each other;
- a displaying section for displaying said converted target document information;
- an annotation output section for labeling each of the groups of said identified patterns; and
- a document information identity output section for presenting said labeled groups.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
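The inter-document pattern identification, grouping, and labeling recited in claim 1 could be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: patterns are taken to be fixed-size contiguous runs of type tokens, the first threshold is counted over the target and related documents combined, and the names `ngrams`, `shared_patterns`, `label_groups`, and the labels `G1`, `G2`, ... are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def ngrams(ids: Sequence[str], size: int) -> List[Pattern]:
    """All contiguous runs of `size` type tokens, in document order."""
    return [tuple(ids[i:i + size]) for i in range(len(ids) - size + 1)]


def shared_patterns(target: Sequence[str], related: Sequence[str],
                    size: int, threshold: int) -> Set[Pattern]:
    """Patterns present in both documents whose combined count reaches
    `threshold` (assumption: frequency is counted over both documents)."""
    t_counts = Counter(ngrams(target, size))
    r_counts = Counter(ngrams(related, size))
    return {p for p in t_counts
            if p in r_counts and t_counts[p] + r_counts[p] >= threshold}


def label_groups(docs: Dict[str, Sequence[str]], patterns: Set[Pattern],
                 size: int) -> Dict[str, Tuple[Pattern, List[Tuple[str, int]]]]:
    """Group identical pattern occurrences and give each group a label."""
    groups: Dict[Pattern, List[Tuple[str, int]]] = defaultdict(list)
    for name, ids in docs.items():
        for pos, p in enumerate(ngrams(ids, size)):
            if p in patterns:
                groups[p].append((name, pos))
    # Hypothetical labels G1, G2, ... assigned in sorted pattern order.
    return {f"G{n}": (p, groups[p]) for n, p in enumerate(sorted(groups), start=1)}
```

Each labeled group holds one pattern together with the (document, position) pairs where it occurs, which is one simple way to represent groups whose members are identical to each other.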
9. An information processing apparatus stored on a recording medium, comprising:
- a component converting section that:
  - acquires sets of registered elements indicating a combination of document components to be converted; and
  - converts document components into element identifying information indicating the type or role of the document component;
- a component selecting section that selects element identifying information as selected information, in order of ascending frequency of occurrence;
- an interstitial component detecting section that:
  - detects (a) an interstitial component arranged between the selected information and information in the document information that is next to the selected information and which is of the same type as that of the selected information; and (b) a terminal component following a plurality of selected information; and
  - transmits the interstitial component to an intra-document pattern of sequence converting section;
- wherein the intra-document pattern of sequence converting section:
  - generates element identifying information indicating a pattern of sequence of sets of element identifying information that are candidates for conversion;
  - converts the sets of element identifying information into element identifying information; and
  - detects selected information which is arranged at an end of the document information;
- a repetition end determining section that transmits document information to the component selecting section in order to cause the intra-document pattern of sequence converting section to further convert the document information already converted by the intra-document pattern of sequence converting section;
- an inter-document pattern of sequence converting section that:
  - sequentially receives target document information and related document information from the repetition end determining section;
  - identifies sets of element identifying information that appear in both the target document information and the related document information;
  - detects identified sets of element identifying information which appear repeatedly at a predetermined threshold frequency or higher in the document containing a combination of the target document information and the related document information; and
  - converts each set of element identifying information that appears repeatedly at the threshold frequency or higher into element identifying information indicating a pattern of sequence of element identifying information; and
- a group classifying section that receives from the inter-document pattern of sequence converting section target document information and related document information that has been converted.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17.
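The component selecting and interstitial component detecting steps of claim 9 can be pictured with a small sketch. It assumes the document is already a flat list of type tokens and treats an interstitial component as whatever lies between two consecutive occurrences of the selected token; both function names are hypothetical, and ties in frequency are broken alphabetically only to make the order deterministic.

```python
from collections import Counter
from typing import List, Sequence


def select_in_ascending_frequency(ids: Sequence[str]) -> List[str]:
    """Distinct type tokens ordered from least to most frequent."""
    counts = Counter(ids)
    # Tie-break on the token itself (an assumption, for a stable order).
    return sorted(counts, key=lambda token: (counts[token], token))


def interstitial_components(ids: Sequence[str], selected: str) -> List[List[str]]:
    """Tokens lying between each pair of consecutive occurrences of `selected`."""
    positions = [i for i, token in enumerate(ids) if token == selected]
    return [list(ids[a + 1:b]) for a, b in zip(positions, positions[1:])]
```

For the tokens H, LINK, SEP, LINK, SEP, LINK, IMG, the selection order under these assumptions is H, IMG, SEP, LINK, and selecting LINK gives SEP as the interstitial component between each adjacent pair of LINK tokens.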
18. An information processing apparatus stored on a recording medium, comprising:
- a related document detecting section that detects related document information that has a predetermined relationship with target document information and that is document information stored in a predetermined range from a storage position where the target document information is stored;
- a component converting section that:
  - determines an existence of image identifying information, text data, or link information for conversion into element type information; and
  - determines whether a document component conforms to a predetermined rule; and
  - converts document components into element identifying information indicating contents of the rule;
- a group classifying section that:
  - groups converted document components into corresponding element identifying information;
  - generates title information indicating (a) a role played in the document information by the document components in a group or contents of the document components in the group; and
  - generates importance information indicating the importance of the group;
- a document structure information generating section that generates document structure information indicating the structure of the document information;
- an interstitial component detecting section that detects an interstitial component arranged between selected information selected by the component selecting section and information which is arranged in the document information next to the selected information and which is of the same type as that of the selected information, whereby the apparatus can determine whether a set of document components is arranged so as to be displayed to a user in a table form or utilizes the table form to improve the layout of the document information, and can identify the roles of sets of document components.
Dependent claims: 19, 20.
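The component converting section of claim 18 checks for image, text, or link information and for conformance to a predetermined rule. A minimal sketch of that decision follows, assuming each component is a dictionary with optional "image", "link", and "text" keys. The date pattern stands in for "a predetermined rule" purely as an example; the key names, the rule, and the returned tokens are all hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical predetermined rule: text that is an ISO-style date.
DATE_RULE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def element_type(component: Dict[str, Optional[str]]) -> str:
    """Return a type token for one document component."""
    if component.get("image"):
        return "IMAGE"
    if component.get("link"):
        return "LINK"
    text = component.get("text")
    if text:
        # A component conforming to the rule gets a token naming the rule.
        if DATE_RULE.match(text):
            return "DATE"
        return "TEXT"
    return "OTHER"
```

The order of the checks is a design choice in this sketch: a component carrying both link and text information is treated as a link, since the link role is usually the more specific one.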
Specification