Information processing apparatus, program, and recording medium
Abstract
An information processing apparatus, method, and program product are provided for classifying the document components of a document into groups. A component converting section converts each of the document components into element identifying information indicating the type or role of that component. An intra-document pattern of sequence converting section processes the document information converted by the component converting section and converts each set of pieces of element identifying information that appears repeatedly at a predetermined threshold frequency or higher into element identifying information indicating the pattern of sequence of that set. A group classifying section processes the document information obtained through repeated conversions by the intra-document pattern of sequence converting section and groups the document components that were converted into a corresponding piece of element identifying information by that section.
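As a minimal sketch of the component converting step, assuming HTML-like document components whose markup tag reveals their type or role, the conversion can be a simple table lookup. `Component`, `ROLE_BY_TAG`, `to_element_ids`, and the identifiers `TITLE`, `HEAD`, `TEXT`, `IMG`, `FOOTER`, and `OTHER` are hypothetical; the abstract does not say how types or roles are determined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    tag: str        # markup tag of the document component, e.g. "h2" (assumed input form)
    text: str = ""  # the component's content; not used by the conversion


# Hypothetical type/role table; the patent does not define concrete identifiers.
ROLE_BY_TAG = {
    "h1": "TITLE",
    "h2": "HEAD",
    "h3": "HEAD",
    "p": "TEXT",
    "img": "IMG",
    "footer": "FOOTER",
}


def to_element_ids(components):
    """Convert each component into element identifying information (its type or role).

    Tags missing from the table fall back to the hypothetical identifier "OTHER".
    """
    return [ROLE_BY_TAG.get(c.tag.lower(), "OTHER") for c in components]


if __name__ == "__main__":
    page = [Component("H1", "News"), Component("h2", "Item 1"),
            Component("p", "Body text"), Component("table")]
    print(to_element_ids(page))  # ['TITLE', 'HEAD', 'TEXT', 'OTHER']
```

Mapping both `h2` and `h3` to `HEAD` is an illustrative choice: components that play the same role should share an identifier, otherwise their repetitions could not be detected in the later pattern conversion.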
30 Claims
1. An information processing apparatus that classifies a plurality of document components contained in document information, into a plurality of groups, the apparatus comprising:
a component converting section that converts each of said plurality of document components in said document information into element identifying information indicating the type or role of the document component;
an intra-document pattern of sequence converting section that processes said document information converted by said component converting section to convert each of said sets of pieces of element identifying information that appear repeatedly at a predetermined threshold frequency or higher, into said element identifying information indicating a pattern of sequence of the set of the element identifying information; and
a group classifying section that processes document information obtained by allowing said intra-document pattern of sequence converting section to convert said document information repeatedly, to group a plurality of said document components converted into a corresponding piece of element identifying information by said intra-document pattern of sequence converting section.
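Claim 1 leaves open how repeated sets are found and how long they are. The Python sketch below assumes one plausible reading: each component has already been converted to an element identifier, a "set" is a contiguous pair of identifiers, and the most frequent pair is repeatedly replaced by a new pattern identifier (in the spirit of byte-pair encoding) until no pair reaches the threshold. The pairwise merge strategy, the `P0`, `P1`, ... naming, and the names `Node`, `count_pairs`, `merge_pair`, `convert_patterns`, and `group_components` are assumptions for illustration, not the patent's specified procedure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str  # element identifying information (original or pattern)
    start: int  # index of the first document component covered
    end: int    # one past the last document component covered


def count_pairs(nodes):
    """Count adjacent label pairs, skipping overlapping runs such as A A A."""
    counts, last_pos = Counter(), {}
    for i in range(len(nodes) - 1):
        pair = (nodes[i].label, nodes[i + 1].label)
        if last_pos.get(pair) == i - 1:
            continue  # this occurrence overlaps the one counted at i - 1
        counts[pair] += 1
        last_pos[pair] = i
    return counts


def merge_pair(nodes, pair, new_label):
    """Replace each non-overlapping occurrence of `pair` with one pattern node."""
    out, i = [], 0
    while i < len(nodes):
        if i + 1 < len(nodes) and (nodes[i].label, nodes[i + 1].label) == pair:
            out.append(Node(new_label, nodes[i].start, nodes[i + 1].end))
            i += 2
        else:
            out.append(nodes[i])
            i += 1
    return out


def convert_patterns(labels, threshold=2):
    """Repeatedly replace the most frequent pair until none reaches `threshold`."""
    nodes = [Node(label, i, i + 1) for i, label in enumerate(labels)]
    patterns = {}  # pattern label -> the pair of labels it stands for
    while True:
        counts = count_pairs(nodes)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < threshold:
            break
        new_label = f"P{len(patterns)}"  # hypothetical naming scheme
        patterns[new_label] = pair
        nodes = merge_pair(nodes, pair, new_label)
    return nodes, patterns


def group_components(nodes, patterns):
    """Each surviving pattern node becomes one group of component indices."""
    groups = {}
    for node in nodes:
        if node.label in patterns:
            groups.setdefault(node.label, []).append(list(range(node.start, node.end)))
    return groups


if __name__ == "__main__":
    ids = ["TITLE", "HEAD", "TEXT", "IMG", "HEAD", "TEXT", "IMG", "FOOTER"]
    nodes, patterns = convert_patterns(ids, threshold=2)
    print(patterns)  # {'P0': ('HEAD', 'TEXT'), 'P1': ('P0', 'IMG')}
    print(group_components(nodes, patterns))  # {'P1': [[1, 2, 3], [4, 5, 6]]}
```

Because `P1` is built from `P0`, it is the second conversion pass that turns each heading, its text, and its image into a single group; stopping after one pass would only pair headings with their text.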
25. An information processing apparatus that classifies a plurality of document components contained in document information, into a plurality of groups, the apparatus comprising:
a related document detecting section that detects related document information having a predetermined relationship with target document information to be grouped;
a component converting section that converts, for each of said target document information and said related document information, each of said plurality of document components into element identifying information indicating the type or role of the document component;
an inter-document pattern of sequence converting section that converts each of those sets of said element identifying information which appear in both said target document information and said related document information and which appear repeatedly at a predetermined threshold frequency or higher in the document containing a combination of said target document information and said related document information, into element identifying information indicating the pattern of sequence of the set of the element identifying information; and
a group classifying section that processes said target document information obtained by the conversion by said inter-document pattern of sequence converting section to group a plurality of said document components converted into a corresponding piece of element identifying information by said inter-document pattern of sequence converting section.
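Claim 25 differs from claim 1 in where repetition is counted: a set qualifies only if it appears in both the target and the related document and is frequent across the two together. The Python sketch below assumes the related document has already been found by the related document detecting section (the "predetermined relationship" is not modeled), treats a "set" as a contiguous pair of identifiers, and merges pairs repeatedly in the spirit of byte-pair encoding. That merge strategy and the names `Node`, `convert_inter`, and `group_target` are illustrative assumptions, not the claimed implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str  # element identifying information (original or pattern)
    start: int  # index of the first document component covered
    end: int    # one past the last document component covered


def leaf_nodes(labels):
    return [Node(label, i, i + 1) for i, label in enumerate(labels)]


def count_pairs(nodes):
    """Count adjacent label pairs, skipping overlapping runs such as A A A."""
    counts, last_pos = Counter(), {}
    for i in range(len(nodes) - 1):
        pair = (nodes[i].label, nodes[i + 1].label)
        if last_pos.get(pair) == i - 1:
            continue  # overlaps the occurrence counted at i - 1
        counts[pair] += 1
        last_pos[pair] = i
    return counts


def merge_pair(nodes, pair, new_label):
    """Replace each non-overlapping occurrence of `pair` with one pattern node."""
    out, i = [], 0
    while i < len(nodes):
        if i + 1 < len(nodes) and (nodes[i].label, nodes[i + 1].label) == pair:
            out.append(Node(new_label, nodes[i].start, nodes[i + 1].end))
            i += 2
        else:
            out.append(nodes[i])
            i += 1
    return out


def convert_inter(target, related, threshold):
    """Merge pairs found in both documents whose combined frequency reaches threshold."""
    t_nodes, r_nodes = leaf_nodes(target), leaf_nodes(related)
    patterns = {}  # pattern label -> the pair of labels it stands for
    while True:
        # Count each document separately so no pair spans the seam between them.
        t_counts, r_counts = count_pairs(t_nodes), count_pairs(r_nodes)
        combined = t_counts + r_counts
        candidates = [p for p in t_counts if p in r_counts and combined[p] >= threshold]
        if not candidates:
            break
        pair = max(candidates, key=lambda p: combined[p])
        new_label = f"P{len(patterns)}"  # hypothetical naming scheme
        patterns[new_label] = pair
        # The same pattern identifier is applied to both documents.
        t_nodes = merge_pair(t_nodes, pair, new_label)
        r_nodes = merge_pair(r_nodes, pair, new_label)
    return t_nodes, patterns


def group_target(t_nodes, patterns):
    """Group the target's components by the pattern node that covers them."""
    groups = {}
    for node in t_nodes:
        if node.label in patterns:
            groups.setdefault(node.label, []).append(list(range(node.start, node.end)))
    return groups


if __name__ == "__main__":
    target = ["TITLE", "HEAD", "TEXT", "IMG", "FOOTER"]
    related = ["TITLE", "HEAD", "TEXT", "IMG", "HEAD", "TEXT", "IMG", "FOOTER"]
    t_nodes, patterns = convert_inter(target, related, threshold=3)
    print(group_target(t_nodes, patterns))  # {'P1': [[1, 2, 3]]}
```

The target alone holds only one heading/text/image item, so no pair repeats within it; the related document supplies the repetition. With a single related document, anything that appears once per page (the title here) already reaches a combined frequency of 2, which is why the example uses a threshold of 3 to keep page boilerplate out of the groups.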
28. A program for controlling an information processing apparatus that classifies a plurality of document components contained in document information, into a plurality of groups, the program allowing said information processing apparatus to function as:
a component converting section that converts each of said plurality of document components from said document information into element identifying information indicating the type or role of the document component;
an intra-document pattern of sequence converting section that processes said document information converted by said component converting section to convert each of said sets of pieces of element identifying information that appear repeatedly at a predetermined threshold frequency or higher, into said element identifying information indicating a pattern of sequence of the set of the element identifying information; and
a group classifying section that processes document information obtained by repeated conversions by said intra-document pattern of sequence converting section to group a plurality of said document components converted into a corresponding piece of element identifying information by said intra-document pattern of sequence converting section.
29. A program for controlling an information processing apparatus that classifies a plurality of document components contained in document information, into a plurality of groups, the program allowing said information processing apparatus to function as:
a related document detecting section that detects related document information having a predetermined relationship with target document information to be grouped;
a component converting section that converts, for each of said target document information and said related document information, each of said plurality of document components into element identifying information indicating the type or role of the document component;
an inter-document pattern of sequence converting section that converts each of those sets of said element identifying information which appear in both said target document information and said related document information and which appear repeatedly at a predetermined threshold frequency or higher in a document having a combination of said target document information and said related document information, into element identifying information indicating the pattern of sequence of the set of the element identifying information; and
a group classifying section that processes said target document information obtained by the conversion by said inter-document pattern of sequence converting section to group a plurality of said document components converted into a corresponding piece of element identifying information by said inter-document pattern of sequence converting section.
30. A program product comprising a recording medium comprising program code for controlling an information processing apparatus that classifies a plurality of document components contained in document information, into a plurality of groups, the program allowing said information processing apparatus to function as:
a component converting section that converts each of said plurality of document components from said document information into element identifying information indicating the type or role of the document component;
an intra-document pattern of sequence converting section that processes said document information converted by said component converting section to convert each of said sets of pieces of element identifying information that appear repeatedly at a predetermined threshold frequency or higher, into said element identifying information indicating a pattern of sequence of the set of the element identifying information; and
a group classifying section that processes document information obtained by repeated conversions by said intra-document pattern of sequence converting section to group a plurality of said document components converted into a corresponding piece of element identifying information by said intra-document pattern of sequence converting section.
Specification