Methods for efficient cluster analysis
First Claim
1. A computer readable medium storing a computer program which when executed by at least one processor defines structure for a document comprising a plurality of primitive elements that are defined in terms of their positions in the document, the computer program comprising sets of instructions for:
- sorting a particular set of primitive elements in a first order based on their positions in the document;
calculating relative difference values between adjacent pairs of primitive elements in the first order based on their positions;
sorting the relative difference values of the pairs of primitive elements into a second order from closest to furthest pairs;
storing a single value from the second order that identifies which of the pairs of primitive elements are sufficiently far apart to form partitions between subsets of primitive elements; and
using the stored single value to identify and analyze the partitions in order to define structural elements for the document.
1 Assignment
0 Petitions
Accused Products
Abstract
Some embodiments provide a method for defining structure for an unstructured document that includes a number of primitive elements that are defined in terms of their position in the document. The method identifies a pairwise grouping of nearest primitive elements. The method sorts the pairwise primitive elements based on an order from the closest to the furthest pairs. The method stores a single value that identifies which of the pairwise primitive elements are sufficiently far apart to form a partition. The method uses the stored value to identify and analyze the partitions in order to define structural elements for the document.
-
Citations
36 Claims
-
1. A computer readable medium storing a computer program which when executed by at least one processor defines structure for a document comprising a plurality of primitive elements that are defined in terms of their positions in the document, the computer program comprising sets of instructions for:
-
sorting a particular set of primitive elements in a first order based on their positions in the document; calculating relative difference values between adjacent pairs of primitive elements in the first order based on their positions; sorting the relative difference values of the pairs of primitive elements into a second order from closest to furthest pairs; storing a single value from the second order that identifies which of the pairs of primitive elements are sufficiently far apart to form partitions between subsets of primitive elements; and using the stored single value to identify and analyze the partitions in order to define structural elements for the document. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
-
-
21. A computer readable medium storing a computer program which when executed by at least one processor defines structure for a document comprising a plurality of primitive elements that are defined in terms of attribute values, the computer program comprising sets of instructions for:
-
defining an indirectly sorted array by calculating difference values between adjacent pairs of sorted primitive elements based on their attribute values and storing indices of the calculated difference values in a sorted array as values of the indirectly sorted array; using the indirectly sorted array to generate a plurality of different partition sets at different distance scales for the plurality of primitive elements, wherein each different partition set divides the plurality of primitive elements into different subsets of primitive elements; from the plurality of partition sets, selecting an optimal partition set based on a set of optimization measures; and grouping the plurality of primitive elements using the optimal partition set in order to associate at least one subset of the primitive elements as a structured element in the document. - View Dependent Claims (22, 23, 24, 25)
-
-
26. A method for defining structure for a document comprising a plurality of primitive elements that are defined in terms of their positions in the document, the method comprising:
-
sorting a particular set of primitive elements in a first order based on their positions in the document; calculating relative difference values between adjacent pairs of primitive elements in the first order based on their positions; sorting the relative difference values of the pairs of primitive elements into a second order from closest to furthest pairs; storing a single value from the second order that identifies which of the pairs of primitive elements are sufficiently far apart to form partitions between subsets of primitive elements; and using the stored single value to identify and analyze the partitions in order to define structural elements for the document. - View Dependent Claims (27, 28, 29, 30, 31)
-
-
32. A method for defining structure for a document comprising a plurality of primitive elements that are defined in terms of attribute values, the method comprising:
-
defining an indirectly sorted array by calculating difference values between adjacent pairs of sorted primitive elements based on their attribute values and storing indices of the calculated difference values in a sorted array as values of the indirectly sorted array; using the indirectly sorted array to generate a plurality of different partition sets at different distance scales for the plurality of primitive elements, wherein each different partition set divides the plurality of primitive elements into different subsets of primitive elements; from the plurality of partition sets, selecting an optimal partition set based on a set of optimization measures; and grouping the plurality of primitive elements using the optimal partition set in order to associate at least one subset of the primitive elements as a structured element in the document. - View Dependent Claims (33, 34, 35, 36)
-
Specification