Method and means of matching documents based on text genre
First Claim
1. A computer implemented method of developing text genre from a collection of documents, the method comprising the steps of:
(a) extracting at least one key string from one document;
(b) extracting at least one key string from another document;
(c) forming a sequence of matching strings therefrom which preserve reading order;
(d) using a confusion class for each character of each extracted string;
(e) finding the longest common subsequence of matching strings to form an initial estimate of text genre; and
(f) repeating steps (b) to (e) until a definition of the text genre is developed that captures the spatial structure of key strings as a longest common subsequence (LCS) of matching key string sequences.
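Steps (a) to (f) can be illustrated with a minimal sketch. It is not the patented implementation: the confusion classes, the function names, and the choice to fold every document into the estimate are assumptions made for illustration, and spatial positions of the key strings are ignored.

```python
# Hypothetical confusion classes: characters an OCR engine might mix up.
# The actual classes used by the claimed method are not given in the text.
CONFUSION_CLASSES = ["0OoQD", "1lI|i", "5Ss", "8B", "2Zz"]
_REPRESENTATIVE = {ch: group[0] for group in CONFUSION_CLASSES for ch in group}


def canonical(s):
    """Replace each character by the representative of its confusion class."""
    return "".join(_REPRESENTATIVE.get(ch, ch) for ch in s)


def strings_match(a, b):
    """Two key strings match if they agree character by character up to confusion."""
    return canonical(a) == canonical(b)


def lcs(a, b):
    """Longest common subsequence of two key-string sequences (reading order kept)."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if strings_match(a[i], b[j]):
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Backtrack from the bottom-right corner to recover one LCS.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if strings_match(a[i - 1], b[j - 1]):
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def develop_genre(documents):
    """Fold the LCS over all documents; each document is a list of key strings
    in reading order. The first document seeds the estimate (steps a-e), and
    each further document refines it (step f)."""
    estimate = documents[0]
    for doc in documents[1:]:
        estimate = lcs(estimate, doc)
    return estimate


docs = [
    ["Invoice No", "Date", "Bill To", "Total"],
    ["Invoice N0", "Date", "Ship To", "Tota1"],   # OCR confusions: o/0, l/1
    ["Invoice No", "Date", "Due", "Total"],
]
genre = develop_genre(docs)
```

In this sketch the estimate can only shrink as documents are added, so the result is the set of key strings, in reading order, that every document shares up to character confusion.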
Abstract
A method for matching documents by the spatial layout of their regions, using a shape similarity model for detecting similarity between general 2D objects. The method determines whether two documents are similar in four stages: logical region generation, in which logical regions are automatically derived from information in the documents to be matched; region correspondence, in which a correspondence is established between the regions of the documents; pose computation, in which the individual transforms relating corresponding regions are recovered; and pose verification, in which the extent of spatial similarity is measured by projecting one document onto the other using the computed pose parameters.
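The last two stages, pose computation and pose verification, can be sketched as follows. This is an illustration under stated assumptions, not the method of the patent: regions are taken to be axis-aligned boxes already placed in correspondence, each transform is assumed to be a per-axis scale plus translation, the per-region transforms are averaged into one pose, and similarity is scored as mean intersection-over-union. The names `Box`, `pair_pose`, `project`, `iou`, and `spatial_similarity` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned logical region (assumed representation)."""
    x0: float
    y0: float
    x1: float
    y1: float


def pair_pose(src, dst):
    """Scale and translation mapping one region onto its counterpart.
    Assumes the source box has non-zero width and height."""
    sx = (dst.x1 - dst.x0) / (src.x1 - src.x0)
    sy = (dst.y1 - dst.y0) / (src.y1 - src.y0)
    return sx, sy, dst.x0 - sx * src.x0, dst.y0 - sy * src.y0


def project(box, pose):
    """Apply a (sx, sy, tx, ty) pose to a box."""
    sx, sy, tx, ty = pose
    return Box(sx * box.x0 + tx, sy * box.y0 + ty,
               sx * box.x1 + tx, sy * box.y1 + ty)


def iou(a, b):
    """Intersection-over-union of two boxes (0.0 when they do not overlap)."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    inter = max(w, 0.0) * max(h, 0.0)
    union = ((a.x1 - a.x0) * (a.y1 - a.y0)
             + (b.x1 - b.x0) * (b.y1 - b.y0) - inter)
    return inter / union if union > 0 else 0.0


def spatial_similarity(src_regions, dst_regions):
    """Average the per-pair poses, project the source layout onto the
    target, and return the mean overlap of corresponding regions."""
    poses = [pair_pose(s, d) for s, d in zip(src_regions, dst_regions)]
    pose = tuple(sum(p[k] for p in poses) / len(poses) for k in range(4))
    scores = [iou(project(s, pose), d) for s, d in zip(src_regions, dst_regions)]
    return sum(scores) / len(scores)


# Target layout is the source scaled by 2 and shifted by (3, 4).
source = [Box(0, 0, 10, 5), Box(0, 10, 10, 20)]
target = [Box(3, 4, 23, 14), Box(3, 24, 23, 44)]
score = spatial_similarity(source, target)
```

A score near 1 indicates that a single pose explains all corresponding regions; layouts whose regions need inconsistent transforms score lower because the averaged pose fits none of them well.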
Specification