SYSTEM FOR SORTING DOCUMENT IMAGES BY SHAPE COMPARISONS AMONG CORRESPONDING LAYOUT COMPONENTS
Abstract
A programming interface of a document search system enables a user to dynamically specify features of documents recorded in a corpus of documents. The programming interface provides category and format flexibility for defining different genres of documents. The document search system initially segments document images into one or more layout objects. Each layout object identifies a structural element in a document, such as a text block, graphic, or halftone. Subsequently, the document search system computes a set of attributes for each of the identified layout objects. The set of attributes is used to describe the layout structure of a page image of a document in terms of the spatial relations that layout objects have to frames of reference that are defined by other layout objects. Using the set of attributes, a user defines features of a document with the programming interface. After receiving a feature or attribute and a set of document images selected by a user, the system forms a set of image segments by identifying those layout objects in the set of document images that make up the selected feature or attribute. The system then sorts the set of image segments into meaningful groupings of objects that have similarities and/or recurring patterns. In operation, the system sorts images in the image domain based on segments (or portions) of a document image that have been automatically extracted by the system. As a result, searching becomes more efficient because it is performed on limited portions of a document. Subsequently, document images in the set of document images are ordered and displayed to a user in accordance with the meaningful groupings.
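As a rough illustration of the abstract, the sketch below models layout objects, a user-defined feature, and the resulting image segment. It is not taken from the patent: the `LayoutObject` fields, the `top_text_block` feature, and the `image_segment` helper are all hypothetical names chosen for this example, and coordinates are assumed to be fractions of the page size.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class LayoutObject:
    """One structural element found by segmentation (assumed attributes)."""
    kind: str   # layout object type, e.g. "text", "graphic", "halftone"
    x: float    # left edge, as a fraction of page width
    y: float    # top edge, as a fraction of page height
    w: float    # width, as a fraction of page width
    h: float    # height, as a fraction of page height


# A feature is modelled here as a predicate over one object and its page,
# so it can refer to frames of reference defined by the other objects.
Feature = Callable[[LayoutObject, List[LayoutObject]], bool]


def top_text_block(obj: LayoutObject, page: List[LayoutObject]) -> bool:
    """Hypothetical feature: the topmost text block on the page."""
    texts = [o for o in page if o.kind == "text"]
    return bool(texts) and obj == min(texts, key=lambda o: o.y)


def image_segment(page: List[LayoutObject], feature: Feature) -> List[LayoutObject]:
    """Collect the layout objects of one page that form the selected feature."""
    return [o for o in page if feature(o, page)]


page = [
    LayoutObject("text", 0.10, 0.05, 0.80, 0.06),
    LayoutObject("halftone", 0.10, 0.20, 0.35, 0.30),
    LayoutObject("text", 0.10, 0.55, 0.80, 0.35),
]
segment = image_segment(page, top_text_block)
```

Treating a feature as a predicate keeps the segmentation step independent of any particular document genre; a different genre would only need a different predicate.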
20 Claims
1. A method for sorting document images stored in a memory of a document management system, comprising the steps of:
segmenting each document image recorded in the memory into a set of layout objects;
each layout object in each of the sets of layout objects being one of a plurality of layout object types;
each of the plurality of layout object types identifying a structural element of a document;
selecting a feature of a document from a set of features;
each of the features in the set of features identifying groups of layout objects in different ones of the sets of layout objects recorded in the memory;
assembling in the memory a set of image segments;
each image segment in the set of image segments identifies those layout objects of a document image stored in the memory that form the selected feature; and
sorting the assembled image segments into clusters in the memory;
each cluster defining a grouping of image segments that have similar layout objects forming the selected feature.
13. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for sorting document images stored in a memory of a document management system, said method steps comprising:
segmenting each document image recorded in the memory into a set of layout objects;
each layout object in each of the sets of layout objects being one of a plurality of layout object types;
each of the plurality of layout object types identifying a structural element of a document;
selecting a feature of a document from a set of features;
each of the features in the set of features identifying groups of layout objects in different ones of the sets of layout objects recorded in the memory;
assembling in the memory a set of image segments;
each image segment in the set of image segments identifies those layout objects of a document image stored in the memory that form the selected feature; and
sorting the assembled image segments into clusters in the memory;
each cluster defining a grouping of image segments that have similar layout objects forming the selected feature.
16. A document management system for sorting document images, comprising:
a memory for storing the document images and image processing instructions of the document management system; and
a processor coupled to the memory for executing the image processing instructions of the document management system;
the processor in executing the image processing instructions;
segmenting each document image recorded in the memory into a set of layout objects;
each layout object in each of the sets of layout objects being one of a plurality of layout object types;
each of the plurality of layout object types identifying a structural element of a document;
selecting a feature of a document from a set of features;
each of the features in the set of features identifying groups of layout objects in different ones of the sets of layout objects recorded in the memory;
assembling in the memory a set of image segments;
each image segment in the set of image segments identifies those layout objects of a document image stored in the memory that form the selected feature; and
sorting the assembled image segments into clusters in the memory;
each cluster defining a grouping of image segments that have similar layout objects forming the selected feature.
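The final "sorting the assembled image segments into clusters" step recited in the independent claims can be sketched as follows. The claims do not name a similarity measure or a clustering algorithm, so this example assumes a simple one: each image segment is reduced to the (width, height) of its layout objects, and a greedy pass places a segment in the first cluster whose representative is within a threshold. The names `shape_distance`, `sort_into_clusters`, and the document identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float]   # (width, height) of one layout object
Segment = List[Box]         # layout objects forming the selected feature


def shape_distance(a: Segment, b: Segment) -> float:
    """Assumed measure: largest width+height difference between paired objects."""
    if len(a) != len(b):
        return float("inf")  # different object counts never match
    if not a:
        return 0.0
    return max(abs(wa - wb) + abs(ha - hb) for (wa, ha), (wb, hb) in zip(a, b))


def sort_into_clusters(
    segments: Dict[str, Segment], threshold: float = 0.1
) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose first member is close enough."""
    clusters: List[List[str]] = []
    for doc_id, seg in segments.items():
        for cluster in clusters:
            representative = segments[cluster[0]]
            if shape_distance(seg, representative) <= threshold:
                cluster.append(doc_id)
                break
        else:
            # No existing cluster matched, so this segment starts a new one.
            clusters.append([doc_id])
    return clusters


segments = {
    "memo_a": [(0.80, 0.06)],
    "memo_b": [(0.78, 0.07)],
    "letter_c": [(0.40, 0.15)],
}
clusters = sort_into_clusters(segments)
```

With these sample values the two memo segments fall into one cluster and the letter segment into another. The greedy pass depends on input order; it is used here only because it is short, not because the patent prescribes it.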
Specification