Navigation system for document image database
Abstract
An interactive database organization and searching system employs text search and image feature extraction to automatically group documents together by appearance. The system automatically determines visual characteristics of document images and collects documents together according to the relative similarity of their document images.
32 Claims
1. A method for searching for a particular document image in a database containing a plurality of document images, each document image having a textual component, a compressed representation, and a non-compressed representation, said method comprising:
accepting text from a user as a keyword to search;
searching the textual component of said plurality of document images for said keyword;
detecting similarity of features extracted from document images having textual components that contain said keyword, wherein said detecting is based on processing of the compressed representation or the non-compressed representation of the document images;
grouping document images having similar extracted features into a plurality of clusters of document images;
selecting a representative document image for each of the plurality of clusters of document images;
displaying the representative document image for each of the plurality of clusters of document images; and
accepting input from the user indicating a particular cluster of document images.
Dependent claims: 2, 3, 4, 5, 6, 8, 9, 10, 11.
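The steps of claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: it assumes each document image already carries an OCR'd text string and a precomputed numeric feature vector (so the choice between compressed and non-compressed processing is abstracted away), and the `DocImage` type and all function names are hypothetical. It uses cosine similarity as the similarity measure, a greedy threshold ("leader") grouping as a stand-in for whatever grouping the specification describes, and the medoid as each cluster's representative.

```python
import math
from dataclasses import dataclass


@dataclass
class DocImage:
    doc_id: str
    text: str              # textual component (e.g. OCR output)
    features: list[float]  # hypothetical precomputed image feature vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_hits(docs: list[DocImage], keyword: str) -> list[DocImage]:
    """Case-insensitive substring search over each document's textual component."""
    kw = keyword.lower()
    return [d for d in docs if kw in d.text.lower()]


def group_by_similarity(docs: list[DocImage], threshold: float) -> list[list[DocImage]]:
    """Greedy leader grouping: each document joins the first cluster whose first
    member (its leader) is at least `threshold` similar, else starts a new cluster."""
    clusters: list[list[DocImage]] = []
    for d in docs:
        for cluster in clusters:
            if cosine(d.features, cluster[0].features) >= threshold:
                cluster.append(d)
                break
        else:  # no existing cluster was similar enough
            clusters.append([d])
    return clusters


def representative(cluster: list[DocImage]) -> DocImage:
    """Medoid: the member with the largest summed similarity to all members."""
    return max(cluster, key=lambda d: sum(cosine(d.features, o.features) for o in cluster))


docs = [
    DocImage("a", "Invoice 001", [1.0, 0.0]),
    DocImage("b", "invoice 002", [0.8, 0.6]),
    DocImage("c", "INVOICE 003", [0.6, 0.8]),
    DocImage("d", "invoice, handwritten", [0.0, 1.0]),
    DocImage("e", "memo", [1.0, 0.0]),
]
hits = keyword_hits(docs, "invoice")
clusters = group_by_similarity(hits, threshold=0.5)
print([representative(c).doc_id for c in clusters])  # prints ['b', 'd']
```

The leader grouping is order-dependent and is chosen only for brevity; claim 7 below names k-means explicitly.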
7. A method for searching for a particular document image in a database containing a plurality of document images, wherein each document image has a textual component, a compressed representation, and a non-compressed representation, said method comprising:
accepting text from a user as a keyword to search;
searching the textual component of said plurality of document images for said keyword;
grouping document images having textual components that contain said keyword into a plurality of clusters of document images based upon processing of the compressed representation or the non-compressed representation of the document images, wherein said processing includes extracting image feature information about said particular document image, and applying a "k-means" clustering algorithm to said image feature information to form said plurality of clusters of document images;
displaying, based on said processing, a representative document image for each of the plurality of clusters of document images; and
accepting input from the user indicating a particular cluster of document images, wherein said plurality of clusters consists of between 5 and 10 document images.
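Claim 7 names a "k-means" clustering algorithm. As a rough sketch of that technique (plain Lloyd's iteration in Python, not the patent's implementation), the following clusters feature vectors and takes as each cluster's representative the member nearest its centroid. The nearest-member rule for choosing representatives, the seeded random initialization, and the function names are assumptions.

```python
import math
import random


def kmeans(points: list[list[float]], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's k-means. Returns (centroids, labels); labels[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k randomly sampled points as seeds
    labels: list[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def representatives(points, centroids, labels) -> dict[int, int]:
    """For each non-empty cluster, the index of the member closest to its centroid."""
    reps = {}
    for j, c in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if idx:
            reps[j] = min(idx, key=lambda i: math.dist(points[i], c))
    return reps


pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = kmeans(pts, k=2)
print(sorted(representatives(pts, centroids, labels).values()))  # prints [0, 3]
```

Here `k=2` suits the toy data only; in the claimed setting the feature vectors would come from the document images and `k` would be set to yield the clusters to be displayed.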
12. A method for organizing a plurality of document images in a database comprising:
compressing each particular document image in said plurality of document images;
extracting feature information about said particular document image;
detecting similarity of features extracted from said plurality of document images, wherein said detecting is based on processing of a compressed representation or a non-compressed representation of the document images;
grouping said plurality of document images together to form clusters of document images;
selecting a representative document image for each of the clusters of document images; and
displaying each representative document image.
Dependent claims: 13, 14, 15, 16, 18, 19, 20, 21.
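Claim 12 allows similarity to be detected from the compressed representation. The claims do not name a codec, so this sketch assumes bilevel (0/1) page images compressed by simple row-wise run-length encoding, a stand-in for a real fax-style coder. It computes a hypothetical feature, the fraction of black pixels in each horizontal band, by summing black run lengths directly, without decompressing the image.

```python
def rle_encode_row(row: list[int]) -> list[int]:
    """Run lengths of a 0/1 row, alternating white (0) and black (1), starting
    with white; a row that starts black gets a leading zero-length white run."""
    runs: list[int] = []
    current, length = 0, 0
    for px in row:
        if px == current:
            length += 1
        else:
            runs.append(length)
            current, length = px, 1
    runs.append(length)
    return runs


def compress(image: list[list[int]]) -> list[list[int]]:
    """Row-by-row run-length encoding of a bilevel image."""
    return [rle_encode_row(row) for row in image]


def band_ink_density(compressed: list[list[int]], width: int, bands: int) -> list[float]:
    """Fraction of black pixels in each of `bands` horizontal strips, summed
    straight from the black runs (odd positions) without decompressing."""
    rows = len(compressed)
    features = []
    for b in range(bands):
        lo, hi = b * rows // bands, (b + 1) * rows // bands
        black = sum(sum(runs[1::2]) for runs in compressed[lo:hi])
        area = (hi - lo) * width
        features.append(black / area if area else 0.0)
    return features


image = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
packed = compress(image)                           # [[4], [0, 4], [4], [1, 2, 1]]
print(band_ink_density(packed, width=4, bands=2))  # prints [0.5, 0.25]
```

A vector of band densities like this could serve as the feature vector fed to the similarity and clustering steps.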
17. A method for organizing a plurality of document images in a database comprising:
compressing each particular document image in said plurality of document images;
extracting image feature information about said particular document image;
grouping said plurality of document images together to form clusters of document images, wherein said grouping comprises applying a "k-means" clustering algorithm;
selecting a representative document image for each of the clusters of document images; and
displaying each representative document image, wherein said clusters include between 5 and 10 document images.
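Claim 17 limits clusters to between 5 and 10 document images. Reading that as a target cluster size (an interpretation, not a statement of the patent's method), one simple way to pick the number of clusters for k-means is k = ceil(n / 10). Then n / k ≤ 10 by construction, and for any n ≥ 5 we also have k ≤ n / 5, so the average cluster holds between 5 and 10 documents. The helper name `choose_k` is hypothetical.

```python
import math


def choose_k(n_docs: int, max_size: int = 10) -> int:
    """Number of clusters whose *average* size is at most max_size documents."""
    return max(1, math.ceil(n_docs / max_size))


for n in (7, 23, 95):
    k = choose_k(n)
    print(n, k, n / k)  # document count, cluster count, average cluster size
```

This bounds only the average: plain k-means does not constrain individual cluster sizes, so a hard per-cluster bound would need a balanced clustering variant or post-processing.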
22. A computer program product comprising:
code that accepts text from a user as a keyword to search;
code that searches a database of document images, each document image having a textual component and a compressed representation, for document images with textual components that contain said keyword;
code that detects similarity of features extracted from documents having textual components that contain said keyword;
code that groups document images having similar extracted features together into clusters of document images;
code that selects a representative document image for each of the clusters for display;
code that displays selected representative document images; and
a computer readable storage medium configured to store the codes.
Dependent claims: 23, 24, 25, 27, 28, 29, 30, 31.
26. A computer program product comprising:
code that accepts text from a user as a keyword to search;
code that searches a database of document images, each document image having a textual component and a compressed representation, for document images with textual components that contain said keyword;
code that groups document images together into clusters of document images based upon processing of the image component of said document images, wherein said clusters include between 5 and 10 document images;
code that selects a representative document image for each of the clusters for display;
code that displays selected representative document images; and
a computer readable storage medium configured to store the codes.
32. A document image database organizing system comprising:
an electronic storage unit that stores a document image database;
a display that displays document images;
a processor unit coupled to said electronic storage device and said display, said processor unit operative to:
compress document images;
extract feature information about document images;
detect, based on a compressed representation or a non-compressed representation of said document images, similarity of features extracted from said document images;
group said document images into a plurality of clusters of document images according to said detected similarity of features;
select a representative document image for each cluster;
display said representative document image; and
accept input commands to manipulate document images.
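The title describes a navigation system, and the claims recite accepting user input that indicates a cluster. One plausible reading, sketched here purely as an assumption, is drill-down browsing: show one representative per cluster, regroup only the selected cluster's documents, and allow stepping back. The `Navigator` class, its `select`/`back` commands, and the toy `split_by_char` grouping (a stand-in for an image-feature clusterer) are all hypothetical.

```python
from typing import Callable, Sequence

Cluster = list[str]  # a cluster is a list of document ids


class Navigator:
    """Drill-down browsing: show one representative per cluster; selecting a
    cluster regroups only its members; 'back' returns to the previous level."""

    def __init__(self, doc_ids: Sequence[str],
                 cluster_fn: Callable[[list[str]], list[Cluster]],
                 pick_rep: Callable[[Cluster], str]):
        self.cluster_fn = cluster_fn
        self.pick_rep = pick_rep
        self.levels: list[list[Cluster]] = [cluster_fn(list(doc_ids))]

    def view(self) -> list[str]:
        """Representative document ids currently displayed, one per cluster."""
        return [self.pick_rep(c) for c in self.levels[-1]]

    def select(self, i: int) -> None:
        """User indicated cluster i: regroup just that cluster's documents."""
        self.levels.append(self.cluster_fn(self.levels[-1][i]))

    def back(self) -> None:
        """Return to the previous level (no-op at the top level)."""
        if len(self.levels) > 1:
            self.levels.pop()


def split_by_char(ids: list[str]) -> list[Cluster]:
    """Toy stand-in for an image-feature clusterer: group ids by first character,
    or by second character when they all share the first."""
    pos = 0 if len({d[0] for d in ids}) > 1 else 1
    groups: dict[str, Cluster] = {}
    for d in ids:
        groups.setdefault(d[pos], []).append(d)
    return [groups[key] for key in sorted(groups)]


nav = Navigator(["a1", "b1", "a2"], split_by_char, pick_rep=lambda c: c[0])
print(nav.view())  # prints ['a1', 'b1']
nav.select(0)
print(nav.view())  # prints ['a1', 'a2']
nav.back()
print(nav.view())  # prints ['a1', 'b1']
```

Keeping each level on a stack makes "back" cheap, since earlier groupings are not recomputed.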
Specification