Calculating image similarity using extracted data
Abstract
Retrieval accuracy is improved by allowing differences in layout from document to document to be reflected in retrieval at will. To achieve this, there is provided an information processing method having a plurality of retrieval steps (S1209, S1211, S1212), each of which retrieves image data similar to an input document image. The method includes a step (S1200) of inputting weighting information for weighting the degree of similarity calculated by each of the retrieval steps; a step of weighting the degree of similarity calculated by each of the retrieval steps, for every item of image data, on the basis of the weighting information, and calculating an overall degree of similarity; and a step (S1213) of displaying the similar image data based upon the calculated overall degree of similarity.
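The weighting step in the abstract can be illustrated with a minimal sketch. The method names (`full_text`, `partial_text`, `image_feature`), the file names, and the choice to normalize by the sum of the weights are assumptions made for illustration; the abstract only states that each degree of similarity is weighted and combined into an overall degree of similarity.

```python
def overall_similarity(scores, weights):
    """Combine per-method similarity scores into one weighted score.

    Assumption: the result is normalized by the total weight so that it
    stays in the same range as the input scores.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(weights[name] * scores.get(name, 0.0) for name in weights)
    return weighted_sum / total_weight


# Hypothetical weighting information entered by the user.
weights = {"full_text": 2.0, "partial_text": 1.0, "image_feature": 1.0}

# Hypothetical per-method degrees of similarity for two stored items.
candidates = {
    "doc_a.png": {"full_text": 0.9, "partial_text": 0.5, "image_feature": 0.2},
    "doc_b.png": {"full_text": 0.3, "partial_text": 0.8, "image_feature": 0.9},
}

# Rank the stored items by overall degree of similarity, highest first.
ranked = sorted(
    candidates,
    key=lambda name: overall_similarity(candidates[name], weights),
    reverse=True,
)
```

Raising the weight of one method shifts the ranking toward items that score well under that method, which is the effect the weighting information is meant to provide.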
34 Citations
23 Claims
1. An information processing apparatus for retrieving image files similar to an input document image from a plurality of image files, comprising:
a memory for storing the input document image;
a segmentation unit constructed to segment the input document image into text areas and image areas;
a first similarity calculation unit constructed to calculate a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation unit applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
a second similarity calculation unit constructed to calculate a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation unit applies a second type of similarity calculation which uses a part of the text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
a third similarity calculation unit constructed to calculate a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation unit applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation by said segmentation unit;
an input unit constructed to input first, second and third priority information for weighting the first, second and third degrees of similarity calculated by each of said first, second and third similarity calculation units, wherein the first, second and third priority information respectively correspond to each similarity calculation unit and are input using said input unit;
an acquisition unit constructed to acquire, for each image file, the first, second and third degrees of similarity calculated by said first, second, and third similarity calculation units;
a calculation unit constructed to calculate an overall degree of similarity for each image file by weighting, on the basis of the first, second and third priority information, each of the first, second and third degrees of similarity which have been acquired by said acquisition unit for each image file; and
a display unit constructed to display a second plurality of image files acquired based upon the calculated overall degrees of similarity, and constructed to display information which represents the type of similarity calculation used for calculating the overall degree of similarity for each of the second plurality of image files.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
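The three types of similarity calculation recited in claim 1 can be sketched as follows. The claim does not specify the measures themselves, so everything concrete here is an assumption: Jaccard overlap of all recognized words stands in for the first type, Jaccard overlap restricted to a hypothetical keyword set stands in for the second type (using only part of the text data), and histogram intersection over a normalized color histogram stands in for the third type.

```python
def full_text_similarity(query_text, candidate_text):
    """First type (assumed): Jaccard overlap over ALL recognized words."""
    q = set(query_text.lower().split())
    c = set(candidate_text.lower().split())
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def partial_text_similarity(query_text, candidate_text, keywords):
    """Second type (assumed): Jaccard overlap over only the words that
    belong to a given keyword set, i.e. a part of the text data."""
    q = set(query_text.lower().split()) & keywords
    c = set(candidate_text.lower().split()) & keywords
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def image_feature_similarity(query_hist, candidate_hist):
    """Third type (assumed): histogram intersection of two normalized
    histograms of equal length extracted from image areas."""
    return sum(min(a, b) for a, b in zip(query_hist, candidate_hist))


# Hypothetical OCR output and image features for one query/candidate pair.
first = full_text_similarity("invoice total due", "invoice total paid")
second = partial_text_similarity(
    "invoice total due", "invoice total paid", {"invoice", "total"}
)
third = image_feature_similarity([0.5, 0.5, 0.0], [0.25, 0.5, 0.25])
```

The first and second scores differ for the same pair of text areas because the second ignores words outside the selected part, which is why the claim treats them as separate degrees of similarity with separate priority information.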
9. An information processing method for retrieving image files similar to an input document image from a plurality of image files, comprising:
a segmentation step of segmenting the input document image, by a data processor, into text areas and image areas;
a first similarity calculation step of calculating a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation step applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
a second similarity calculation step of calculating a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation step applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
a third similarity calculation step of calculating a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation step applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation in said segmentation step;
an input step of inputting first, second and third priority information for weighting the first, second and third degrees of similarity calculated in each of said first, second and third similarity calculation steps, wherein the first, second and third priority information respectively correspond to each similarity calculation step and are input in said input step;
an acquisition step of acquiring, for each image file, the first, second and third degrees of similarity calculated in said first, second and third similarity calculation steps;
a calculation step of calculating an overall degree of similarity for every image file by weighting, on the basis of the first, second and third priority information, each of the first, second and third degrees of similarity which have been acquired in said acquisition step for each image file; and
a display step of displaying a second plurality of image files acquired based upon the calculated overall degrees of similarity, and of displaying information which represents the type of similarity calculation used for calculating the overall degree of similarity for each of the second plurality of image files.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
18. An information processing apparatus for retrieving an image file similar to an input document image from a plurality of image files, comprising:
a memory for storing the input document image;
a segmentation unit constructed to segment the input document image into text areas and image areas;
a first similarity calculation unit constructed to calculate a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation unit applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
a second similarity calculation unit constructed to calculate a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation unit applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
a third similarity calculation unit constructed to calculate a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation unit applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation by said segmentation unit;
an input unit constructed to input first, second and third priority information for weighting the first, second and third degrees of similarity calculated by each of said first, second and third similarity calculation units, wherein the first, second and third priority information respectively correspond to each similarity calculation unit and are input using said input unit;
an acquisition unit constructed to acquire, for each image file, the first, second and third degrees of similarity calculated by said first, second and third similarity calculation units;
a calculation unit constructed to calculate an overall degree of similarity for every image file by weighting, on the basis of the first, second and third priority information, each of the first, second and third degrees of similarity which have been acquired by said acquisition unit for each image file; and
a display unit constructed to sort and display the plurality of image files, the overall degrees of similarity corresponding to image files, and information which represents the type of similarity calculation used for calculating each overall degree of similarity, in order of decreasing calculated overall degree of similarity.
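The sorted display recited in claim 18 can be sketched as a small formatting routine. The record layout (`file`, `overall`, `methods`) and the output format are hypothetical; the claim only requires that the image files, their overall degrees of similarity, and information representing the type of similarity calculation used be displayed in order of decreasing overall degree of similarity.

```python
def format_results(results):
    """Return display lines sorted by decreasing overall similarity.

    Each result is a dict with hypothetical keys:
      "file"    - name of the stored image file
      "overall" - overall degree of similarity
      "methods" - names of the similarity calculations that were used
    """
    ordered = sorted(results, key=lambda r: r["overall"], reverse=True)
    return [
        f"{r['file']}  overall={r['overall']:.2f}  methods={', '.join(r['methods'])}"
        for r in ordered
    ]


# Hypothetical retrieval results for three stored image files.
results = [
    {"file": "memo.tif", "overall": 0.41, "methods": ["full_text"]},
    {"file": "report.tif", "overall": 0.87, "methods": ["full_text", "image_feature"]},
    {"file": "flyer.tif", "overall": 0.63, "methods": ["partial_text", "image_feature"]},
]

lines = format_results(results)
```

Showing the method names next to each score lets the user see why a file ranked where it did and adjust the priority information accordingly.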
19. An information processing apparatus for retrieving image files similar to an input document image from a plurality of image files, comprising:
a memory for storing the input document image;
a segmentation unit constructed to segment the input document image into text areas and image areas;
a first similarity calculation unit constructed to calculate a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation unit applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
a second similarity calculation unit constructed to calculate a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation unit applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
a third similarity calculation unit constructed to calculate a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation unit applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation by said segmentation unit;
an input unit constructed to input first, second and third priority information for weighting the first, second and third degrees of similarity calculated by each of said first, second and third similarity calculation units, wherein the first, second and third priority information respectively correspond to each similarity calculation unit and are input using said input unit;
a calculation-area designating unit constructed to designate, based on a command from a user, an area to be used in the calculation of the degree of similarity from among the areas obtained by segmentation by said segmentation unit;
an acquisition unit constructed to acquire, for each image file, the degree of similarity calculated by each similarity calculation unit which calculates the degree of similarity for the area designated by said calculation-area designating unit from among said first, second and third similarity calculation units;
a calculation unit constructed to calculate an overall degree of similarity for each image file by weighting, on the basis of the first, second and third priority information, each degree of similarity which has been acquired by said acquisition unit; and
a display unit constructed to display a second plurality of image files acquired based on the calculated overall degree of similarity and constructed to display information which represents the type of similarity calculation used for calculating the overall degree of similarity for each of the second plurality of image files.
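The calculation-area designation in claim 19 can be read as a filter applied before weighting: only the similarity calculations that operate on the designated area contribute to the overall score. The sketch below assumes that designation is expressed as a set of area types (`"text"`, `"image"`) and that the overall score is a weight-normalized sum; both are illustrative assumptions, as are all names.

```python
# Assumed mapping from each similarity calculation to the area type it uses.
METHOD_AREA_TYPE = {
    "full_text": "text",
    "partial_text": "text",
    "image_feature": "image",
}


def acquire_for_designated(scores, designated_area_types):
    """Keep only the scores whose calculation applies to a designated area type."""
    return {
        method: score
        for method, score in scores.items()
        if METHOD_AREA_TYPE[method] in designated_area_types
    }


def overall_from_acquired(acquired, priority):
    """Weight the acquired scores by priority, normalized by the weights used."""
    total = sum(priority[method] for method in acquired)
    if total == 0:
        return 0.0
    return sum(priority[m] * s for m, s in acquired.items()) / total


# Hypothetical scores for one image file and equal priorities.
scores = {"full_text": 0.8, "partial_text": 0.4, "image_feature": 0.1}
priority = {"full_text": 1.0, "partial_text": 1.0, "image_feature": 1.0}

# The user designates only text areas, so the image-feature score is ignored.
acquired = acquire_for_designated(scores, {"text"})
text_only_overall = overall_from_acquired(acquired, priority)
```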
20. An information processing apparatus for retrieving image files similar to an input document image from a plurality of image files, comprising:
a memory for storing the input document image;
a segmentation unit constructed to segment the input document image into text areas and image areas;
a first similarity calculation unit constructed to calculate a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation unit applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
a second similarity calculation unit constructed to calculate a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation unit applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
a third similarity calculation unit constructed to calculate a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation unit applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation by said segmentation unit;
an input unit constructed to input first, second and third priority information for weighting the first, second and third degrees of similarity calculated by each of said first, second and third similarity calculation units, wherein the first, second and third priority information respectively correspond to each similarity calculation unit and are input using said input unit;
an area designating unit constructed to designate, based on a command from a user, an area to be emphasized by the user from among the areas obtained by segmentation by said segmentation unit;
an acquisition unit constructed to acquire, for each image file, the first, second and third degrees of similarity calculated by said first, second and third similarity calculation units;
a calculation unit constructed to calculate an overall degree of similarity for each image file by weighting, on the basis of the first, second and third priority information, each of the first, second and third degrees of similarity which have been acquired by said acquisition unit for each image file, and constructed to increase weighting of each degree of similarity for the area designated by said area designating unit; and
a display unit constructed to display a second plurality of image files acquired based on the calculated overall degrees of similarity and constructed to display information which represents the type of similarity calculation used for calculating the overall degree of similarity for each of the second plurality of image files.
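Claim 20 differs from claim 19 in that the designated area is emphasized rather than used as a filter: every degree of similarity still contributes, but those for the designated area carry more weight. A minimal sketch follows, assuming per-area scores, a multiplicative `boost` factor, and weight normalization; the claim does not state how the weighting is increased, so these details and all names are hypothetical.

```python
def emphasized_overall(area_scores, priority, emphasized_ids, boost=2.0):
    """Weighted overall similarity with extra weight on emphasized areas.

    area_scores    - list of dicts with hypothetical keys "id", "method", "score"
    priority       - weight per similarity calculation (priority information)
    emphasized_ids - ids of the areas the user asked to emphasize
    boost          - assumed multiplier applied to the weight of emphasized areas
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for area in area_scores:
        weight = priority[area["method"]]
        if area["id"] in emphasized_ids:
            weight *= boost  # increase the weighting for the designated area
        weighted_sum += weight * area["score"]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0


# Hypothetical per-area scores for one image file and equal priorities.
area_scores = [
    {"id": 1, "method": "full_text", "score": 1.0},
    {"id": 2, "method": "image_feature", "score": 0.0},
]
priority = {"full_text": 1.0, "partial_text": 1.0, "image_feature": 1.0}

plain = emphasized_overall(area_scores, priority, emphasized_ids=set())
boosted = emphasized_overall(area_scores, priority, emphasized_ids={1})
```

With no emphasis the two areas count equally; emphasizing area 1 pulls the overall score toward that area's own score.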
21. An information processing method for retrieving image files similar to an input document image from a plurality of image files, comprising:
a segmentation step of segmenting the input document image, by a data processor, into text areas and image areas;
a first similarity calculation step of calculating a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation step applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
a second similarity calculation step of calculating a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation step applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
a third similarity calculation step of calculating a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation step applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation in said segmentation step;
an input step of inputting first, second and third priority information for weighting the first, second and third degrees of similarity calculated in each of said first, second and third similarity calculation steps, wherein the first, second and third priority information respectively correspond to each similarity calculation step and are input in said input step;
an acquisition step of acquiring, for each image file, the first, second and third degrees of similarity calculated in each of said first, second and third similarity calculation steps;
a calculation step of calculating an overall degree of similarity for each image file by weighting, on the basis of the first, second and third priority information, each of the first, second and third degrees of similarity which have been acquired in said acquisition step for each image file; and
a display step of sorting and displaying the plurality of image files, the overall degrees of similarity corresponding to the image files, and information which represents the type of similarity calculation used for calculating each overall degree of similarity, in order of decreasing calculated overall degree of similarity.
22. An information processing method for retrieving image files similar to an input document image from a plurality of image files, comprising:
a segmentation step of segmenting the input document image, by a data processor, into text areas and image areas;
a first similarity calculation step of calculating a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation step applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
a second similarity calculation step of calculating a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation step applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
a third similarity calculation step of calculating a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation step applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation in said segmentation step;
an input step of inputting first, second and third priority information for weighting the first, second and third degrees of similarity calculated in each of said first, second and third similarity calculation steps, wherein the first, second and third priority information respectively correspond to each similarity calculation step and are input in said input step;
a calculation-area designating step of designating, based on a command from a user, an area to be used in the calculation of the degree of similarity from among the areas obtained by segmentation in said segmentation step;
an acquisition step of acquiring, for each image file, the degree of similarity calculated in each similarity calculation step which calculates the degree of similarity for the area designated in said calculation-area designating step from among said first, second and third similarity calculation steps;
a calculation step of calculating an overall degree of similarity for each image file by weighting, on the basis of the first, second and third priority information, each degree of similarity which has been acquired in said acquisition step for each image file; and
a display step of displaying a second plurality of image files acquired based on the calculated overall degrees of similarity and of displaying information which represents the type of similarity calculation used for calculating the overall degree of similarity for each of the second plurality of image files.
23. An information processing method for retrieving image files similar to an input document image from a plurality of image files, comprising:
a segmentation step of segmenting the input document image, by a data processor, into text areas and image areas;
a first similarity calculation step of calculating a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation step applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
a second similarity calculation step of calculating a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation step applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
a third similarity calculation step of calculating a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation step applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation in said segmentation step;
an input step of inputting first, second and third priority information for weighting the first, second and third degrees of similarity calculated in each of said first, second and third similarity calculation steps, wherein the first, second and third priority information respectively correspond to each similarity calculation step and are input in said input step;
an area designating step of designating, based upon a command from a user, an area to be emphasized by the user from among the areas obtained by segmentation in said segmentation step;
an acquisition step of acquiring, for each image file, the first, second and third degrees of similarity calculated in each of said first, second and third similarity calculation steps;
a calculation step of calculating overall degrees of similarity for each image file by weighting, on the basis of the first, second and third priority information, each of the first, second and third degrees of similarity which has been acquired in said acquisition step for each image file and of increasing the weighting of each degree of similarity for the area designated in said area designating step; and
a display step of displaying a second plurality of retrieved image files acquired based on the calculated overall degree of similarity and of displaying information which represents the type of similarity calculation used for calculating the overall degree of similarity for each of the second plurality of image files.
Specification