Calculating image similarity using extracted data

US 7,548,916 B2
Filed: 04/27/2004
Issued: 06/16/2009
Est. Priority Date: 04/30/2003
Status: Expired due to Fees

Abstract

Retrieval accuracy is improved by allowing differences in layout from document to document to be reflected in retrieval at will. To achieve this, there is provided an information processing method having a plurality of retrieval steps (S1209, S1211, S1212) of retrieving image data that is similar to an input document image, the method including a step (S1200) of inputting weighting information for weighting a degree of similarity calculated by each of the retrieval steps; a step of weighting the degree of similarity, which has been calculated by each of the retrieval steps, for every item of image data on the basis of the weighting information, and calculating an overall degree of similarity; and a step (S1213) of displaying the similar image data based upon the calculated overall degree of similarity.
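
The weighting step described in the abstract can be sketched as a weighted average of per-method scores. This is only an illustration under stated assumptions: the method names, the score range of 0 to 1, and the normalization by the sum of the weights are all hypothetical, since the abstract does not give a formula.

```python
from typing import Mapping


def overall_similarity(scores: Mapping[str, float],
                       weights: Mapping[str, float]) -> float:
    """Combine per-method similarity scores into one overall score.

    Assumption: a weighted average (weighted sum divided by the total
    weight). The abstract only says the scores are weighted.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        return 0.0  # no method has any priority, so nothing to combine
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical scores for one stored image file, and user-entered priorities.
scores = {"full_text": 0.8, "partial_text": 0.5, "image_feature": 0.2}
weights = {"full_text": 2.0, "partial_text": 1.0, "image_feature": 1.0}
print(round(overall_similarity(scores, weights), 3))
```

Raising the weight of one method pulls the overall score toward that method's result, which is how a user preference for text or image content would be reflected in the ranking.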

23 Claims

1. An information processing apparatus for retrieving image files similar to an input document image from a plurality of image files, comprising:
- a memory for storing the input document image;
  
  a segmentation unit constructed to segment the input document image into text areas and image areas;
  
  a first similarity calculation unit constructed to calculate a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation unit applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
  
  a second similarity calculation unit constructed to calculate a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation unit applies a second type of similarity calculation which uses a part of the text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
  
  a third similarity calculation unit constructed to calculate a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation unit applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation by said segmentation unit;
  
  an input unit constructed to input first, second and third priority information for weighting the first, second and third degrees of similarity calculated by each of said first, second and third similarity calculation units, wherein the first, second and third priority information respectively correspond to each similarity calculation unit and are input using said input unit;
  
  an acquisition unit constructed to acquire, for each image file, the first, second and third degrees of similarity calculated by said first, second, and third similarity calculation units;
  
  a calculation unit constructed to calculate an overall degree of similarity for each image file by weighting, on the basis of the first, second and third priority information, each of the first, second and third degrees of similarity which have been acquired by said acquisition unit for each image file; and
  
  a display unit constructed to display a second plurality of image files acquired based upon the calculated overall degrees of similarity, and constructed to display information which represents the type of similarity calculation used for calculating the overall degree of similarity for each of the second plurality of image files.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The apparatus according to claim 1, wherein said display unit sorts and displays, in order of decreasing overall degree of similarity, information relating to the overall degree of similarity calculated for every displayed image file and the information which represents the type of similarity calculation used for calculating the overall degree of similarity.
  - 3. The apparatus according to claim 1, wherein said display unit displays the overall degree of similarity calculated for every image file in the form of a graph.
  - 4. The apparatus according to claim 1, further comprising:
    - a calculation-area designating unit constructed to designate, based on a command from a user, an area to be used in the calculation of the degree of similarity from among the areas obtained by segmentation by said segmentation unit;
      
      wherein if an area is designated by said calculation-area designating unit, only the similarity calculation unit or units which calculate the degree of similarity for the area designated by said calculation-area designating unit from among said first, second and third similarity calculation units calculates the degree of similarity.
  - 5. The apparatus according to claim 1, further comprising:
    - an area designating unit constructed to designate, based on a command from a user, an area to be emphasized by the user from among the areas obtained by segmentation by said segmentation unit;
      
      wherein said calculation unit increases the weighting of the degree or degrees of similarity for the area designated by said area designating unit from among the acquired first, second and third degrees of similarity and then calculates the overall degree of similarity.
  - 6. The apparatus according to claim 1, further comprising a conversion unit constructed to convert the input document image to vector data if the overall degree of similarity that has been calculated by said calculation unit is equal to or less than a predetermined value.
  - 7. The apparatus according to claim 6, wherein said conversion unit includes a character recognition unit constructed to recognize characters in the input document image.
  - 8. The apparatus according to claim 6, wherein said conversion unit converts the input document image to vector data for every area obtained by segmentation by said segmentation unit.
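
Claim 1 distinguishes three types of similarity calculation by the data each one uses: all of the recognized text, part of the recognized text, and a feature of the image areas. The claims do not name the measures themselves, so the sketch below substitutes common stand-ins as assumptions: Jaccard overlap of token sets for the full text, a match rate over the most frequent query words for the partial text, and histogram intersection for the image feature. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import Sequence


def _tokens(text: str) -> list[str]:
    """Lower-case word tokens from OCR text."""
    return re.findall(r"\w+", text.lower())


def full_text_similarity(query_text: str, candidate_text: str) -> float:
    """First type: uses all of the text (assumed measure: Jaccard overlap)."""
    a, b = set(_tokens(query_text)), set(_tokens(candidate_text))
    return len(a & b) / len(a | b) if (a | b) else 0.0


def partial_text_similarity(query_text: str, candidate_text: str,
                            top_k: int = 3) -> float:
    """Second type: uses part of the text.

    Assumption: the "part" is the top_k most frequent query words.
    """
    keywords = [w for w, _ in Counter(_tokens(query_text)).most_common(top_k)]
    if not keywords:
        return 0.0
    candidate = set(_tokens(candidate_text))
    return sum(w in candidate for w in keywords) / len(keywords)


def image_feature_similarity(query_hist: Sequence[float],
                             candidate_hist: Sequence[float]) -> float:
    """Third type: uses an image feature.

    Assumption: the feature is a histogram normalized to sum to 1, compared
    by histogram intersection.
    """
    if len(query_hist) != len(candidate_hist):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(q, c) for q, c in zip(query_hist, candidate_hist))
```

Each function returns a value between 0 and 1 under these assumptions, so the three results can be weighted and combined on a common scale.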

9. An information processing method for retrieving image files similar to an input document image from a plurality of image files, comprising:
- a segmentation step of segmenting the input document image, by a data processor, into text areas and image areas;
  
  a first similarity calculation step of calculating a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation step applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
  
  a second similarity calculation step of calculating a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation step applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
  
  a third similarity calculation step of calculating a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation step applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation in said segmentation step;
  
  an input step of inputting first, second and third priority information for weighting the first, second and third degrees of similarity calculated in each of said first, second and third similarity calculation steps, wherein the first, second and third priority information respectively correspond to each similarity calculation step and are input in said input step;
  
  an acquisition step of acquiring, for each image file, the first, second and third degrees of similarity calculated in said first, second and third similarity calculation steps;
  
  a calculation step of calculating an overall degree of similarity for every image file by weighting, on the basis of the first, second and third priority information, each of the first, second and third degrees of similarity which have been acquired in said acquisition step for each image file; and
  
  a display step of displaying a second plurality of image files acquired based upon the calculated overall degrees of similarity, and of displaying information which represents the type of similarity calculation used for calculating the overall degree of similarity for each of the second plurality of image files.
- Dependent claims (10, 11, 12, 13, 14, 15, 16, 17):
  - 10. The method according to claim 9, wherein information relating to the overall degree of similarity calculated for every displayed image file and the information which represents the type of similarity calculation used for calculating the overall degree of similarity are sorted and displayed, in order of decreasing overall degree of similarity, in said display step.
  - 11. The method according to claim 9, wherein the overall degree of similarity calculated for every image file is displayed in said display step in the form of a graph.
  - 12. The method according to claim 9, further comprising:
    - a calculation-area designating step of designating, based on a command from a user, an area to be used in the calculation of the degree of similarity for each of the areas obtained by segmentation in said segmentation step;
      
      wherein if an area is designated in said calculation-area designating step, the degree of similarity is calculated in only the similarity calculation step or steps which calculate the degree of similarity for the area designated in said calculation-area designating step from among said first, second and third similarity calculation steps.
  - 13. The method according to claim 9, further comprising:
    - an area designating step of designating, based on a command from a user, an area to be emphasized by the user from among the areas obtained by segmentation in said segmentation step;
      
      wherein weighting for the degree or degrees of similarity for the area designated in said area designating step from among the acquired first, second and third degrees of similarity is increased in said calculation step and then the overall degree of similarity is calculated in said calculation step.
  - 14. The method according to claim 9, further comprising a conversion step of converting the input document image to vector data if the overall degree of similarity that has been calculated in said calculation step is equal to or less than a predetermined value.
  - 15. The method according to claim 14, wherein said conversion step includes a character recognition step of recognizing characters in the input document image.
  - 16. The method according to claim 14, wherein the input document image is converted in said conversion step to vector data for each area obtained by segmentation in said segmentation step.
  - 17. A storage medium storing a control program for causing the information processing method set forth in claim 9 to be implemented by a computer.
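
The conversion step of claim 14 acts as a fallback: the input image is converted to vector data when the overall degree of similarity is equal to or less than a predetermined value. A minimal sketch follows, assuming the comparison uses the best score among the stored files (the claim does not say which score is compared) and that vectorization is supplied as a hypothetical callback.

```python
from typing import Callable, Mapping, Optional


def retrieve_or_vectorize(
    overall_by_file: Mapping[str, float],
    threshold: float,
    vectorize: Callable[[], str],
) -> tuple[Optional[str], Optional[str]]:
    """Return (best_file, None) on a match, else (None, vector_data).

    Assumption: the score compared against the threshold is the highest
    overall degree of similarity among the stored files.
    """
    best_file = max(overall_by_file, key=overall_by_file.get, default=None)
    # "Equal to or less than" the predetermined value triggers conversion.
    if best_file is None or overall_by_file[best_file] <= threshold:
        return None, vectorize()
    return best_file, None


# Hypothetical usage: the lambda stands in for a real vectorization routine.
match, vector_data = retrieve_or_vectorize(
    {"a.pdf": 0.9, "b.pdf": 0.4}, threshold=0.5, vectorize=lambda: "<svg/>"
)
print(match, vector_data)
```

Passing the converter as a callback means the conversion work is only done when no stored file is similar enough.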

18. An information processing apparatus for retrieving an image file similar to an input document image from a plurality of image files, comprising:
- a memory for storing the input document image;
  
  a segmentation unit constructed to segment the input document image into text areas and image areas;
  
  a first similarity calculation unit constructed to calculate a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation unit applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
  
  a second similarity calculation unit constructed to calculate a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation unit applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
  
  a third similarity calculation unit constructed to calculate a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation unit applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation by said segmentation unit;
  
  an input unit constructed to input first, second and third priority information for weighting the first, second and third degrees of similarity calculated by each of said first, second and third similarity calculation units, wherein the first, second and third priority information respectively correspond to each similarity calculation unit and are input using said input unit;
  
  an acquisition unit constructed to acquire, for each image file, the first, second and third degrees of similarity calculated by said first, second and third similarity calculation units;
  
  a calculation unit constructed to calculate an overall degree of similarity for every image file by weighting, on the basis of the first, second and third priority information, each of the first, second and third degrees of similarity which have been acquired by said acquisition unit for each image file; and
  
  a display unit constructed to sort and display the plurality of image files, the overall degrees of similarity corresponding to image files, and information which represents the type of similarity calculation used for calculating each overall degree of similarity, in order of decreasing calculated overall degree of similarity.
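
The display unit of claim 18 lists the files in order of decreasing overall degree of similarity, together with the score and the type of similarity calculation used. The sketch below renders that as plain text lines. The `Result` record, the column widths, and the `#` bar (a rough text stand-in for the graph form mentioned in claim 3) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    file_name: str
    overall: float            # overall degree of similarity, assumed in [0, 1]
    methods: tuple[str, ...]  # types of similarity calculation that were used


def format_ranked(results: list[Result], width: int = 20) -> list[str]:
    """Return display lines sorted by decreasing overall similarity."""
    ranked = sorted(results, key=lambda r: r.overall, reverse=True)
    lines = []
    for r in ranked:
        bar = "#" * round(r.overall * width)  # bar length scales with score
        methods = ", ".join(r.methods)
        lines.append(f"{r.file_name:<12} {r.overall:.2f} {bar:<{width}} [{methods}]")
    return lines


# Hypothetical retrieval results.
results = [
    Result("b.tif", 0.40, ("full text",)),
    Result("a.tif", 0.90, ("full text", "image feature")),
]
for line in format_ranked(results):
    print(line)
```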

19. An information processing apparatus for retrieving image files similar to an input document image from a plurality of image files, comprising:
- a memory for storing the input document image;
  
  a segmentation unit constructed to segment the input document image into text areas and image areas;
  
  a first similarity calculation unit constructed to calculate a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation unit applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
  
  a second similarity calculation unit constructed to calculate a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation unit applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
  
  a third similarity calculation unit constructed to calculate a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation unit applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation by said segmentation unit;
  
  an input unit constructed to input first, second and third priority information for weighting the first, second and third degrees of similarity calculated by each of said first, second and third similarity calculation units, wherein the first, second and third priority information respectively correspond to each similarity calculation unit and are input using said input unit;
  
  a calculation-area designating unit constructed to designate, based on a command from a user, an area to be used in the calculation of the degree of similarity from among the areas obtained by segmentation by said segmentation unit;
  
  an acquisition unit constructed to acquire, for each image file, the degree of similarity calculated by each similarity calculation unit which calculates the degree of similarity for the area designated by said calculation-area designating unit from among said first, second and third similarity calculation units;
  
  a calculation unit constructed to calculate an overall degree of similarity for each image file by weighting, on the basis of the first, second and third priority information, each degree of similarity which has been acquired by said acquisition unit; and
  
  a display unit constructed to display a second plurality of image files acquired based on the calculated overall degree of similarity and constructed to display information which represents the type of similarity calculation used for calculating the overall degree of similarity for each of the second plurality of image files.
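
In claim 19, designating an area restricts the work to the similarity calculations that apply to that area, and only those results are weighted. A minimal sketch, assuming the designation is reduced to an area type ("text" or "image") and that the combination is a weighted average; the names and the mapping table are hypothetical.

```python
from typing import Mapping, Optional

# Area type each calculation works on, following the claim: the first and
# second calculations use text areas, the third uses image areas.
CALC_AREA_TYPE = {
    "full_text": "text",
    "partial_text": "text",
    "image_feature": "image",
}


def select_calculations(designated_area_type: Optional[str]) -> list[str]:
    """Names of the calculations to run for the designated area type."""
    if designated_area_type is None:  # nothing designated: run all three
        return list(CALC_AREA_TYPE)
    return [name for name, area in CALC_AREA_TYPE.items()
            if area == designated_area_type]


def overall_for_designated(scores: Mapping[str, float],
                           weights: Mapping[str, float],
                           designated_area_type: Optional[str]) -> float:
    """Weighted average over only the calculations that were run."""
    active = select_calculations(designated_area_type)
    total = sum(weights[name] for name in active)
    if total == 0:
        return 0.0
    return sum(weights[name] * scores[name] for name in active) / total


# Only text scores exist because only the text calculations were run.
text_scores = {"full_text": 0.8, "partial_text": 0.4}
priorities = {"full_text": 1.0, "partial_text": 1.0, "image_feature": 5.0}
print(round(overall_for_designated(text_scores, priorities, "text"), 3))
```

Skipping the calculations that do not apply avoids unnecessary work and keeps a large image weight from diluting a text-only query.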

20. An information processing apparatus for retrieving image files similar to an input document image from a plurality of image files, comprising:
- a memory for storing the input document image;
  
  a segmentation unit constructed to segment the input document image into text areas and image areas;
  
  a first similarity calculation unit constructed to calculate a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation unit applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
  
  a second similarity calculation unit constructed to calculate a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation unit applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation by said segmentation unit;
  
  a third similarity calculation unit constructed to calculate a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation unit applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation by said segmentation unit;
  
  an input unit constructed to input first, second and third priority information for weighting the first, second and third degrees of similarity calculated by each of said first, second and third similarity calculation units, wherein the first, second and third priority information respectively correspond to each similarity calculation unit and are input using said input unit;
  
  an area designating unit constructed to designate, based on a command from a user, an area to be emphasized by the user from among the areas obtained by segmentation by said segmentation unit;
  
  an acquisition unit constructed to acquire, for each image file, the first, second and third degrees of similarity calculated by said first, second and third similarity calculation units;
  
  a calculation unit constructed to calculate an overall degree of similarity for each image file by weighting, on the basis of the first, second and third priority information, each of the first, second and third degrees of similarity which have been acquired by said acquisition unit for each image file, and constructed to increase weighting of each degree of similarity for the area designated by said area designating unit; and
  
  a display unit constructed to display a second plurality of image files acquired based on the calculated overall degrees of similarity and constructed to display information which represents the type of similarity calculation used for calculating the overall degree of similarity for each of the second plurality of image files.
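
Claim 20 increases the weighting of each degree of similarity tied to the area the user wants emphasized. The claim gives no amount, so the multiplicative `boost` factor below is an assumption, as are the names and the mapping from calculation to area type.

```python
from typing import Mapping

# Area type each calculation works on (text areas or image areas).
CALC_AREA_TYPE = {
    "full_text": "text",
    "partial_text": "text",
    "image_feature": "image",
}


def emphasized_weights(weights: Mapping[str, float],
                       emphasized_area_type: str,
                       boost: float = 2.0) -> dict[str, float]:
    """Copy of weights with the emphasized area's weights multiplied by boost.

    Assumption: the increase is a multiplication by a factor above 1.
    """
    if boost <= 1.0:
        raise ValueError("boost must be greater than 1 to increase the weighting")
    return {
        name: w * boost if CALC_AREA_TYPE.get(name) == emphasized_area_type else w
        for name, w in weights.items()
    }


# Hypothetical priorities entered by the user, then an emphasized image area.
base = {"full_text": 1.0, "partial_text": 1.0, "image_feature": 1.0}
print(emphasized_weights(base, "image"))
```

The adjusted weights can then be used in the overall similarity calculation in place of the original priorities.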

21. An information processing method for retrieving image files similar to an input document image from a plurality of image files, comprising:
- a segmentation step of segmenting the input document image, by a data processor, into text areas and image areas;
  
  a first similarity calculation step of calculating a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation step applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
  
  a second similarity calculation step of calculating a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation step applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
  
  a third similarity calculation step of calculating a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation step applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation in said segmentation step;
  
  an input step of inputting first, second and third priority information for weighting the first, second and third degrees of similarity calculated in each of said first, second and third similarity calculation steps, wherein the first, second and third priority information respectively correspond to each similarity calculation step and are input in said input step;
  
  an acquisition step of acquiring, for each image file, the first, second and third degrees of similarity calculated in each of said first, second and third similarity calculation steps;
  
  a calculation step of calculating an overall degree of similarity for each image file by weighting, on the basis of the first, second and third priority information, each of the first, second and third degrees of similarity which have been acquired in said acquisition step for each image file; and
  
  a display step of sorting and displaying the plurality of image files, the overall degrees of similarity corresponding to the image files, and information which represents the type of similarity calculation used for calculating each overall degree of similarity, in order of decreasing calculated overall degree of similarity.

22. An information processing method for retrieving image files similar to an input document image from a plurality of image files, comprising:
- a segmentation step of segmenting the input document image, by a data processor, into text areas and image areas;
  
  a first similarity calculation step of calculating a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation step applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
  
  a second similarity calculation step of calculating a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation step applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
  
  a third similarity calculation step of calculating a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation step applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation in said segmentation step;
  
  an input step of inputting first, second and third priority information for weighting the first, second and third degrees of similarity calculated in each of said first, second and third similarity calculation steps, wherein the first, second and third priority information respectively correspond to each similarity calculation step and are input in said input step;
  
  a calculation-area designating step of designating, based on a command from a user, an area to be used in the calculation of the degree of similarity from among the areas obtained by segmentation in said segmentation step;
  
  an acquisition step of acquiring, for each image file, the degree of similarity calculated in each similarity calculation step which calculates the degree of similarity for the area designated in said calculation-area designating step from among said first, second and third similarity calculation steps;
  
  a calculation step of calculating an overall degree of similarity for each image file by weighting, on the basis of the first, second and third priority information, each degree of similarity which has been acquired in said acquisition step for each image file; and
  
  a display step of displaying a second plurality of image files acquired based on the calculated overall degrees of similarity and of displaying information which represents the type of similarity calculation used for calculating the overall degree of similarity for each of the second plurality of image files.

23. An information processing method for retrieving image files similar to an input document image from a plurality of image files, comprising:
- a segmentation step of segmenting the input document image, by a data processor, into text areas and image areas;
  
  a first similarity calculation step of calculating a first degree of similarity for text areas included in the plurality of image files, wherein the first similarity calculation step applies a first type of similarity calculation which uses all of text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
  
  a second similarity calculation step of calculating a second degree of similarity for text areas included in the plurality of image files, wherein the second similarity calculation step applies a second type of similarity calculation which uses part of the text data extracted by character recognition from each of the text areas obtained by segmentation in said segmentation step;
  
  a third similarity calculation step of calculating a third degree of similarity for image areas included in the plurality of image files, wherein the third similarity calculation step applies a third type of similarity calculation which uses a feature extracted from each of the image areas obtained by segmentation in said segmentation step;
  
  an input step of inputting first, second and third priority information for weighting the first, second and third degrees of similarity calculated in each of said first, second and third similarity calculation steps, wherein the first, second and third priority information respectively correspond to each similarity calculation step and are input in said input step;
  
  an area designating step of designating, based upon a command from a user, an area to be emphasized by the user from among the areas obtained by segmentation in said segmentation step;
  
  an acquisition step of acquiring, for each image file, the first, second and third degrees of similarity calculated in each of said first, second and third similarity calculation steps;
  
  a calculation step of calculating overall degrees of similarity for each image file by weighting, on the basis of the first, second and third priority information, each of the first, second and third degrees of similarity which has been acquired in said acquisition step for each image file and of increasing the weighting of each degree of similarity for the area designated in said area designating step; and
  
  a display step of displaying a second plurality of retrieved image files acquired based on the calculated overall degree of similarity and of displaying information which represents the type of similarity calculation used for calculating the overall degree of similarity for each of the second plurality of image files.

Current Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Original Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Inventors: Kaneda, Kitahiro
Primary Examiner(s): Mofiz; Apu M
Assistant Examiner(s): Le; Jessica N
Application Number: US10/832,400
Publication Number: US 20040220962A1
Time in Patent Office: 1,876 Days
Field of Search: 707/1, 707 3-102, 382/224, 395/600, 706/14, 345/418, 348/231.2
US Class Current: 1/1
CPC Class Codes:
- G06F 16/5838: using colour
- G06F 16/5846: using extracted text
- G06F 16/5854: using shape and object rela...
- Y10S 707/99936: Pattern matching access
- Y10S 707/99943: Generating database or data...
- Y10S 707/99945: Object-oriented database st...
