Using extracted image text
Abstract
Methods, systems, and apparatus including computer program products for using extracted image text are provided. In one implementation, a computer-implemented method is provided. The method includes receiving an input of one or more image search terms and identifying keywords from the received one or more image search terms. The method also includes searching a collection of keywords including keywords extracted from image text, retrieving an image associated with extracted image text corresponding to one or more of the image search terms, and presenting the image.
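The retrieval flow described in the abstract can be sketched as a minimal inverted index mapping keywords extracted from image text to images. This is an illustrative reconstruction only, not the patent's implementation; the class and method names are hypothetical.

```python
from collections import defaultdict

class ImageTextIndex:
    """Hypothetical minimal index over keywords extracted from image text."""

    def __init__(self):
        self._index = defaultdict(set)

    def add_image(self, image_id, extracted_text):
        # Index each keyword found in the image's extracted text.
        for keyword in extracted_text.lower().split():
            self._index[keyword].add(image_id)

    def search(self, *search_terms):
        # Return images whose extracted text matches any search term.
        results = set()
        for term in search_terms:
            results |= self._index.get(term.lower(), set())
        return results

index = ImageTextIndex()
index.add_image("img1.jpg", "Joe's Pizza Open 24 Hours")
index.add_image("img2.jpg", "City Parking Garage")
print(index.search("pizza"))  # {'img1.jpg'}
```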
Claims
1. A computer-implemented method comprising:
dividing an image into a plurality of sub-regions;
identifying two or more adjacent sub-regions of the image that contain text, wherein the identified adjacent sub-regions share overlapping image pixels;
combining the identified adjacent sub-regions into a candidate text region;
determining a minimum size for candidate text regions;
determining that the candidate text region is smaller than the determined minimum size for candidate text regions; and
based on determining that the candidate text region is smaller than the minimum size for candidate text regions, bypassing optical character recognition for the candidate text region.
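The steps of claim 1 can be sketched as follows. This is an illustrative reconstruction under assumed parameters (grid step, tile size, area threshold), not the patent's implementation; the text detector itself is stubbed out.

```python
# Sketch of the claimed flow: divide an image into overlapping sub-regions,
# merge adjacent text-bearing sub-regions into a candidate region, and bypass
# OCR when the candidate is below a minimum size.

def divide_into_subregions(width, height, step):
    # Overlapping tiles: each tile shares pixels with its neighbors.
    tile = step * 2
    return [(x, y, tile, tile)
            for y in range(0, height - step, step)
            for x in range(0, width - step, step)]

def merge_regions(regions):
    # Bounding box covering all merged sub-regions.
    x0 = min(r[0] for r in regions)
    y0 = min(r[1] for r in regions)
    x1 = max(r[0] + r[2] for r in regions)
    y1 = max(r[1] + r[3] for r in regions)
    return (x0, y0, x1 - x0, y1 - y0)

def should_run_ocr(candidate, min_area):
    # Bypass OCR for candidate regions smaller than the minimum size.
    _, _, w, h = candidate
    return w * h >= min_area

subs = divide_into_subregions(64, 64, 16)
candidate = merge_regions(subs[:2])  # two adjacent text-bearing sub-regions
print(should_run_ocr(candidate, 4096))  # False: too small, OCR bypassed
```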
2. The method of claim 1, wherein identifying the adjacent sub-regions that contain text comprises:
extracting one or more features from each sub-region; and
providing the extracted features to a trained classifier to determine whether each sub-region contains text.
3. The method of claim 2, wherein the classifier has been trained on images of city street scenes, and wherein each image of a city street scene includes identified regions of text and regions of non-text.
4. The method of claim 2, further comprising:
scaling the image to multiple sizes prior to extracting the one or more features; and
determining that the sub-region contains text at two or more sizes.
5. The method of claim 2, wherein extracting features from each sub-region includes detecting corner features within the sub-region.
6. The method of claim 2, wherein extracting features from each sub-region includes computing projection profiles in each sub-region.
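The projection profiles recited in claim 6 can be illustrated on a toy binarized sub-region: the horizontal profile counts text pixels per row and the vertical profile counts them per column, a standard way to expose the periodic structure of text. A hypothetical sketch; the patent does not specify this exact computation.

```python
def projection_profiles(binary_region):
    # binary_region: list of rows, 1 = text pixel, 0 = background.
    rows = len(binary_region)
    cols = len(binary_region[0])
    horizontal = [sum(row) for row in binary_region]            # per-row counts
    vertical = [sum(binary_region[r][c] for r in range(rows))   # per-column counts
                for c in range(cols)]
    return horizontal, vertical

# Toy 4x6 binarized sub-region containing a letterform-like shape.
region = [
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
h, v = projection_profiles(region)
print(h)  # [4, 2, 4, 0]
print(v)  # [0, 3, 2, 2, 3, 0]
```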
7. The method of claim 1, further comprising:
determining that a different second candidate text region is larger than the minimum size for candidate text regions; and
in response to determining that the different second candidate text region is larger than the minimum size for candidate text regions, performing optical character recognition for the second candidate text region.
8. The method of claim 1, further comprising:
receiving ranging data associated with the image;
generating a planar map of the image using the ranging data; and
determining that the candidate text region of the image is not located on a single plane, wherein bypassing optical character recognition for the candidate text region is further based on determining that the candidate text region of the image is not located on a single plane.
9. The method of claim 8, wherein the ranging data comprises a distance from a camera position when the image was taken to an object shown in the image.
10. The method of claim 8, wherein generating a planar map comprises decomposing the image into planar and non-planar regions, and wherein determining that the candidate text region is not located on a single plane is based on determining that the candidate text region corresponds to a non-planar region.
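The planarity test in claims 8 and 10 can be illustrated with a simple property of ranging data: across an ideal plane, depth varies linearly, so second-order differences along each row of depth samples are zero (columns are analogous). This is a hypothetical sketch of one way to flag non-planar regions, not the patent's decomposition method.

```python
def is_planar(depth_rows, tolerance=0.05):
    # depth_rows: grid of distances from the camera to the scene.
    for row in depth_rows:
        for a, b, c in zip(row, row[1:], row[2:]):
            # Second difference is (near) zero when depth varies linearly.
            if abs((c - b) - (b - a)) > tolerance:
                return False
    return True

flat_wall = [[1.0, 1.1, 1.2, 1.3], [1.0, 1.1, 1.2, 1.3]]   # planar: skip nothing
round_pole = [[2.0, 1.2, 1.0, 1.2], [2.0, 1.2, 1.0, 1.2]]  # curved: bypass OCR
print(is_planar(flat_wall))   # True
print(is_planar(round_pole))  # False
```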
11. The method of claim 1, further comprising:
correcting perspective distortion in the image.
12. The method of claim 1, wherein determining a minimum size for candidate text regions comprises:
determining a number of false positive results based on the minimum size for candidate text regions;
determining that the number of false positive results satisfies a threshold; and
in response to determining that the number of false positive results satisfies the threshold, determining an increased minimum size for candidate text regions.
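The calibration loop in claim 12 can be sketched as: while the current minimum size still admits more false positives than the threshold allows, increase it. Everything here is hypothetical, including `count_false_positives`, which stands in for evaluation against labeled data.

```python
def calibrate_min_size(min_size, count_false_positives,
                       threshold, step=10, limit=1000):
    # Increase the minimum candidate-region size until the false-positive
    # count no longer exceeds the threshold (or a safety limit is reached).
    while count_false_positives(min_size) > threshold and min_size < limit:
        min_size += step  # admit fewer small, noise-prone regions
    return min_size

# Toy evaluation: smaller minimums let through more spurious regions.
fp = lambda size: max(0, 50 - size // 2)
print(calibrate_min_size(20, fp, threshold=5))  # 90
```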
13. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
dividing an image into a plurality of sub-regions;
identifying two or more adjacent sub-regions of the image that contain text, wherein the identified adjacent sub-regions share overlapping image pixels;
combining the identified adjacent sub-regions into a first candidate text region;
determining a minimum size for candidate text regions;
determining that the candidate text region is smaller than the determined minimum size for candidate text regions; and
based on determining that the candidate text region is smaller than the minimum size for candidate text regions, bypassing optical character recognition for the candidate text region.
14. The system of claim 13, wherein identifying the adjacent sub-regions that contain text comprises:
extracting one or more features from each sub-region; and
providing the extracted features to a trained classifier to determine whether each sub-region contains text.
15. The system of claim 14, wherein the classifier has been trained on images of city street scenes, and wherein each image of a city street scene includes identified regions of text and regions of non-text.
16. The system of claim 14, wherein the operations further comprise:
scaling the image to multiple sizes prior to extracting the one or more features; and
determining that the sub-region contains text at two or more sizes.
17. The system of claim 14, wherein extracting features from each sub-region includes detecting corner features within the sub-region.
18. The system of claim 14, wherein extracting features from each sub-region includes computing projection profiles in each sub-region.
19. The system of claim 13, wherein the operations further comprise:
determining that a different second candidate text region is larger than the minimum size for candidate text regions; and
in response to determining that the different second candidate text region is larger than the minimum size for candidate text regions, performing optical character recognition for the second candidate text region.
20. The system of claim 13, wherein the operations further comprise:
receiving ranging data associated with the image;
generating a planar map of the image using the ranging data; and
determining that the candidate text region of the image is not located on a single plane, wherein bypassing optical character recognition for the candidate text region is further based on determining that the candidate text region of the image is not located on a single plane.
21. The system of claim 20, wherein generating a planar map comprises decomposing the image into planar and non-planar regions, and wherein determining that the candidate text region is not located on a single plane is based on determining that the candidate text region corresponds to a non-planar region.
22. The system of claim 13, wherein determining a minimum size for candidate text regions comprises:
determining a number of false positive results based on the minimum size for candidate text regions;
determining that the number of false positive results satisfies a threshold; and
in response to determining that the number of false positive results satisfies the threshold, determining an increased minimum size for candidate text regions.
23. A non-transitory computer readable medium comprising instructions that, when executed by one or more data processing apparatus, cause the one or more data processing apparatus to perform operations comprising:
dividing an image into a plurality of sub-regions;
identifying two or more adjacent sub-regions of the image that contain text, wherein the identified adjacent sub-regions share overlapping image pixels;
combining the identified adjacent sub-regions into a candidate text region;
determining a minimum size for candidate text regions;
determining that the candidate text region is smaller than the determined minimum size for candidate text regions; and
based on determining that the candidate text region is smaller than the minimum size for candidate text regions, bypassing optical character recognition for the candidate text region.
24. The computer readable medium of claim 23, wherein identifying the adjacent sub-regions that contain text comprises:
extracting one or more features from each sub-region; and
providing the extracted features to a trained classifier to determine whether each sub-region contains text.
25. The computer readable medium of claim 24, wherein the classifier has been trained on images of city street scenes, and wherein each image of a city street scene includes identified regions of text and regions of non-text.
26. The computer readable medium of claim 24, wherein the operations further comprise:
scaling the image to multiple sizes prior to extracting the one or more features; and
determining that the sub-region contains text at two or more sizes.
27. The computer readable medium of claim 24, wherein extracting features from each sub-region includes detecting corner features within the sub-region.
28. The computer readable medium of claim 24, wherein extracting features from each sub-region includes computing projection profiles in each sub-region.
29. The computer readable medium of claim 23, wherein the operations further comprise:
determining a minimum size for text in candidate text regions;
determining that a different second candidate text region is larger than the minimum size for candidate text regions; and
in response to determining that the different second candidate text region is larger than the minimum size for candidate text regions, performing optical character recognition for the second candidate text region.
30. The computer readable medium of claim 23, wherein the operations further comprise:
receiving ranging data associated with the image;
generating a planar map of the image using the ranging data; and
determining that the candidate text region of the image is not located on a single plane, wherein bypassing optical character recognition for the candidate text region is further based on determining that the candidate text region of the image is not located on a single plane.
31. The computer readable medium of claim 30, wherein the ranging data comprises a distance from a camera position when the image was taken to an object shown in the image.
32. The computer readable medium of claim 30, wherein generating a planar map comprises decomposing the image into planar and non-planar regions, and wherein determining that the candidate text region is not located on a single plane is based on determining that the candidate text region corresponds to a non-planar region.
33. The computer readable medium of claim 23, wherein the operations further comprise:
correcting perspective distortion in the image.
Specification