Text recognition for search results
First Claim
1. A non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, cause a computing device to:
obtain at least one image frame containing text captured using a camera of the computing device;
cause the text within the at least one image frame to be recognized with an optical character recognition (OCR) engine, an output of the OCR engine including recognized text strings and a score for each text string associated with a respective recognition confidence;
filter the recognized text strings from the output of the OCR engine that are at least one of a determined distance from an edge of the at least one image frame or are associated with at least two lines of the text to generate a set of filtered text strings;
adjust the score of each text string as a function of distance from a center of the at least one image frame, the score associated with a text string near the center being adjusted upward relative to a text string closer to an edge of the at least one image frame;
rank the set of filtered text strings according to the score for each text string;
compare each text string of the set of filtered text strings to content references associated with content items;
identify a combined threshold indicating a number of matches or approximate matches to within an allowable deviation between the ranked set of filtered text strings and the content references, the content references including text strings corresponding to identifying features of the content;
submit the ranked set of filtered text strings associated with the identified number of matches or the approximate matches within the allowable deviation to a search engine to return content search results for the ranked set of filtered text strings;
compare the ranked set of filtered text strings to title text strings of the content search results; and
provide a respective content search result for display on the computing device when a number of text strings from the ranked set of filtered text strings appearing in a title text string of the respective content is at least equal to or exceeds the combined threshold.
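The filtering, score adjustment, ranking, and title check recited above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `TextString` record, the 20-pixel edge margin, the linear centre boost of up to 50%, and whole-word title matching are all assumptions chosen for the example, and "a determined distance from an edge" is read here as "closer to the edge than a margin".

```python
import math
from dataclasses import dataclass


@dataclass
class TextString:
    """Hypothetical record for one OCR result (field names are assumptions)."""
    text: str
    confidence: float    # OCR recognition confidence in [0, 1]
    cx: float            # x of the bounding-box centre, in pixels
    cy: float            # y of the bounding-box centre, in pixels
    line_count: int = 1  # number of text lines the string spans


def filter_strings(strings, width, height, edge_margin=20.0):
    """Drop strings near an image edge or spanning two or more lines."""
    kept = []
    for s in strings:
        edge_dist = min(s.cx, s.cy, width - s.cx, height - s.cy)
        if edge_dist < edge_margin or s.line_count >= 2:
            continue
        kept.append(s)
    return kept


def adjust_score(s, width, height, max_boost=0.5):
    """Scale the OCR confidence upward the closer the string is to the centre."""
    half_diag = math.hypot(width / 2, height / 2)
    dist = math.hypot(s.cx - width / 2, s.cy - height / 2)
    closeness = 1.0 - dist / half_diag  # 1 at the centre, 0 at a corner
    return s.confidence * (1.0 + max_boost * closeness)


def rank_strings(strings, width, height):
    """Filter, adjust, and return (score, text) pairs, best first."""
    kept = filter_strings(strings, width, height)
    scored = [(adjust_score(s, width, height), s.text) for s in kept]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def passes_title_check(terms, title, combined_threshold):
    """True when enough ranked terms appear as whole words in a result title."""
    title_words = set(title.casefold().split())
    hits = sum(1 for t in terms if t.casefold() in title_words)
    return hits >= combined_threshold
```

With a 640x480 frame, a string at the centre with confidence 0.8 outranks a string near the top-left with confidence 0.9, because the centre boost outweighs the small confidence gap.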
Abstract
Various embodiments enable a process that automatically attempts to select, from an image frame, the most relevant words associated with products available for purchase from an electronic marketplace. For example, an image frame containing text can be obtained and analyzed with an optical character recognition (OCR) engine. The recognized words can then be preprocessed using various filtering and scoring techniques to narrow down a volume of text to a few relevant query terms. These query terms can then be sent to a search engine associated with the electronic marketplace to return relevant products to a user.
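As a rough illustration of the last two steps of the abstract, the sketch below narrows a scored word list to a few query terms and encodes them for a search request. The endpoint URL, the `q` parameter name, the limit of three terms, and the case-insensitive de-duplication are all hypothetical choices for this example; the source does not specify any of them.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the source does not name a search API.
SEARCH_URL = "https://marketplace.example/search"


def select_query_terms(scored_words, max_terms=3):
    """Keep the highest-scoring distinct words, best first."""
    terms, seen = [], set()
    for _, text in sorted(scored_words, key=lambda pair: pair[0], reverse=True):
        key = text.casefold()
        if key in seen:
            continue  # skip case-insensitive repeats of an earlier word
        seen.add(key)
        terms.append(text)
        if len(terms) == max_terms:
            break
    return terms


def build_search_url(terms):
    """Encode the selected terms as a single query-string parameter."""
    return SEARCH_URL + "?" + urlencode({"q": " ".join(terms)})
```

Capping the number of terms reflects the abstract's goal of reducing "a volume of text to a few relevant query terms"; the exact cap would be a tuning decision.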
19 Claims
6. A computer-implemented method, comprising:
receiving an output from a character recognition engine;
removing text strings from the output associated with descriptive text to generate a set of candidate text strings, the descriptive text including text strings associated with two or more lines of text from the output;
comparing the set of candidate text strings to content references associated with content, the content references including text strings corresponding to identifying features of the content;
determining a score for each candidate text string of the set of candidate text strings, the score for each candidate text string determined as a function of distance from a center of at least one image frame processed by the character recognition engine, the score associated with a candidate text string near the center being adjusted upward relative to a text string closer to an edge of the at least one image frame;
identifying a number of semantically relevant candidate text strings of the set at least approximately matching at least one content reference text string to within an allowable deviation, the identified number of semantically relevant candidate text strings at least equaling or exceeding a combined threshold indicating a number of matches or approximate matches to within the allowable deviation; and
submitting the identified number of semantically relevant candidate text strings for content search to at least one content database.
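The approximate-matching and threshold steps of this claim could be sketched as below. The claim does not say how the "allowable deviation" is measured; this example assumes it is a maximum Levenshtein edit distance, compares strings case-insensitively, and uses hypothetical function names and default values throughout.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def matching_candidates(candidates, references, max_deviation=1):
    """Candidates within max_deviation edits of at least one content reference."""
    refs = [r.casefold() for r in references]
    return [
        c for c in candidates
        if any(edit_distance(c.casefold(), r) <= max_deviation for r in refs)
    ]


def select_for_search(candidates, references, combined_threshold=2, max_deviation=1):
    """Return the matched candidates only if enough of them matched."""
    matched = matching_candidates(candidates, references, max_deviation)
    return matched if len(matched) >= combined_threshold else []
```

Allowing a small edit distance lets a common OCR confusion such as "ACNE" for "Acme" still count as a match, while unrelated strings are rejected.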
16. A computing device, comprising:
a processor;
an imaging element;
a display element; and
memory including instructions that, when executed by the processor, cause the computing device to:
capture, using the imaging element, an image of text;
cause the text within the image to be recognized with a character recognition engine;
receive, from the character recognition engine, a score for each text string associated with the text, each score corresponding to a level of recognition confidence for a respective text string;
adjust the score for each text string based on a function of distance from a center of the image, the score associated with a text string near the center being adjusted upward relative to a text string closer to an edge of the image;
filter out at least a portion of the text that is at least one of a determined distance from an edge of the image or is associated with a determined volume of text;
compare the text to content references associated with content, the content references including text strings corresponding to identifying features of the content;
identify a determined number of text strings matching content reference text strings at least equaling or exceeding a combined threshold, the combined threshold indicating a number of matches or approximate matches to within an allowable deviation; and
submit the determined number of text strings to at least one content database.
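One possible reading of filtering text "associated with a determined volume of text" is to discard words that belong to dense text blocks, such as ingredient lists or paragraphs of descriptive copy. The sketch below assumes each recognized word arrives tagged with a block identifier and that eight words per block is the cutoff; both the input shape and the cutoff are assumptions for illustration, not details from the claim.

```python
from collections import Counter


def drop_dense_blocks(words, max_words_per_block=8):
    """Keep only words from blocks holding at most max_words_per_block words.

    `words` is a list of (text, block_id) pairs; the block_id grouping is
    assumed to come from the character recognition engine's layout output.
    """
    counts = Counter(block_id for _, block_id in words)
    return [text for text, block_id in words
            if counts[block_id] <= max_words_per_block]
```

Short blocks such as a product name survive this filter, while long runs of descriptive text are removed before comparison against the content references.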
Specification