Identifying Matching Canonical Documents Consistent with Visual Query Structural Information
First Claim
1. A computer-implemented method of processing a visual query performed by a server system having one or more processors and memory storing one or more programs for execution by the one or more processors, the method comprising:
- at the server system:
receiving a visual query from a client system distinct from the server system;
performing optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters including a plurality of textual characters in a contiguous region of the visual query, and structural information associated with the plurality of textual characters in the contiguous region of the visual query;
scoring each textual character in the plurality of textual characters;
identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query;
retrieving a canonical document that includes the one or more high quality textual strings and that is consistent with the structural information; and
sending at least a portion of the canonical document to the client system.
Abstract
A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.
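The scoring and string-identification steps described above can be illustrated with a minimal sketch. The abstract does not say how characters are scored, so this example assumes each OCR character carries a confidence value in [0, 1] and treats a character as "high quality" when that value meets a threshold. The names `OcrChar`, `high_quality_strings`, `threshold`, and `min_len` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OcrChar:
    """One recognized character (hypothetical structure for illustration)."""
    char: str
    confidence: float  # assumed per-character OCR score in [0, 1]


def high_quality_strings(chars: List[OcrChar],
                         threshold: float = 0.8,
                         min_len: int = 2) -> List[str]:
    """Return maximal runs of consecutive characters scoring >= threshold.

    min_len=2 reflects that each string comprises a plurality of
    high quality characters; the threshold itself is an assumption.
    """
    runs: List[str] = []
    current: List[str] = []
    for c in chars:
        if c.confidence >= threshold:
            current.append(c.char)
        else:
            # A low-scoring character ends the current run.
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
    if len(current) >= min_len:
        runs.append("".join(current))
    return runs


# Usage: the second character is poorly recognized, so it splits the text.
text = "cat dog"
scores = [0.9, 0.3, 0.9, 0.9, 0.95, 0.95, 0.95]
region = [OcrChar(ch, s) for ch, s in zip(text, scores)]
strings = high_quality_strings(region)
```

In this sketch the lone "c" is dropped because a single character is not a plurality, and the remaining run "t dog" is kept as one high quality string.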
21 Claims
1. A computer-implemented method of processing a visual query performed by a server system having one or more processors and memory storing one or more programs for execution by the one or more processors, the method comprising:
- at the server system:
receiving a visual query from a client system distinct from the server system;
performing optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters including a plurality of textual characters in a contiguous region of the visual query, and structural information associated with the plurality of textual characters in the contiguous region of the visual query;
scoring each textual character in the plurality of textual characters;
identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query;
retrieving a canonical document that includes the one or more high quality textual strings and that is consistent with the structural information; and
sending at least a portion of the canonical document to the client system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
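The retrieving step of claim 1 can be sketched as a lookup that applies two checks: the candidate document contains every high quality string, and it is consistent with the structural information. The claim does not define the form of that structural information, so this sketch assumes a deliberately simple stand-in: the reading order of the strings in the visual query. The in-memory `corpus` dictionary and the function names are hypothetical.

```python
from typing import Dict, List, Optional


def appears_in_order(text: str, strings: List[str]) -> bool:
    """Check that the strings occur in text in the given order, without overlap."""
    pos = 0
    for s in strings:
        idx = text.find(s, pos)
        if idx == -1:
            return False
        pos = idx + len(s)
    return True


def retrieve_canonical(corpus: Dict[str, str],
                       strings: List[str]) -> Optional[str]:
    """Return the id of the first document passing both checks, else None.

    corpus maps a document id to its full text (an assumption; a real
    system would presumably query an index rather than scan every document).
    """
    for doc_id, text in corpus.items():
        contains_all = all(s in text for s in strings)
        # Structural consistency is reduced here to string order only.
        if contains_all and appears_in_order(text, strings):
            return doc_id
    return None


# Usage: both documents contain the strings, but only "b" preserves their order.
corpus = {"a": "world hello", "b": "hello big world"}
match = retrieve_canonical(corpus, ["hello", "world"])
```

Keeping the containment check separate from the order check mirrors the two conditions in the claim, even though the order check alone would imply containment in this simplified form.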
12. A server system, for processing a visual query, comprising:
one or more central processing units for executing programs;
memory storing one or more programs to be executed by the one or more central processing units;
the one or more programs comprising instructions for:
receiving a visual query from a client system;
performing optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters including a plurality of textual characters in a contiguous region of the visual query, and structural information associated with the plurality of textual characters in the contiguous region of the visual query;
scoring each textual character in the plurality of textual characters;
identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query;
retrieving a canonical document that includes the one or more high quality textual strings consistent with the structural information; and
sending at least a portion of the canonical document to the client system.
Dependent claims: 13, 14, 15, 16
17. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for:
receiving a visual query from a client system;
performing optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters including a plurality of textual characters in a contiguous region of the visual query, and structural information associated with the plurality of textual characters in the contiguous region of the visual query;
scoring each textual character in the plurality of textual characters;
identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query;
retrieving a canonical document that includes the one or more high quality textual strings consistent with the structural information; and
sending at least a portion of the canonical document to the client system.
Dependent claims: 18, 19, 20, 21
Specification