Global geographic information retrieval, validation, and normalization
Abstract
According to one embodiment, a computer-implemented method includes: capturing an image of a document using a camera of a mobile device; performing optical character recognition (OCR) on the image of the document; extracting an identifier of the document from the image based at least in part on the OCR; comparing the identifier with content from one or more reference data sources, wherein the content from the one or more reference data sources comprises global address information; and determining whether the identifier is valid based at least in part on the comparison. The method may optionally include normalizing the extracted identifier, retrieving additional geographic information, correcting OCR errors, etc. based on comparing extracted information with reference content. Corresponding systems and computer program products are also disclosed.
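By way of illustration only, the optional OCR error correction mentioned above might be sketched as a fuzzy match of an extracted field against reference content. The reference list, the function name `correct_against_reference`, and the similarity cutoff are hypothetical and are not taken from the disclosure; the standard-library `difflib` matcher stands in for whatever comparison an actual embodiment would use.

```python
import difflib

# Hypothetical reference content: a small set of known city names.
REFERENCE_CITIES = ["San Diego", "San Jose", "Santa Ana", "Irvine"]


def correct_against_reference(ocr_text, reference, cutoff=0.8):
    """Return the reference entry closest to ocr_text, or None if none is close enough.

    The cutoff is an assumed similarity threshold in [0, 1]; higher values
    demand a closer match before a correction is accepted.
    """
    matches = difflib.get_close_matches(ocr_text, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # "0" (zero) misread for "o" is a typical OCR confusion.
    print(correct_against_reference("San Dieg0", REFERENCE_CITIES))
    print(correct_against_reference("Xyzzy", REFERENCE_CITIES))
```

In this sketch, an extracted value with no sufficiently similar reference entry is left uncorrected (the function returns `None`), which a caller might treat as a failed validation.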
18 Claims
1. A computer-implemented method, comprising:
capturing an image of a document using a camera of a mobile device;
performing optical character recognition (OCR) on the image of the document;
extracting an identifier of the document from the image based at least in part on the OCR;
comparing the identifier with content from one or more reference data sources, wherein the content from the one or more reference data sources comprises global address information; and
wherein the content from the one or more reference data sources is derived from geographic information organized in one or more of a proprietary address database and an open source address database; and
wherein deriving the content from the geographic information comprises:
obtaining the geographic information from one or more of the proprietary address database and an open source address database; and
parsing the geographic information according to a set of predefined heuristic rules, wherein the set of predefined heuristic rules are configured to normalize the global address information obtained from the one or more sources according to a single convention for representing address information; and
determining whether the identifier is valid based at least in part on the comparison.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
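As a non-limiting sketch of the deriving and comparing steps recited above, the following assumes that address records arrive as free-text strings from two sources and that the "single convention" is lowercase text without punctuation and with street-type abbreviations expanded. The rule table, the function names, and the sample records are all hypothetical; the claim does not specify any particular rules, and a real rule set would need to handle ambiguities such as "St" meaning "Saint".

```python
import re

# Hypothetical heuristic rules: map common abbreviations to one spelled-out form.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "blvd": "boulevard"}


def normalize_address(raw):
    """Normalize an address string to one assumed convention.

    Lowercases the text, replaces punctuation with spaces, collapses
    whitespace, and expands abbreviations listed in ABBREVIATIONS.
    """
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def build_reference(*sources):
    """Derive reference content by normalizing every record from every source."""
    return {normalize_address(record) for source in sources for record in source}


def is_valid(identifier, reference):
    """Treat an extracted identifier as valid if its normalized form is in the reference."""
    return normalize_address(identifier) in reference


if __name__ == "__main__":
    proprietary_db = ["123 Main St., Springfield"]    # stand-in for a proprietary source
    open_source_db = ["45 Oak Avenue, Shelbyville"]   # stand-in for an open source database
    reference = build_reference(proprietary_db, open_source_db)
    print(is_valid("123 MAIN STREET SPRINGFIELD", reference))
    print(is_valid("9 Elm Rd, Nowhere", reference))
```

Normalizing both the reference records and the extracted identifier with the same rules is what lets differently formatted sources be compared by simple set membership in this sketch; exact matching is an assumption, and an embodiment could equally use a tolerance-based comparison.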
17. A computer program product, comprising a non-transitory computer readable storage medium having stored/encoded thereon computer readable program instructions configured to cause a processor, upon execution thereof, to:
receive an image of a document;
perform optical character recognition (OCR) on the image of the document;
extract an identifier of the document from the image based at least in part on the OCR;
compare the identifier with content from one or more reference data sources, wherein the content from the one or more reference data sources comprises global address information; and
wherein the content from the one or more reference data sources is derived from geographic information organized in one or more of a proprietary address database and an open source address database; and
wherein deriving the content from the geographic information comprises:
obtaining the geographic information from one or more of the proprietary address database and an open source address database; and
parsing the geographic information according to a set of predefined heuristic rules, wherein the set of predefined heuristic rules are configured to normalize the global address information obtained from the one or more sources according to a single convention for representing address information; and
determine whether the identifier is valid based at least in part on the comparison.
18. A computer-implemented method, comprising:
capturing an image using a camera of a mobile device;
classifying the image as an image of a document, wherein the classifying comprises:
generating a first feature vector representative of the document, based on analyzing the image; and
comparing the first feature vector to a plurality of reference feature matrices;
performing optical character recognition (OCR) on the image of the document;
extracting an identifier of the document from the image based at least in part on the OCR;
comparing the identifier with content from one or more reference data sources, wherein the content from the one or more reference data sources comprises global address information; and
wherein the content from the one or more reference data sources is derived from geographic information organized in one or more of a proprietary address database and an open source address database; and
wherein deriving the content from the geographic information comprises:
obtaining the geographic information from one or more of the proprietary address database and an open source address database; and
parsing the geographic information according to a set of predefined heuristic rules, wherein the set of predefined heuristic rules are configured to normalize the global address information obtained from the one or more sources according to a single convention for representing address information;
determining whether the identifier is valid based at least in part on the comparison;
associating the image of the document with metadata descriptive of one or more of the document and information relating to the document; and
storing the image of the document and the associated metadata to a memory of the mobile device.
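The classifying step recited in claim 18 can be illustrated, under stated assumptions, as a nearest-exemplar comparison. In this sketch each reference feature matrix is assumed to hold one exemplar feature vector per row for a given document class, and the image is assigned to the class whose matrix contains the row closest (in Euclidean distance) to the first feature vector. The class labels, the two-dimensional features, and the distance measure are hypothetical choices; the claim does not prescribe how the comparison is made or how the feature vector is generated.

```python
import math


def min_distance(vector, matrix):
    """Smallest Euclidean distance between vector and any row of matrix."""
    return min(math.dist(vector, row) for row in matrix)


def classify(vector, reference_matrices):
    """Return the label whose reference matrix has the row nearest to vector.

    reference_matrices maps a class label to a list of exemplar rows,
    each row having the same length as vector.
    """
    return min(
        reference_matrices,
        key=lambda label: min_distance(vector, reference_matrices[label]),
    )


if __name__ == "__main__":
    # Hypothetical two-dimensional features; real feature vectors would be
    # produced by analyzing the captured image.
    references = {
        "invoice": [[0.9, 0.1], [0.8, 0.2]],
        "drivers_license": [[0.1, 0.9], [0.2, 0.7]],
    }
    print(classify([0.85, 0.15], references))
```

A practical classifier would likely also apply a maximum-distance threshold so that an image resembling no reference class is rejected instead of being forced into the nearest one; that refinement is omitted here for brevity.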
Specification