GLOBAL GEOGRAPHIC INFORMATION RETRIEVAL, VALIDATION, AND NORMALIZATION
First Claim
1. A computer program product, comprising a non-transitory computer readable storage medium having stored/encoded thereon computer readable program instructions configured to cause a processor, upon execution thereof, to:
perform optical character recognition (OCR) on an image of a document;
extract an identifier of the document from the image based at least in part on the OCR;
compare at least portions of the identifier with content from one or more reference data sources; and
determine whether the identifier is valid based at least in part on the comparison;
wherein the content from the one or more reference data sources comprises global address information;
wherein the content from the one or more reference data sources is derived from geographic information organized in one or more of a proprietary address database and an open source address database; and
wherein deriving the content from the geographic information comprises:
obtaining the geographic information from one or more of the proprietary address database and an open source address database; and
parsing the geographic information according to a set of predefined heuristic rules, wherein the set of predefined heuristic rules are configured to normalize the global address information obtained from the one or more sources according to a single convention for representing address information.
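The normalization and comparison steps recited above can be illustrated with a minimal sketch. It is not the patented implementation: the abbreviation table, the three heuristic rules, and the function names `normalize_address`, `build_reference`, and `is_valid_identifier` are all assumptions introduced here for illustration, and the exact-match lookup is a simplification of comparing "at least portions of the identifier".

```python
import re

# Hypothetical rule table (an assumption, not taken from the claims):
# maps common abbreviations to one canonical spelling.
_ABBREVIATIONS = {
    "st": "street",
    "ave": "avenue",
    "rd": "road",
    "blvd": "boulevard",
    "apt": "apartment",
}


def normalize_address(raw: str) -> str:
    """Apply a few heuristic rules so equivalent addresses compare equal."""
    text = raw.lower()                      # rule 1: ignore letter case
    text = re.sub(r"[.,#]", " ", text)      # rule 2: drop common punctuation
    tokens = text.split()                   # also collapses repeated whitespace
    # rule 3: expand abbreviations to the single chosen convention
    tokens = [_ABBREVIATIONS.get(token, token) for token in tokens]
    return " ".join(tokens)


def build_reference(records):
    """Normalize records from any source into one set of canonical strings."""
    return {normalize_address(record) for record in records}


def is_valid_identifier(ocr_text: str, reference: set) -> bool:
    """Treat the OCR-extracted identifier as valid if its normalized form
    appears in the normalized reference content."""
    return normalize_address(ocr_text) in reference


# Usage: records from two differently formatted sources collapse to one entry.
reference = build_reference(["123 MAIN STREET APARTMENT 4", "123 Main St, Apt 4"])
print(is_valid_identifier("123 Main St., Apt. 4", reference))
```

A real rule set would need to be much larger and locale-aware; for example, "St" can also stand for "Saint", which this sketch does not handle.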
Abstract
A computer program product includes program instructions configured to cause a processor to: perform optical character recognition (OCR) on an image of a document; extract an identifier of the document from the image based at least in part on the OCR; compare at least portions of the identifier with content from one or more reference data sources; and determine whether the identifier is valid based at least in part on the comparison. The content from the one or more reference data sources comprises global address information and is derived from geographic information. Deriving the content from the geographic information includes: obtaining the geographic information; and parsing the geographic information according to a set of predefined heuristic rules, where the heuristic rules are configured to normalize the global address information obtained from the one or more sources according to a single convention for representing address information.
20 Claims
20. A computer program product, comprising a non-transitory computer readable storage medium having stored/encoded thereon computer readable program instructions configured to cause a processor, upon execution thereof, to:
capture an image using a camera of a mobile device;
classify the image as an image of a document, wherein the classifying comprises:
generating a first feature vector representative of the document, based on analyzing the image; and
comparing the first feature vector to a plurality of reference feature matrices;
perform optical character recognition (OCR) on the image of the document;
extract an identifier of the document from the image based at least in part on the OCR;
compare the identifier with content from one or more reference data sources;
determine whether the identifier is valid based at least in part on the comparison; and
in response to determining the identifier is valid:
associating the image of the document with metadata descriptive of one or more of the document and information relating to the document; and
storing the image of the document and the associated metadata to a memory of the mobile device;
wherein the content from the one or more reference data sources comprises global address information;
wherein the content from the one or more reference data sources is derived from geographic information organized in one or more of a proprietary address database and an open source address database; and
wherein deriving the content from the geographic information comprises:
obtaining the geographic information from one or more of the proprietary address database and an open source address database; and
parsing the geographic information according to a set of predefined heuristic rules, wherein the set of predefined heuristic rules are configured to normalize the global address information obtained from the one or more sources according to a single convention for representing address information.
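The classifying step of claim 20 (generating a feature vector and comparing it to reference feature matrices) can be sketched as follows. The claim does not say how the feature vector is computed or how the comparison is scored, so everything here is an assumption for illustration: the grid-of-mean-intensities feature, the Euclidean nearest-row comparison, the function names, and the class labels "letter" and "receipt".

```python
import math


def feature_vector(image, grid=2):
    """Mean intensity of each cell in a grid x grid partition of a grayscale
    image given as a list of rows. Assumes the image is at least grid x grid."""
    rows, cols = len(image), len(image[0])
    features = []
    for gr in range(grid):
        for gc in range(grid):
            r0, r1 = gr * rows // grid, (gr + 1) * rows // grid
            c0, c1 = gc * cols // grid, (gc + 1) * cols // grid
            cell = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cell) / len(cell))
    return features


def distance_to_matrix(vector, matrix):
    """Smallest Euclidean distance from the vector to any row of a
    reference feature matrix (one row per known example of the class)."""
    return min(math.dist(vector, row) for row in matrix)


def classify(image, reference_matrices):
    """Return the label whose reference feature matrix lies closest."""
    vector = feature_vector(image)
    return min(
        reference_matrices,
        key=lambda label: distance_to_matrix(vector, reference_matrices[label]),
    )


# Usage with hypothetical reference data: one matrix per document class.
references = {
    "letter": [[255.0, 255.0, 0.0, 0.0]],
    "receipt": [[0.0, 0.0, 255.0, 255.0]],
}
image = [[255] * 4, [255] * 4, [0] * 4, [0] * 4]  # bright top half, dark bottom half
print(classify(image, references))
```

Keeping one matrix per class, with one row per known example, lets new examples be added without retraining anything; a production classifier would likely use richer features and a learned decision rule.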
Specification