Systems and methods of parsing receipts
First Claim
1. A method of parsing receipt information using a computer system, the method comprising:
receiving, by the computer system, an image of a receipt;
requesting execution of an optical character recognition (OCR) component to convert the image to text;
accessing a plurality of regular expressions associated with a plurality of vendor elements;
identifying a string of characters in the text that match a character pattern specified by at least one regular expression of the plurality of regular expressions, wherein the at least one regular expression includes metacharacters;
capturing a value of a vendor element from the text, wherein the value of the vendor element comprises the string of characters that match the character pattern;
accessing reference data specifying additional elements associated with the value of the vendor element and information to identify values of the additional elements;
capturing the values of the additional elements in the text based on the reference data, wherein the act of capturing the values includes:
searching the text for regular expressions associated with the additional elements; and
locating the values of the additional elements using receipt format information associated with the additional elements that specifies locations for the values relative to the regular expressions; and
storing the value of the vendor element and the values of the additional elements in a data store.
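The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the vendor patterns, the layout of `REFERENCE_DATA`, the `same_line_after` and `next_line` location rules, and the list used as a data store are all hypothetical, and the OCR step is assumed to have already produced `text`.

```python
import re

# Hypothetical vendor patterns; each regular expression uses metacharacters.
VENDOR_PATTERNS = [
    re.compile(r"ACME\s+MART", re.IGNORECASE),
    re.compile(r"CITY\s+CAB(?:\s+CO\.?)?", re.IGNORECASE),
]

# Hypothetical reference data: for each vendor value, the additional elements
# to capture, an anchor regular expression for each element, and receipt format
# information giving the value's location relative to the anchor match.
REFERENCE_DATA = {
    "ACME MART": {
        "total": {"anchor": r"\bTOTAL\b", "location": "same_line_after"},
        "date": {"anchor": r"\bDATE\b", "location": "next_line"},
    },
}


def capture_vendor(text):
    """Return the first string in the text that matches a vendor pattern."""
    for pattern in VENDOR_PATTERNS:
        match = pattern.search(text)
        if match:
            # Collapse whitespace runs so the value can key the reference data.
            return " ".join(match.group(0).upper().split())
    return None


def capture_additional(text, vendor):
    """Locate each additional element relative to its anchor expression."""
    lines = text.splitlines()
    values = {}
    for name, spec in REFERENCE_DATA.get(vendor, {}).items():
        anchor = re.compile(spec["anchor"], re.IGNORECASE)
        for i, line in enumerate(lines):
            match = anchor.search(line)
            if not match:
                continue
            if spec["location"] == "same_line_after":
                value = line[match.end():].strip(" :\t")
            elif i + 1 < len(lines):  # any other rule is treated as "next_line"
                value = lines[i + 1].strip()
            else:
                value = ""
            if value:
                values[name] = value
                break
    return values


def parse_receipt(text, data_store):
    """Capture the vendor and its additional elements, then store them."""
    vendor = capture_vendor(text)
    if vendor is None:
        return None
    record = {"vendor": vendor, **capture_additional(text, vendor)}
    data_store.append(record)  # a plain list stands in for a real data store
    return record
```

Keying the reference data by the captured vendor value means the format rules consulted for the additional elements depend on which vendor was recognized, which is the dependency the claim describes.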
Abstract
According to another aspect, a computer system is provided. The computer system includes a memory; at least one processor in data communication with the memory; an optical character recognition (OCR) component executable by the at least one processor; and a receipt parsing component executable by the at least one processor. The receipt parsing component is configured to receive an image of a receipt; request execution of the OCR component to convert the image to text; identify a value of a vendor element in the text; identify values of additional elements in the text based on the value of the vendor element; and store the vendor elements and the additional elements in a data store.
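One way to picture the arrangement of components in the abstract is a parsing component that is handed an OCR component and a data store. The sketch below is only an assumption about how such wiring might look: the `ReceiptParser` class, its constructor parameters, the `receipts` table, and the use of SQLite as the data store are all hypothetical and do not come from the patent.

```python
import json
import sqlite3
from typing import Callable, Dict, Optional


class ReceiptParser:
    """Hypothetical receipt parsing component wired to an OCR component."""

    def __init__(
        self,
        ocr: Callable[[bytes], str],
        db: sqlite3.Connection,
        identify_vendor: Callable[[str], Optional[str]],
        identify_additional: Callable[[str, str], Dict[str, str]],
    ):
        self.ocr = ocr
        self.db = db
        self.identify_vendor = identify_vendor
        self.identify_additional = identify_additional
        # Assumed schema: one row per receipt, additional elements as JSON.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS receipts (vendor TEXT, elements TEXT)"
        )

    def parse(self, image: bytes) -> Optional[Dict[str, str]]:
        text = self.ocr(image)  # request execution of the OCR component
        vendor = self.identify_vendor(text)
        if vendor is None:
            return None
        # The additional elements are identified based on the vendor value.
        elements = self.identify_additional(text, vendor)
        self.db.execute(
            "INSERT INTO receipts (vendor, elements) VALUES (?, ?)",
            (vendor, json.dumps(elements, sort_keys=True)),
        )
        self.db.commit()
        return {"vendor": vendor, **elements}
```

Passing the OCR step in as a callable mirrors the abstract's wording that the parsing component requests execution of a separate OCR component rather than performing recognition itself.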
9 Claims
5. A non-transitory computer readable medium storing sequences of computer executable instructions to implement a method for parsing receipt information, the sequences of instructions including instructions to:
receive an image of a receipt;
request execution of an optical character recognition (OCR) component to convert the image to text;
access a plurality of regular expressions associated with a plurality of vendor elements;
identify a string of characters in the text that match a character pattern specified by at least one regular expression of the plurality of regular expressions, wherein the at least one regular expression includes metacharacters;
capture a value of a vendor element from the text, wherein the value of the vendor element comprises the string of characters that match the character pattern;
access reference data specifying additional elements associated with the value of the vendor element and information to identify values of the additional elements;
capture the values of the additional elements in the text based on the reference data, wherein capturing the values of the additional elements includes:
searching the text for regular expressions associated with the additional elements; and
locating the values of the additional elements using receipt format information associated with the additional elements that specifies locations for the values relative to the regular expressions; and
store the value of the vendor element and the values of the additional elements in a data store.
8. A system comprising:
a memory;
at least one processor in data communication with the memory;
an optical character recognition (OCR) component executable by the at least one processor; and
a receipt parsing component executable by the at least one processor and configured to:
receive an image of a receipt;
request execution of the OCR component to convert the image to text;
access a plurality of regular expressions associated with a plurality of vendor elements;
identify a string of characters in the text that match a character pattern specified by at least one regular expression of the plurality of regular expressions, wherein the at least one regular expression includes metacharacters;
capture a value of a vendor element from the text, wherein the value of the vendor element comprises the string of characters that match the character pattern;
access reference data specifying additional elements associated with the value of the vendor element and information to identify values of the additional elements;
capture the values of the additional elements in the text based on the reference data, wherein capturing the values of the additional elements includes:
searching the text for regular expressions associated with the additional elements; and
locating the values of the additional elements using receipt format information associated with the additional elements that specifies locations for the values relative to the regular expressions;
store the value of the vendor element and the values of the additional elements in a data store;
wherein the receipt parsing component is further configured to:
identify a telephone number on the receipt and trigger a reverse telephone number lookup service to capture information on the vendor.
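The telephone number limitation in claim 8 can be illustrated with a short Python sketch. It rests on several assumptions: `PHONE_PATTERN` covers only ten-digit North American style numbers, and `reverse_lookup` is a hypothetical callable standing in for whatever reverse telephone number lookup service is used; neither is specified in the claim.

```python
import re

# Hypothetical pattern for numbers such as "(555) 010-1234", "555-010-1234"
# or "555.010.1234". Other locales would need other patterns, and any bare
# ten-digit run (for example a barcode) would also match.
PHONE_PATTERN = re.compile(r"\(?\b(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})\b")


def find_phone_number(text):
    """Return the first telephone number in the text as ten digits, or None."""
    match = PHONE_PATTERN.search(text)
    if match is None:
        return None
    return "".join(match.groups())


def lookup_vendor_by_phone(text, reverse_lookup):
    """Trigger the lookup only when a telephone number is found.

    reverse_lookup is a hypothetical callable wrapping a lookup service; it
    takes the digit string and returns whatever vendor information it has.
    """
    number = find_phone_number(text)
    if number is None:
        return None
    return reverse_lookup(number)
```

Normalizing the match to a bare digit string before the lookup keeps the service call independent of how the number happened to be printed on the receipt.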
Specification