Method and system for extracting alphanumeric content from noisy image data
Abstract
In general, embodiments of the technology relate to extracting content from documents. More specifically, embodiments of the technology relate to using fuzzy regular expressions to process content results obtained from one or more documents in order to extract content for these documents. Further, embodiments of the technology enable the format modification of the content after the content has been identified and extracted from the documents.
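The abstract does not say what form the format modification takes. As one hedged illustration, assume the extracted content is a date string and the desired output is ISO 8601. The function name `normalize_date` and the default source format are hypothetical and are not taken from the patent.

```python
from datetime import datetime


def normalize_date(extracted: str, source_format: str = "%m/%d/%Y") -> str:
    """Reformat an already-extracted date string to ISO 8601 (YYYY-MM-DD).

    Assumption: the extraction step has already isolated a clean date string;
    the patent does not specify which formats are supported.
    """
    return datetime.strptime(extracted, source_format).date().isoformat()


if __name__ == "__main__":
    print(normalize_date("12/31/2015"))
```

Keeping this step separate from matching means the matcher only has to locate the content, and the output format can change without touching the matching graph.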
21 Claims
1. A non-transitory computer readable medium comprising computer readable program code, which when executed by a computer processor enables the computer processor to:
obtain a data model for content of a document;
generate a matching graph using a fuzzy regular expression, wherein the matching graph comprises a matching subgraph node for each component of the fuzzy regular expression;
process the data model using the matching graph to obtain matching results; and
provide the matching results to a user computing system,
wherein processing the data model comprises:
generating an observation graph using the data model; and
determining a best path in the matching graph using observation nodes in the observation graph to transition between states of the matching graph,
wherein at least one matching subgraph node comprises an exact match state and a wildcard state,
wherein the matching graph comprises a plurality of transitions, the plurality of transitions associated with a plurality of weights having various values, the plurality of weights comprising a weight associated with a transition between the wildcard state and the exact match state, the weight associated with the transition between the wildcard state and the exact match state determined based on a transition type of the transition between the wildcard state and the exact match state.
Dependent claims: 2, 3, 4, 5, 6
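The two graphs recited in claim 1 can be pictured with a minimal data-structure sketch. This is an illustration under stated assumptions, not the patented implementation. It assumes the fuzzy regular expression has already been split into a list of components, and that the data model is a dictionary with a `lines` list of recognized text. All class, field, and function names below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class State:
    component_index: int
    kind: str  # "exact" or "wildcard"


@dataclass
class MatchingSubgraphNode:
    """One node per component of the fuzzy regular expression."""
    component: str
    exact: State
    wildcard: State


@dataclass
class ObservationNode:
    """One node per observed character from the data model."""
    char: str
    successors: list = field(default_factory=list)  # indices of following nodes


def build_matching_graph(components):
    # Assumption: every component gets both an exact match state and a
    # wildcard state; the claim only requires this of at least one node.
    return [
        MatchingSubgraphNode(c, State(i, "exact"), State(i, "wildcard"))
        for i, c in enumerate(components)
    ]


def build_observation_graph(data_model):
    # Assumption: characters are chained left to right within a line,
    # and lines are not linked to each other.
    nodes = []
    for line in data_model["lines"]:
        start = len(nodes)
        nodes.extend(ObservationNode(ch) for ch in line)
        for i in range(start, len(nodes) - 1):
            nodes[i].successors.append(i + 1)
    return nodes


if __name__ == "__main__":
    matching = build_matching_graph(["I", "N", "V"])
    observed = build_observation_graph({"lines": ["1NV", "42"]})
    print(len(matching), len(observed))
```

The sketch leaves out transitions and weights and shows only how components map to subgraph nodes and how observed characters become observation nodes.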
7. A method comprising:
obtaining, by a computer processor, optical character recognition (OCR) results;
processing, by the computer processor, the OCR results using a matching graph to determine matching results, the matching graph comprising:
a plurality of transitions between a plurality of states, the plurality of transitions associated with a plurality of weights having various values, the plurality of weights including a weight associated with a transition between a wildcard state and an exact match state determined based on a transition type of the transition between the wildcard state and the exact match state,
wherein processing the OCR results using the matching graph comprises determining a best path in the matching graph; and
providing the matching results to a user computing system.
Dependent claims: 8, 9, 10, 11, 12, 13
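To make the idea of type-dependent transition weights and a best path concrete, here is a hedged sketch using a Viterbi-style dynamic program. The claim does not define the state semantics, transition types, or weight values, so everything below is an assumption: each pattern character has an exact state `("E", i)` that accepts only that character and a wildcard state `("W", i)` that accepts any character, weights are treated as costs, and the best path is the cheapest one ending in the last exact state.

```python
# Hypothetical costs keyed by transition type; the values are illustrative only.
WEIGHTS = {
    "exact_to_exact": 0.0,
    "exact_to_wildcard": 1.0,
    "wildcard_to_wildcard": 1.0,
    "wildcard_to_own_exact": 0.5,   # noise absorbed, then the expected character
    "wildcard_to_next_exact": 1.5,  # wildcard stood in for the expected character
}


def transition_type(src, dst):
    """Return the assumed transition type between two states, or None."""
    (src_kind, i), (dst_kind, j) = src, dst
    if src_kind == "E" and j == i + 1:
        return "exact_to_exact" if dst_kind == "E" else "exact_to_wildcard"
    if src_kind == "W" and dst_kind == "W" and j == i:
        return "wildcard_to_wildcard"
    if src_kind == "W" and dst_kind == "E" and j == i:
        return "wildcard_to_own_exact"
    if src_kind == "W" and dst_kind == "E" and j == i + 1:
        return "wildcard_to_next_exact"
    return None


def best_path(pattern, observations):
    """Return (cost, path) of the cheapest path, or None if nothing matches."""
    if not pattern or not observations:
        return None
    states = [(kind, i) for i in range(len(pattern)) for kind in ("E", "W")]

    def accepts(state, ch):
        kind, i = state
        return kind == "W" or pattern[i] == ch

    # best[state] = (total cost, path) of the cheapest path ending in state
    best = {("W", 0): (WEIGHTS["exact_to_wildcard"], [("W", 0)])}
    if accepts(("E", 0), observations[0]):
        best[("E", 0)] = (0.0, [("E", 0)])

    for ch in observations[1:]:
        nxt = {}
        for dst in states:
            if not accepts(dst, ch):
                continue
            for src, (cost, path) in best.items():
                kind = transition_type(src, dst)
                if kind is None:
                    continue
                candidate = cost + WEIGHTS[kind]
                if dst not in nxt or candidate < nxt[dst][0]:
                    nxt[dst] = (candidate, path + [dst])
        best = nxt

    return best.get(("E", len(pattern) - 1))


if __name__ == "__main__":
    for ocr_text in ("INV", "I~NV", "IXV", "ABC"):
        print(ocr_text, best_path("INV", ocr_text))
```

Under these assumed weights, a clean reading of `INV` costs 0.0, a reading with one spurious character (`I~NV`) costs 1.5, and a reading with one substituted character (`IXV`) costs 2.5. The two wildcard-to-exact transitions carry different weights purely because their types differ, which is the property the claim language describes.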
14. A system comprising:
a computer processor;
a non-transitory computer readable medium coupled to the computer processor, the non-transitory computer readable medium comprising computer readable program code executable by the computer processor to:
obtain optical character recognition (OCR) results;
process the OCR results using a matching graph to obtain matching results, the matching graph comprising a plurality of transitions between a plurality of states, the plurality of transitions associated with a plurality of weights having various values, the plurality of weights including a weight associated with a transition between a wildcard state and an exact match state determined based on a transition type of the transition between the wildcard state and the exact match state, wherein processing the OCR results using the matching graph comprises determining a best path in the matching graph; and
provide the matching results to a user computing system.
Dependent claims: 15, 16, 17, 18, 19, 20, 21
Specification