PDF EXTRACTION WITH TEXT-BASED KEY
Abstract
The present disclosure includes a computing device for extracting information from a standardized PDF report in a non-paragraph format. In one embodiment, the computing device includes an electronic processor, and a memory. The memory includes program instructions that, when executed by the electronic processor, cause the electronic processor to receive a standardized PDF report and a configuration file, determine X coordinates and Y coordinates of bounding boxes associated with one or more text-based keys, determine one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key, sort the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes based on respective X coordinates, determine a single word that is directly adjacent to the first text-based key, and control a display to display the single word.
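The row-matching and sorting steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Word` record, the `tol` tolerance for comparing Y coordinates, and the reading of "directly adjacent" as "the next word to the right of the key" are all assumptions made for illustration. Words are assumed to have already been extracted from the PDF along with their bounding boxes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Word:
    """A word and its bounding box (hypothetical structure for this sketch)."""
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def shares_row(a: Word, b: Word, tol: float = 0.5) -> bool:
    """True when two bounding boxes have the same Y coordinates, within tol (assumed)."""
    return abs(a.y0 - b.y0) <= tol and abs(a.y1 - b.y1) <= tol


def adjacent_word(words: List[Word], key_text: str, tol: float = 0.5) -> Optional[Word]:
    """Return the word directly to the right of the key on the key's row, if any."""
    key = next((w for w in words if w.text == key_text), None)
    if key is None:
        return None
    # Keep only words that share the key's Y coordinates, then sort them by X.
    row = sorted((w for w in words if shares_row(w, key, tol)), key=lambda w: w.x0)
    idx = row.index(key)
    return row[idx + 1] if idx + 1 < len(row) else None


# Made-up word boxes, deliberately out of reading order.
words = [
    Word("Smith", 120.0, 100.0, 160.0, 112.0),
    Word("Name:", 50.0, 100.0, 95.0, 112.0),
    Word("2021-03-04", 120.0, 130.0, 190.0, 142.0),
    Word("Date:", 50.0, 130.0, 90.0, 142.0),
]

found = adjacent_word(words, "Name:")
print(found.text if found else "not found")
```

Sorting by the left edge `x0` is one possible choice. The source only states that the words are sorted by their X coordinates.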
23 Claims
1. A computing device comprising:
- an electronic processor; and
- a memory coupled to the electronic processor, the memory including program instructions that, when executed by the electronic processor, cause the electronic processor to
  - receive a standardized PDF (portable document format) report that is in a non-paragraph format and a configuration file including one or more values that correspond to one or more text-based keys in the standardized PDF report,
  - determine X coordinates and Y coordinates of bounding boxes associated with the one or more text-based keys, the X coordinates associated with an X-direction and the Y coordinates associated with a Y-direction,
  - determine one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key of the one or more text-based keys,
  - sort the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with the first text-based key based on respective X coordinates in the X-direction,
  - determine a single word from the one or more words that is directly adjacent to the first text-based key, and
  - control a display to display the single word that is directly adjacent to the first text-based key.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system comprising:
- a display device; and
- a server communicatively connected to the display device, the server including
  - an electronic processor; and
  - a memory coupled to the electronic processor, the memory including program instructions that, when executed by the electronic processor, cause the electronic processor to
    - receive a standardized PDF (portable document format) report that is in a non-paragraph format and a configuration file including one or more values that correspond to one or more text-based keys in the standardized PDF report,
    - determine X coordinates and Y coordinates of bounding boxes associated with the one or more text-based keys, the X coordinates associated with an X-direction and the Y coordinates associated with a Y-direction,
    - determine one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key of the one or more text-based keys,
    - sort the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with the first text-based key based on respective X coordinates in the X-direction,
    - determine a single word from the one or more words that is directly adjacent to the first text-based key, and
    - control a display to display the single word that is directly adjacent to the first text-based key.
Dependent claims: 9, 10, 11, 12, 13
14. A non-transitory computer-readable medium comprising instructions that, when executed by an electronic processor, cause the electronic processor to perform a set of operations comprising:
- receiving a standardized PDF (portable document format) report that is in a non-paragraph format and a configuration file including one or more values that correspond to one or more text-based keys in the standardized PDF report;
- determining X coordinates and Y coordinates of bounding boxes associated with the one or more text-based keys, the X coordinates associated with an X-direction and the Y coordinates associated with a Y-direction;
- determining one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key of the one or more text-based keys;
- sorting the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with the first text-based key based on respective X coordinates in the X-direction;
- determining a single word from the one or more words that is directly adjacent to the first text-based key; and
- controlling a display to display the single word that is directly adjacent to the first text-based key.
Dependent claims: 15, 16, 17, 18
19. A method for extracting information from a standardized PDF (portable document format) report that is in a non-paragraph format, the method comprising:
- receiving, with an electronic processor, the standardized PDF report that is in the non-paragraph format and a configuration file including one or more values that correspond to one or more text-based keys in the standardized PDF report;
- determining, with the electronic processor, X coordinates and Y coordinates of bounding boxes associated with the one or more text-based keys, the X coordinates associated with an X-direction and the Y coordinates associated with a Y-direction;
- determining, with the electronic processor, one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key of the one or more text-based keys;
- sorting, with the electronic processor, the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with the first text-based key based on respective X coordinates in the X-direction;
- determining, with the electronic processor, a single word from the one or more words that is directly adjacent to the first text-based key; and
- controlling, with the electronic processor, a display to display the single word that is directly adjacent to the first text-based key.
Dependent claims: 20, 21, 22, 23
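As a rough end-to-end illustration of the method steps, the following Python sketch drives the extraction from a configuration file. It is a sketch under stated assumptions, not the claimed implementation. The JSON layout with a `keys` list, the `WordBox` tuple, the exact-equality test on Y coordinates, the choice of the word to the right as "directly adjacent", and the use of `print` as a stand-in for controlling a display are all hypothetical. The word boxes are made-up data standing in for the output of a PDF text extractor.

```python
import json
from typing import Dict, List, Optional, Tuple

# (text, x0, y0, x1, y1): hypothetical word record with its bounding box.
WordBox = Tuple[str, float, float, float, float]


def extract_values(config_text: str, words: List[WordBox]) -> Dict[str, Optional[str]]:
    """Map each configured text-based key to the word directly to its right."""
    keys = json.loads(config_text)["keys"]  # assumed configuration layout
    result: Dict[str, Optional[str]] = {}
    for key in keys:
        box = next((w for w in words if w[0] == key), None)
        if box is None:
            result[key] = None  # key text not present in the report
            continue
        # Words whose bounding boxes share the key's Y coordinates (exact match here).
        row = [w for w in words if w[2] == box[2] and w[4] == box[4]]
        # Sort that row by X coordinate (left edge).
        row.sort(key=lambda w: w[1])
        nxt = row.index(box) + 1
        result[key] = row[nxt][0] if nxt < len(row) else None
    return result


CONFIG = '{"keys": ["Policy:", "Total:", "Agent:"]}'
WORDS: List[WordBox] = [
    ("Total:", 40.0, 200.0, 80.0, 212.0),
    ("USD", 150.0, 200.0, 175.0, 212.0),
    ("1,250.00", 90.0, 200.0, 140.0, 212.0),
    ("Policy:", 40.0, 180.0, 85.0, 192.0),
    ("A-1042", 90.0, 180.0, 135.0, 192.0),
]

values = extract_values(CONFIG, WORDS)
for key, value in values.items():
    print(key, value)  # stand-in for displaying the extracted word
```

Coordinates produced by a real extractor are floating-point values, so an implementation would likely compare Y coordinates within a small tolerance rather than by exact equality.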
Specification