PDF extraction with text-based key
Abstract
The present disclosure includes a computing device for extracting information from a standardized PDF report in a non-paragraph format. In one embodiment, the computing device includes an electronic processor, and a memory. The memory includes program instructions that, when executed by the electronic processor, cause the electronic processor to receive a standardized PDF report and a configuration file, determine X coordinates and Y coordinates of bounding boxes associated with one or more text-based keys, determine one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key, sort the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes based on respective X coordinates, determine a single word that is directly adjacent to the first text-based key, and control a display to display the single word.
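The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the words and their bounding boxes have already been pulled out of the PDF by some extraction library, that each key is a single word, and that "directly adjacent" means the next word to the right on the same row. The `Word` type, the `value_next_to_key` helper, the `y_tolerance` parameter, the `config` layout, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Word:
    """A word and its bounding box (hypothetical type, not from the source)."""
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def value_next_to_key(words: List[Word], key: str,
                      y_tolerance: float = 0.5) -> Optional[str]:
    """Return the word directly to the right of `key` on the same row."""
    # 1. Locate the bounding box of the text-based key.
    key_word = next((w for w in words if w.text == key), None)
    if key_word is None:
        return None
    # 2. Keep the words that share the key's Y coordinates (same row).
    #    The tolerance is an assumption to absorb small rendering offsets.
    row = [w for w in words
           if abs(w.y0 - key_word.y0) <= y_tolerance
           and abs(w.y1 - key_word.y1) <= y_tolerance]
    # 3. Sort that row by X coordinate.
    row.sort(key=lambda w: w.x0)
    # 4. Take the word immediately after the key, if there is one.
    idx = row.index(key_word)
    return row[idx + 1].text if idx + 1 < len(row) else None


# Hypothetical sample data: two rows of a form-like report, out of order.
words = [
    Word("Total:", 300.0, 120.0, 340.0, 132.0),
    Word("Invoice:", 50.0, 100.0, 110.0, 112.0),
    Word("2024-03-01", 320.0, 100.0, 390.0, 112.0),
    Word("INV-0042", 120.0, 100.0, 180.0, 112.0),
    Word("98.50", 350.0, 120.0, 390.0, 132.0),
]

# Hypothetical configuration: the keys to look up in the report.
config = {"keys": ["Invoice:", "Total:"]}

extracted: Dict[str, Optional[str]] = {
    k: value_next_to_key(words, k) for k in config["keys"]
}
print(extracted)  # {'Invoice:': 'INV-0042', 'Total:': '98.50'}
```

Sorting the row by X coordinate is what makes the lookup independent of the order in which the extraction library happens to return the words.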
20 Claims
1. A computing device comprising:
- an electronic processor; and
- a memory coupled to the electronic processor, the memory including program instructions that, when executed by the electronic processor, cause the electronic processor to
  - receive a standardized PDF (portable document format) report that is in a non-paragraph format and a configuration file including one or more values that correspond to one or more text-based keys in the standardized PDF report,
  - determine X coordinates and Y coordinates of bounding boxes associated with the one or more text-based keys, the X coordinates associated with an X-direction and the Y coordinates associated with a Y-direction,
  - determine one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key of the one or more text-based keys,
  - sort the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with the first text-based key based on respective X coordinates in the X-direction,
  - determine a single word from the one or more words that is directly adjacent to the first text-based key, and
  - control a display to display the single word that is directly adjacent to the first text-based key,
  - wherein, to control the display to display the single word that is directly adjacent to the first text-based key, the program instructions, when executed by the electronic processor, further cause the electronic processor to generate a graphical user interface to display the single word that is directly adjacent to the first text-based key.
Dependent claims: 2, 3, 4
5. A computing device comprising:
- an electronic processor; and
- a memory coupled to the electronic processor, the memory including program instructions that, when executed by the electronic processor, cause the electronic processor to
  - receive a standardized PDF (portable document format) report that is in a non-paragraph format and a configuration file including one or more values that correspond to one or more text-based keys in the standardized PDF report,
  - determine X coordinates and Y coordinates of bounding boxes associated with the one or more text-based keys, the X coordinates associated with an X-direction and the Y coordinates associated with a Y-direction,
  - determine one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key of the one or more text-based keys,
  - sort the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with the first text-based key based on respective X coordinates in the X-direction,
  - determine a single word from the one or more words that is directly adjacent to the first text-based key, and
  - control a display to display the single word that is directly adjacent to the first text-based key,
  - wherein the memory further includes a database, and wherein the program instructions, when executed by the electronic processor, further cause the electronic processor to store the single word that is directly adjacent to the first text-based key in the database.
Dependent claims: 6
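The storing step recited in claim 5 might look like the sketch below. The claim does not name a database engine or schema, so the use of SQLite, the `extracted_values` table with its column names, and the `store_value` helper are assumptions made purely for illustration.

```python
import sqlite3


def store_value(conn: sqlite3.Connection, report: str,
                key_text: str, word: str) -> None:
    """Persist one extracted word, keyed by report name and text-based key."""
    # Create the (hypothetical) table on first use.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted_values ("
        "report TEXT NOT NULL, key_text TEXT NOT NULL, word TEXT NOT NULL, "
        "PRIMARY KEY (report, key_text))"
    )
    # Re-running extraction on the same report overwrites the old value.
    conn.execute(
        "INSERT OR REPLACE INTO extracted_values (report, key_text, word) "
        "VALUES (?, ?, ?)",
        (report, key_text, word),
    )
    conn.commit()


# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
store_value(conn, "report-001.pdf", "Invoice:", "INV-0042")
rows = conn.execute(
    "SELECT report, key_text, word FROM extracted_values"
).fetchall()
print(rows)  # [('report-001.pdf', 'Invoice:', 'INV-0042')]
```

Using the report name and key together as the primary key is one possible design choice: it keeps a single current value per key per report.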
7. A system comprising:
- a display device; and
- a server communicatively connected to the display device, the server including
  - an electronic processor; and
  - a memory coupled to the electronic processor, the memory including program instructions that, when executed by the electronic processor, cause the electronic processor to
    - receive a standardized PDF (portable document format) report that is in a non-paragraph format and a configuration file including one or more values that correspond to one or more text-based keys in the standardized PDF report,
    - determine X coordinates and Y coordinates of bounding boxes associated with the one or more text-based keys, the X coordinates associated with an X-direction and the Y coordinates associated with a Y-direction,
    - determine one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key of the one or more text-based keys,
    - sort the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with the first text-based key based on respective X coordinates in the X-direction,
    - determine a single word from the one or more words that is directly adjacent to the first text-based key, and
    - control a display to display the single word that is directly adjacent to the first text-based key,
    - wherein, to control the display to display the single word that is directly adjacent to the first text-based key, the program instructions that, when executed by the electronic processor, further cause the electronic processor to generate a graphical user interface to display the single word that is directly adjacent to the first text-based key.
Dependent claims: 8, 9
10. A system comprising:
- a display device; and
- a server communicatively connected to the display device, the server including
  - an electronic processor; and
  - a memory coupled to the electronic processor, the memory including program instructions that, when executed by the electronic processor, cause the electronic processor to
    - receive a standardized PDF (portable document format) report that is in a non-paragraph format and a configuration file including one or more values that correspond to one or more text-based keys in the standardized PDF report,
    - determine X coordinates and Y coordinates of bounding boxes associated with the one or more text-based keys, the X coordinates associated with an X-direction and the Y coordinates associated with a Y-direction,
    - determine one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key of the one or more text-based keys,
    - sort the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with the first text-based key based on respective X coordinates in the X-direction,
    - determine a single word from the one or more words that is directly adjacent to the first text-based key, and
    - control a display to display the single word that is directly adjacent to the first text-based key,
    - wherein the memory further includes a database, and wherein the program instructions that, when executed by the electronic processor, further cause the electronic processor to store the single word that is directly adjacent to the first text-based key in the database.
Dependent claims: 11
12. A non-transitory computer-readable medium comprising instructions that, when executed by an electronic processor, cause the electronic processor to perform a set of operations comprising:
- receiving a standardized PDF (portable document format) report that is in a non-paragraph format and a configuration file including one or more values that correspond to one or more text-based keys in the standardized PDF report;
- determining X coordinates and Y coordinates of bounding boxes associated with the one or more text-based keys, the X coordinates associated with an X-direction and the Y coordinates associated with a Y-direction;
- determining one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key of the one or more text-based keys;
- sorting the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with the first text-based key based on respective X coordinates in the X-direction;
- determining a single word from the one or more words that is directly adjacent to the first text-based key; and
- controlling a display to display the single word that is directly adjacent to the first text-based key,
- wherein controlling the display to display the single word that is directly adjacent to the first text-based key further includes generating a graphical user interface to display the single word that is directly adjacent to the first text-based key.
Dependent claims: 13, 14, 15
16. A non-transitory computer-readable medium comprising instructions that, when executed by an electronic processor, cause the electronic processor to perform a set of operations comprising:
- receiving a standardized PDF (portable document format) report that is in a non-paragraph format and a configuration file including one or more values that correspond to one or more text-based keys in the standardized PDF report;
- determining X coordinates and Y coordinates of bounding boxes associated with the one or more text-based keys, the X coordinates associated with an X-direction and the Y coordinates associated with a Y-direction;
- determining one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key of the one or more text-based keys;
- sorting the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with the first text-based key based on respective X coordinates in the X-direction;
- determining a single word from the one or more words that is directly adjacent to the first text-based key;
- controlling a display to display the single word that is directly adjacent to the first text-based key; and
- storing the single word that is directly adjacent to the first text-based key in a database.
17. A method for extracting information from a standardized PDF (portable document format) report that is in a non-paragraph format, the method comprising:
- receiving, with an electronic processor, the standardized PDF report that is in the non-paragraph format and a configuration file including one or more values that correspond to one or more text-based keys in the standardized PDF report;
- determining, with the electronic processor, X coordinates and Y coordinates of bounding boxes associated with the one or more text-based keys, the X coordinates associated with an X-direction and the Y coordinates associated with a Y-direction;
- determining, with the electronic processor, one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key of the one or more text-based keys;
- sorting, with the electronic processor, the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with the first text-based key based on respective X coordinates in the X-direction;
- determining, with the electronic processor, a single word from the one or more words that is directly adjacent to the first text-based key; and
- controlling, with the electronic processor, a display to display the single word that is directly adjacent to the first text-based key,
- wherein controlling the display to display the single word that is directly adjacent to the first text-based key further includes generating a graphical user interface to display the single word that is directly adjacent to the first text-based key.
Dependent claims: 18, 19
20. A method for extracting information from a standardized PDF (portable document format) report that is in a non-paragraph format, the method comprising:
- receiving, with an electronic processor, the standardized PDF report that is in the non-paragraph format and a configuration file including one or more values that correspond to one or more text-based keys in the standardized PDF report;
- determining, with the electronic processor, X coordinates and Y coordinates of bounding boxes associated with the one or more text-based keys, the X coordinates associated with an X-direction and the Y coordinates associated with a Y-direction;
- determining, with the electronic processor, one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with a first text-based key of the one or more text-based keys;
- sorting, with the electronic processor, the one or more words in the standardized PDF report that share the Y coordinates of the bounding boxes associated with the first text-based key based on respective X coordinates in the X-direction;
- determining, with the electronic processor, a single word from the one or more words that is directly adjacent to the first text-based key;
- controlling, with the electronic processor, a display to display the single word that is directly adjacent to the first text-based key; and
- storing the single word that is directly adjacent to the first text-based key in a database.
Specification