Section extraction tool for PDF documents
First Claim
1. A method of extracting a section of a page from a portable document format file (“pdf”) comprising:
receiving indication of a user-defined region on a pdf file page;
determining if one or more elements on the pdf page are within the user-defined region;
designating an extraction region including all elements determined to be within the user-defined region; and
placing the extraction region into a new file.
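The four steps of the claim can be illustrated with a minimal sketch. It assumes each page element is already available with a rectangular bounding box; the names `Rect`, `Element`, `Extraction`, and `extract_section` are hypothetical and do not come from the patent, and the returned `Extraction` object only stands in for the "new file".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle given by two opposite corners."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Rect") -> bool:
        # True when `other` lies entirely inside this rectangle.
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)


@dataclass(frozen=True)
class Element:
    """Hypothetical page element: a label plus its bounding box."""
    name: str
    bbox: Rect


@dataclass(frozen=True)
class Extraction:
    """Stand-in for the new file: the region and the elements it holds."""
    region: Rect
    elements: List[Element]


def extract_section(page_elements: List[Element], user_region: Rect) -> Extraction:
    # Step 1 (receiving the region) is represented by the `user_region` argument.
    # Step 2: determine which elements are within the user-defined region.
    selected = [e for e in page_elements if user_region.contains(e.bbox)]
    # Steps 3 and 4: designate the extraction region and place it in a new container.
    return Extraction(region=user_region, elements=selected)


page = [
    Element("logo", Rect(10, 10, 50, 40)),
    Element("chart", Rect(60, 10, 200, 120)),
    Element("footer", Rect(0, 700, 600, 720)),
]
result = extract_section(page, Rect(0, 0, 100, 100))
names = [e.name for e in result.elements]
```

Here only `logo` is selected, because `chart` extends past the right edge of the region. Writing an actual pdf or desktop publishing file would require a document library and is left out of this sketch.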
Abstract
A method of extracting a section of a page from a portable document format file (“pdf”). The method includes receiving indication of a user-defined region on a pdf file page, designating an extraction region including all elements determined to be within the user-defined region, and placing the extraction region into a new file. The method may also include determining if one or more elements on the pdf page are within the user-defined region by applying inclusion rules based on whether an element's bounding box is within or intersects the user-defined region. The method may also include verifying the accuracy of the extraction by converting the user-defined region in the original pdf document and the extracted region to bitmap images and comparing the two bitmap images, bit by bit.
20 Claims
1. A method of extracting a section of a page from a portable document format file (“pdf”) comprising:
receiving indication of a user-defined region on a pdf file page;
determining if one or more elements on the pdf page are within the user-defined region;
designating an extraction region including all elements determined to be within the user-defined region; and
placing the extraction region into a new file.
including a graphic element within the extraction region if a bounding box of the graphic element is within the user-defined region; and
including an image element within the extraction region if a bounding box of the image element is within the user-defined region.
5. The method of claim 2, wherein applying the extraction determination rules comprises:
including a text element within the extraction region if a bounding box of the text element is within the user-defined region;
evaluating if sub-elements of the text element are within the user-defined region if the text element intersects the user-defined region;
including a sub-element of the text element if the sub-element is within the user-defined region; and
expanding the user-defined region to include a sub-element of the text element if the sub-element of the text element intersects the user-defined region.
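The text-element rule above can be sketched as follows. This is one possible reading, not the patented implementation: it assumes sub-elements are words with their own bounding boxes, that an intersecting sub-element is kept once the region has been expanded around it, and that each sub-element is tested against the region as originally drawn. The names `TextElement`, `within`, `intersects`, `union`, and `apply_text_rule` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def within(inner: Rect, outer: Rect) -> bool:
    # True when `inner` lies entirely inside `outer`.
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def intersects(a: Rect, b: Rect) -> bool:
    # True when the two rectangles overlap with non-zero area.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Rect, b: Rect) -> Rect:
    # Smallest rectangle covering both inputs.
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


@dataclass
class TextElement:
    bbox: Rect
    sub_elements: List[Tuple[str, Rect]]  # assumed: (word, bounding box) pairs


def apply_text_rule(text: TextElement, region: Rect) -> Tuple[List[str], Rect]:
    """Return the included sub-element labels and the (possibly expanded) region."""
    # Whole text element inside the region: include everything, region unchanged.
    if within(text.bbox, region):
        return [label for label, _ in text.sub_elements], region
    included: List[str] = []
    expanded = region
    if intersects(text.bbox, region):
        for label, box in text.sub_elements:
            if within(box, region):
                included.append(label)
            elif intersects(box, region):
                # Partially covered sub-element: grow the region to take it in whole.
                expanded = union(expanded, box)
                included.append(label)
    return included, expanded


line = TextElement(
    bbox=(0, 0, 100, 10),
    sub_elements=[("alpha", (0, 0, 30, 10)),
                  ("beta", (35, 0, 60, 10)),
                  ("gamma", (65, 0, 100, 10))],
)
words, new_region = apply_text_rule(line, (32, -1, 70, 11))
```

In this example `beta` is fully inside the region, `gamma` is cut by its right edge and so widens the region to `x1 = 100`, and `alpha` lies outside and is dropped. Testing against the original region rather than the growing one keeps the result independent of sub-element order.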
6. The method of claim 1, further comprising verifying the accuracy of the extracted user-defined region in the new file.
7. The method of claim 6, wherein verifying the accuracy of the extracted user-defined region in the new file comprises converting the pdf file page into a first bitmap image and the extracted user-defined region in the new file into a second bitmap image and comparing the first bitmap image to the second bitmap image bit by bit to confirm the accuracy of the extraction.
8. The method of claim 7, further comprising presenting the user with a message regarding differences between the pdf file page and the extracted user-defined region in the new file if there is a difference between the first bitmap image and the second bitmap image.
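The verification in claims 6 to 8 can be sketched as a comparison of two rasterized images. This sketch assumes the rasterization has already happened elsewhere and that both bitmaps arrive as equal-sized rows of bytes, one byte per pixel; the `Bitmap` alias and the functions `diff_bitmaps` and `verification_message` are hypothetical names, and the message wording is illustrative only.

```python
from typing import List, Sequence, Tuple

Bitmap = Sequence[bytes]  # assumed layout: one bytes object per row, one byte per pixel


def diff_bitmaps(first: Bitmap, second: Bitmap) -> List[Tuple[int, int]]:
    """Return the (x, y) coordinates of every pixel that differs."""
    same_shape = len(first) == len(second) and all(
        len(a) == len(b) for a, b in zip(first, second)
    )
    if not same_shape:
        raise ValueError("bitmaps must have identical dimensions")
    # Two bytes are equal exactly when all of their bits are equal,
    # so comparing pixel bytes is a bit-by-bit comparison.
    return [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(first, second))
        for x, (pa, pb) in enumerate(zip(row_a, row_b))
        if pa != pb
    ]


def verification_message(first: Bitmap, second: Bitmap) -> str:
    """Build the user-facing message described in claim 8."""
    diffs = diff_bitmaps(first, second)
    if not diffs:
        return "Extraction verified: bitmaps are identical."
    x, y = diffs[0]
    return (f"Extraction differs from source at {len(diffs)} pixel(s); "
            f"first difference at (x={x}, y={y}).")


source = [b"\x00\xff", b"\xff\x00"]
good_copy = [b"\x00\xff", b"\xff\x00"]
bad_copy = [b"\x00\xff", b"\xff\x01"]
ok_message = verification_message(source, good_copy)
bad_pixels = diff_bitmaps(source, bad_copy)
```

Requiring identical dimensions up front means a cropping or scaling mismatch is reported as an error rather than as a long list of pixel differences.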
9. The method of claim 1, wherein receiving the indication of the user-defined region on the pdf file page comprises receiving an input of a user-defined region drawn on the pdf file page.
10. The method of claim 1 wherein receiving the indication of the user-defined region comprises receiving a user selection of a button on the pdf screen after the user draws the user-defined region on the pdf file page.
11. The method of claim 1 wherein the new file comprises one of a portable document format file and a desktop publishing software file.
12. A system for extracting a section of a page of a portable document format file comprising:
means for receiving indication of a user-defined region on a pdf file page;
means for determining one or more elements on the pdf page are within the user-defined region;
means for designating an extraction region including all elements determined to be within the user-defined region; and
means for placing the extraction region into a new file.
means for including a graphic element within the extraction region if a bounding box of the graphic element is within the user-defined region; and
means for including an image element within the extraction region if a bounding box of the image element is within the user-defined region.
15. The system of claim 13, wherein the means for applying the extraction determination rules comprises:
means for including a text element within the extraction region if a bounding box of the text element is within the user-defined region;
means for evaluating if sub-elements of the text element are within the user-defined region if the text element intersects the user-defined region;
means for including a sub-element of the text element if the sub-element is within the user-defined region; and
means for expanding the user-defined region to include a sub-element of the text element if the sub-element of the text element intersects the user-defined region.
16. The system of claim 12 further comprising:
means for verifying the accuracy of the extracted user-defined region in the new file.
17. The system of claim 16, wherein the means for verifying the accuracy of the extracted user-defined region in the new file comprises means for converting the pdf file page into a first bitmap image and the extracted user-defined region in the new file into a second bitmap image and means for comparing the first bitmap image to the second bitmap image bit by bit to confirm the accuracy of the extraction.
18. The system of claim 17, further comprising means for presenting the user with a message regarding differences between the pdf file page and the extracted user-defined region in the new file if there is a difference between the first bitmap image and the second bitmap image.
19. A computer readable medium containing executable instructions which, when executed in a processing system, cause the system to perform a method comprising:
receiving indication of a user-defined region on a pdf file page;
determining if one or more elements on the pdf page are within the user-defined region;
designating an extraction region including all elements determined to be within the user-defined region; and
placing the extraction region into a new file.
Specification