Systems, methods, and computer readable media for extracting data from portable document format (PDF) files
Abstract
One method occurs at a data file analyzer. The method includes identifying at least one document identifier associated with a first document in a portable document format (PDF) file. The method further includes determining, using the at least one document identifier, a reference point identifier for identifying a reference point in the first document, an offset value for indicating a location of a first detection area in the first document, and size information for indicating a size of the first detection area in the first document. The method also includes identifying, using the reference point identifier, the reference point in the first document. The method further includes identifying, using the offset value and the size information, the first detection area in the first document, and extracting, by processing binary data of the PDF file, data within the first detection area of the first document.
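The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything concrete in it is an assumption made for illustration: the reference point identifier is taken to be a text label, the offset and size are in PDF user-space units measured from the label's position, the template table is keyed by a made-up document identifier ("invoice-v1"), and the "binary data" is a tiny uncompressed content stream using only the standard `BT`, `Td`, `Tj`, and `ET` operators. Real PDF files usually need stream decompression, font decoding, and text-matrix handling, none of which is attempted here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical per-document settings; field names are illustrative."""
    reference_text: str  # reference point identifier (assumed: a text label)
    dx: float            # offset from the reference point to the area's lower-left corner
    dy: float
    width: float         # size of the detection area
    height: float


# Hypothetical lookup table keyed by document identifier.
TEMPLATES = {"invoice-v1": Template("Invoice No", 90.0, -4.0, 120.0, 16.0)}

# Matches only the simplest text block: BT ... x y Td (text) Tj ... ET
TEXT_BLOCK = re.compile(
    rb"BT\b.*?(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td\s*\((.*?)\)\s*Tj.*?ET",
    re.S,
)


def text_items(stream: bytes):
    """Yield (x, y, text) for each simple text block in the raw bytes."""
    for m in TEXT_BLOCK.finditer(stream):
        yield float(m.group(1)), float(m.group(2)), m.group(3).decode("latin-1")


def detection_area(ref_x: float, ref_y: float, t: Template):
    """Return (x0, y0, x1, y1) of the area located via offset and size."""
    x0, y0 = ref_x + t.dx, ref_y + t.dy
    return x0, y0, x0 + t.width, y0 + t.height


def extract(stream: bytes, document_id: str) -> list:
    """Look up the template, find the reference point, return text in the area."""
    t = TEMPLATES[document_id]
    items = list(text_items(stream))
    ref = next(((x, y) for x, y, text in items if text == t.reference_text), None)
    if ref is None:
        raise ValueError("reference point not found")
    x0, y0, x1, y1 = detection_area(ref[0], ref[1], t)
    return [text for x, y, text in items if x0 <= x <= x1 and y0 <= y <= y1]


# Made-up sample content stream for demonstration only.
SAMPLE_STREAM = (
    b"BT /F1 12 Tf 72 700 Td (Invoice No) Tj ET\n"
    b"BT /F1 12 Tf 172 700 Td (INV-0042) Tj ET\n"
    b"BT /F1 12 Tf 72 650 Td (Total) Tj ET\n"
)

if __name__ == "__main__":
    print(extract(SAMPLE_STREAM, "invoice-v1"))
```

In this sketch the detection area is positioned relative to the reference point rather than to the page, so the same template would still apply if the label and its value shifted together on the page.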
21 Claims
1. A method for extracting data from a portable document format (PDF) file, the method comprising:
identifying at least one document identifier associated with a first document in a portable document format (PDF) file;
determining, using the at least one document identifier, a reference point identifier in the first document, an offset value for indicating a location of a first detection area in the first document, and size information for indicating a size of the first detection area in the first document;
identifying, using the reference point identifier, the reference point in the first document;
identifying, using the offset value and the size information, the first detection area in the first document; and
extracting, by processing binary data of the PDF file, data within the first detection area of the first document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for extracting data from a portable document format (PDF) file, the system comprising:
a data file analyzer comprising:
at least one processor; and
a memory,
wherein the data file analyzer is configured
to identify at least one document identifier associated with a first document in a portable document format (PDF) file,
to determine, using the at least one document identifier, a reference point identifier for identifying a reference point in the first document, an offset value for indicating a location of a first detection area in the first document, and size information for indicating a size of the first detection area in the first document,
to identify, using the reference point identifier, the reference point in the first document,
to identify, using the offset value and the size information, the first detection area in the first document, and
to extract, by processing binary data of the PDF file, data within the first detection area of the first document.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A non-transitory computer readable medium having stored thereon computer-executable instructions that when executed by at least one processor of a computer cause the computer to perform steps comprising:
identifying at least one document identifier associated with a first document in a portable document format (PDF) file;
determining, using the at least one document identifier, a reference point identifier for identifying a reference point in the first document, an offset value for indicating a location of a first detection area in the first document, and size information for indicating a size of the first detection area in the first document;
identifying, using the reference point identifier, the reference point in the first document;
identifying, using the offset value and the size information, the first detection area in the first document; and
extracting, by processing binary data of the PDF file, data within the first detection area of the first document.
Specification