Automatic signature generation for malicious PDF files
Abstract
In some embodiments, automatic signature generation for malicious PDF files includes: parsing a PDF file to extract script stream data embedded in the PDF file; determining whether the extracted script stream data within the PDF file is malicious; and automatically generating a signature for the PDF file.
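The parsing step in the abstract can be illustrated with a minimal Python sketch. It assumes an unencrypted PDF whose objects appear as plain `N G obj ... endobj` blocks. It collects `/JS` entries, whether they are inline literal strings or references to stream objects, and decodes `/FlateDecode` streams with `zlib`. The function names are hypothetical. The sketch ignores object streams, other filters, indirect `/Length` values and the document-level `/JavaScript` name tree, all of which a real parser would have to handle.

```python
import re
import zlib
from typing import Optional

# Naive patterns for an unencrypted PDF body (illustration only).
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
JS_REF_RE = re.compile(rb"/JS\s+(\d+)\s+(\d+)\s+R")     # /JS 12 0 R
JS_LITERAL_RE = re.compile(rb"/JS\s*\(")                  # /JS (...)
STREAM_START_RE = re.compile(rb"stream\r?\n")
LENGTH_RE = re.compile(rb"/Length\s+(\d+)\s*(?:/|>>)")    # direct lengths only


def read_literal(buf: bytes, i: int) -> bytes:
    """Return the PDF literal string starting at buf[i] == '(' (balanced parens).

    Backslash escapes are kept verbatim rather than decoded.
    """
    depth, out = 0, bytearray()
    while i < len(buf):
        c = buf[i]
        if c == 0x5C:  # backslash: copy it and the escaped byte unchanged
            out += buf[i:i + 2]
            i += 2
            continue
        if c == 0x28:  # '('
            depth += 1
            if depth == 1:
                i += 1
                continue
        elif c == 0x29:  # ')'
            depth -= 1
            if depth == 0:
                break
        out.append(c)
        i += 1
    return bytes(out)


def stream_data(body: bytes) -> Optional[bytes]:
    """Return the (Flate-decoded) stream payload of an object body, if any."""
    start = STREAM_START_RE.search(body)
    if start is None:
        return None
    header = body[:start.start()]
    length = LENGTH_RE.search(header)
    if length is None:
        return None  # indirect /Length is not resolved in this sketch
    data = body[start.end():start.end() + int(length.group(1))]
    if b"/FlateDecode" in header:
        try:
            data = zlib.decompress(data)
        except zlib.error:
            return None  # malformed stream; common in malicious samples
    return data


def extract_script_streams(pdf: bytes) -> list[bytes]:
    """Collect JavaScript referenced by /JS entries, inline or via stream objects."""
    objects = {int(m.group(1)): m.group(3) for m in OBJ_RE.finditer(pdf)}
    scripts = []
    for body in objects.values():
        for m in JS_LITERAL_RE.finditer(body):
            scripts.append(read_literal(body, m.end() - 1))
        for num, _gen in JS_REF_RE.findall(body):
            data = stream_data(objects.get(int(num), b""))
            if data is not None:
                scripts.append(data)
    return scripts
```

In practice a maintained PDF parsing library would replace these regular expressions; the sketch only shows the data flow from `/JS` entries to script bytes.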
26 Claims
1. A system, comprising:
- a processor configured to:
parse a PDF file to extract script stream data embedded in the PDF file, wherein the PDF file is known to include malicious content; and
determine whether to generate a signature associated with the PDF file based at least in part on at least a portion of the extracted script stream data;
in the event that the signature associated with the PDF file is determined to be based at least in part on the at least portion of the extracted script stream data, automatically generate the signature associated with the PDF file based at least in part on the at least portion of the extracted script stream data, wherein the signature is configured to be matched against a potentially malicious PDF file; and
in the event that the signature associated with the PDF file is determined not to be based at least in part on the at least portion of the extracted script stream data, automatically generate the signature associated with the PDF file from an identified cross-reference table from a plurality of cross-reference tables within the PDF file, wherein the identified cross-reference table is identified from the plurality of cross-reference tables based at least in part on a position of the identified cross-reference table relative to respective positions associated with one or more cross-reference tables other than the identified cross-reference table from the plurality of cross-reference tables; and
- a memory coupled to the processor and configured to provide the processor with instructions.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
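The decision and fallback in claim 1 can be sketched in Python as follows. Every concrete rule here is an assumption for illustration: the decision (use the script data only if the longest whitespace-normalized script is at least `MIN_SCRIPT_LEN` bytes), the signature form (a normalized byte prefix of `SIG_LEN` bytes), and the position rule (take the cross-reference table at the highest byte offset, i.e. the last one in the file). The claim itself only requires that the choice depend on the script data and that the table be identified by its position relative to the other tables. `Signature`, `generate_signature` and both constants are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

MIN_SCRIPT_LEN = 32  # hypothetical: shorter scripts are judged too generic to sign
SIG_LEN = 64         # hypothetical: number of normalized bytes kept as the pattern

WS_RE = re.compile(rb"\s+")


@dataclass(frozen=True)
class Signature:
    source: str     # "script" or "xref": which part of the file the pattern came from
    pattern: bytes  # whitespace-normalized byte pattern to search for


def normalize(data: bytes) -> bytes:
    """Collapse whitespace runs so trivial reformatting does not defeat matching."""
    return WS_RE.sub(b" ", data).strip()


def generate_signature(scripts: list[bytes],
                       xref_tables: list[tuple[int, bytes]]) -> Optional[Signature]:
    """Build a signature for a known-malicious PDF.

    scripts: extracted script stream data.
    xref_tables: (byte_offset, raw_table_bytes) for each cross-reference table.
    """
    longest = max((normalize(s) for s in scripts), key=len, default=b"")
    # Decision step: sign the script data only if it is long enough to be specific.
    if len(longest) >= MIN_SCRIPT_LEN:
        return Signature("script", longest[:SIG_LEN])
    if not xref_tables:
        return None
    # Fallback: choose a table by its position relative to the others
    # (here, the one at the highest offset, i.e. the last table in the file).
    _, table = max(xref_tables, key=lambda t: t[0])
    return Signature("xref", normalize(table)[:SIG_LEN])
```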
10. A method, comprising:
parsing a PDF file to extract script stream data embedded in the PDF file, wherein the PDF file is known to include malicious content; and
determining whether to generate a signature associated with the PDF file based at least in part on at least a portion of the extracted script stream data;
in the event that the signature associated with the PDF file is determined to be based at least in part on the at least portion of the extracted script stream data, automatically generating the signature associated with the PDF file based at least in part on the at least portion of the extracted script stream data, wherein the signature is configured to be matched against a potentially malicious PDF; and
in the event that the signature associated with the PDF file is determined not to be based at least in part on the at least portion of the extracted script stream data, automatically generating the signature associated with the PDF file from an identified cross-reference table from a plurality of cross-reference tables within the PDF file, wherein the identified cross-reference table is identified from the plurality of cross-reference tables based at least in part on a position of the identified cross-reference table relative to respective positions associated with one or more cross-reference tables other than the identified cross-reference table from the plurality of cross-reference tables.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18.
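The claims state that the generated signature is configured to be matched against a potentially malicious PDF, but not how the match is performed. One hypothetical approach, sketched here in Python, tags each signature with the part of the file it was derived from and searches for its pattern only in the same, whitespace-normalized part of the candidate file. It assumes the candidate's script data and cross-reference tables have already been extracted, and that patterns were normalized the same way when they were generated; `match_signatures` is an illustrative name.

```python
import re
from typing import Iterable

WS_RE = re.compile(rb"\s+")


def normalize(data: bytes) -> bytes:
    """Same whitespace normalization assumed at signature-generation time."""
    return WS_RE.sub(b" ", data).strip()


def match_signatures(signatures: dict[str, tuple[str, bytes]],
                     scripts: Iterable[bytes],
                     xref_tables: Iterable[bytes]) -> list[str]:
    """Return the names of signatures found in a candidate PDF.

    signatures: name -> (source, pattern), where source is "script" or "xref"
    and pattern is already normalized.
    """
    parts = {
        "script": [normalize(s) for s in scripts],
        "xref": [normalize(t) for t in xref_tables],
    }
    hits = []
    for name, (source, pattern) in signatures.items():
        # Only search the part of the file the signature was derived from.
        if any(pattern in part for part in parts.get(source, [])):
            hits.append(name)
    return hits
```

Restricting each search to one part of the file keeps an xref-derived pattern from matching by accident inside unrelated script text, and vice versa.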
19. A computer program product, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
parsing a PDF file to extract script stream data embedded in the PDF file, wherein the PDF file is known to include malicious content; and
determining whether to generate a signature associated with the PDF file based at least in part on at least a portion of the extracted script stream data;
in the event that the signature associated with the PDF file is determined to be based at least in part on the at least portion of the extracted script stream data, automatically generating the signature associated with the PDF file based at least in part on the at least portion of the extracted script stream data, wherein the signature is configured to be matched against a potentially malicious PDF; and
in the event that the signature associated with the PDF file is determined not to be based at least in part on the at least portion of the extracted script stream data, automatically generating the signature associated with the PDF file from an identified cross-reference table from a plurality of cross-reference tables within the PDF file, wherein the identified cross-reference table is identified from the plurality of cross-reference tables based at least in part on a position of the identified cross-reference table relative to respective positions associated with one or more cross-reference tables other than the identified cross-reference table from the plurality of cross-reference tables.
20. A system, comprising:
- a processor configured to:
determine that a PDF file does not include script stream data, wherein the PDF file is known to include malicious content;
determine an identified cross-reference table from a plurality of cross-reference tables within the PDF file, wherein the identified cross-reference table is identified from the plurality of cross-reference tables based at least in part on a position of the identified cross-reference table relative to respective positions associated with one or more cross-reference tables other than the identified cross-reference table from the plurality of cross-reference tables; and
automatically generate a signature for the PDF file from the identified cross-reference table; and
- a memory coupled to the processor and configured to provide the processor with instructions.
Dependent claims: 21, 22, 23, 24, 25, 26.
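Claim 20 covers files that contain no script stream data at all. The Python sketch below locates classic `xref ... trailer` sections in the raw file bytes and picks the one at the highest offset. That choice is an assumed reading of "position ... relative to ... other cross-reference tables"; it selects the table appended by the most recent incremental update. A byte-pattern signature is then derived from the chosen table. The script check is a naive marker search, PDF 1.5+ cross-reference streams are not handled, and the function names and the `sig_len` parameter are hypothetical.

```python
import re
from typing import Optional

# A classic cross-reference section starts with the keyword "xref" on its own line;
# the lookbehind keeps "startxref" from matching.
XREF_RE = re.compile(rb"(?:^|(?<=[\r\n]))xref[ \t]*\r?\n")
SCRIPT_MARKERS = (b"/JS", b"/JavaScript")
WS_RE = re.compile(rb"\s+")


def has_script(pdf: bytes) -> bool:
    """Naive check; misses names hidden in compressed object streams or #xx escapes."""
    return any(marker in pdf for marker in SCRIPT_MARKERS)


def find_xref_tables(pdf: bytes) -> list[tuple[int, bytes]]:
    """Return (byte_offset, table_bytes) for each 'xref ... trailer' section."""
    tables = []
    for m in XREF_RE.finditer(pdf):
        end = pdf.find(b"trailer", m.start())
        if end != -1:
            tables.append((m.start(), pdf[m.start():end]))
    return tables


def xref_signature(pdf: bytes, sig_len: int = 64) -> Optional[bytes]:
    """Signature pattern for a script-free PDF, taken from one chosen xref table."""
    if has_script(pdf):
        return None  # the script-based path would apply instead
    tables = find_xref_tables(pdf)
    if not tables:
        return None  # e.g. PDF 1.5+ files that use cross-reference streams
    # Hypothetical position rule: the table at the highest offset, which is the one
    # appended by the most recent incremental update.
    _, table = max(tables, key=lambda t: t[0])
    return WS_RE.sub(b" ", table).strip()[:sig_len]
```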
Specification