Automatic extraction of indicators of compromise from multiple data sources accessible over a network
Abstract
A processing device in one embodiment comprises a processor coupled to a memory and is configured to direct one or more web crawlers to obtain textual information from a plurality of data sources accessible over at least one network, to extract terms likely to be associated with indicators of compromise from the obtained textual information, to filter the extracted terms to identify terms corresponding to respective valid indicators of compromise, to generate links between the terms corresponding to the respective valid indicators of compromise, and to convert the links and the corresponding terms into an output document in a specified indicator of compromise format. Feedback from an analyst device receiving the output document may be used to adjust a filter parameter of the extracted term filtering. Additionally or alternatively, one or more parameters of a network security system may be adjusted based at least in part on the output document.
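The extract, filter, link, and convert stages summarized in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the source: the regular expressions in `CANDIDATE_PATTERNS`, the validity rule in `is_valid` (reject malformed, private, or loopback IPv4 addresses), the choice to link terms that co-occur in the same text, and the JSON layout standing in for the "specified indicator of compromise format". The web crawling step is omitted; the sketch starts from text that has already been obtained.

```python
import ipaddress
import json
import re
from itertools import combinations

# Hypothetical candidate patterns; the source does not say how terms are matched.
CANDIDATE_PATTERNS = {
    "ipv4": re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def extract_terms(text):
    """Return (kind, term) pairs that look like indicators of compromise."""
    found = []
    for kind, pattern in CANDIDATE_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((kind, match.group(0)))
    return found


def is_valid(kind, term):
    """Assumed validity rule: IPv4 terms must parse and be publicly routable."""
    if kind == "ipv4":
        try:
            address = ipaddress.ip_address(term)
        except ValueError:
            return False
        return not (address.is_private or address.is_loopback)
    return True


def filter_terms(candidates):
    """Keep valid candidates, dropping duplicates while preserving order."""
    seen = set()
    valid = []
    for kind, term in candidates:
        if (kind, term) not in seen and is_valid(kind, term):
            seen.add((kind, term))
            valid.append((kind, term))
    return valid


def link_terms(terms):
    """Assumed linking rule: every pair of terms from the same text is linked."""
    return list(combinations(terms, 2))


def to_document(terms, links):
    """Serialize terms and links into a hypothetical JSON indicator document."""
    return json.dumps(
        {
            "indicators": [{"type": k, "value": v} for k, v in terms],
            "links": [{"source": a[1], "target": b[1]} for a, b in links],
        },
        indent=2,
    )


if __name__ == "__main__":
    sample = (
        "Sample 999.1.1.1 beacons to 8.8.8.8 and 10.0.0.5; "
        "dropper hash d41d8cd98f00b204e9800998ecf8427e."
    )
    valid_terms = filter_terms(extract_terms(sample))
    print(to_document(valid_terms, link_terms(valid_terms)))
```

In this sketch the malformed address and the private address are extracted as candidates but removed by the filter, which mirrors the distinction the abstract draws between terms "likely to be associated with" indicators and terms corresponding to "valid" indicators.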
20 Claims
1. A method comprising:
configuring one or more web crawlers to obtain textual information from a plurality of data sources accessible over at least one network;
extracting terms likely to be associated with indicators of compromise from the obtained textual information;
filtering the extracted terms to identify terms corresponding to respective valid indicators of compromise;
generating links between the terms corresponding to the respective valid indicators of compromise;
converting the links and the corresponding terms into an output document in a specified indicator of compromise format;
transmitting the output document to an analyst device;
receiving feedback from the analyst device relating to the output document; and
adjusting at least one filter parameter of the filtering based at least in part on the received feedback;
wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method comprising:
configuring one or more web crawlers to obtain textual information from a plurality of data sources accessible over at least one network;
extracting terms likely to be associated with indicators of compromise from the obtained textual information;
filtering the extracted terms to identify terms corresponding to respective valid indicators of compromise;
generating links between the terms corresponding to the respective valid indicators of compromise; and
converting the links and the corresponding terms into an output document in a specified indicator of compromise format;
the method further comprising:
parsing the obtained textual information to identify portions of the obtained textual information that include the terms likely to be associated with the indicators of compromise;
wherein parsing the obtained textual information further comprises:
parsing HTML code of each of one or more web pages into a plurality of different regions;
determining a quantity of text in each of the different regions; and
selecting one or more of the regions to be subject to the extracting based at least in part on the determined quantities of text;
wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
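The parsing steps recited in claim 13 (split a page's HTML into regions, measure the text in each, select regions by text quantity) can be sketched with Python's standard `html.parser` module. This is an illustrative sketch under stated assumptions, not the claimed implementation: the set of tags treated as region boundaries (`REGION_TAGS`), the use of character count as the "quantity of text", and the `min_chars` threshold are all hypothetical choices. The sketch also assumes reasonably well-formed markup and attributes text to the innermost open region.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags that delimit a "region" of the page.
REGION_TAGS = {"div", "section", "article", "nav", "header", "footer", "aside"}


class RegionCollector(HTMLParser):
    """Collect the text found inside each region-level element."""

    def __init__(self):
        super().__init__()
        self.regions = []  # one dict per region: {"label": str, "parts": [str]}
        self._open = []    # indexes into self.regions for currently open regions

    def handle_starttag(self, tag, attrs):
        if tag in REGION_TAGS:
            label = dict(attrs).get("id") or tag
            self.regions.append({"label": label, "parts": []})
            self._open.append(len(self.regions) - 1)

    def handle_endtag(self, tag):
        # Assumes well-formed nesting: a closing region tag ends the innermost region.
        if tag in REGION_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open:
            self.regions[self._open[-1]]["parts"].append(text)


def select_regions(html, min_chars=50):
    """Return (label, text) for regions whose text length is at least min_chars."""
    collector = RegionCollector()
    collector.feed(html)
    collector.close()
    selected = []
    for region in collector.regions:
        text = " ".join(region["parts"])
        if len(text) >= min_chars:  # the "quantity of text" is measured in characters
            selected.append((region["label"], text))
    return selected


if __name__ == "__main__":
    page = (
        '<div id="nav">Home | About</div>'
        '<div id="post">'
        + "Observed C2 traffic to 8.8.8.8 from the dropper. " * 3
        + "</div>"
        '<div id="footer">Copyright</div>'
    )
    for label, text in select_regions(page):
        print(label, len(text))
```

With a threshold like this, short navigation and footer regions are skipped and only the text-heavy region is passed on to term extraction. A relative measure, such as each region's share of the page's total text, would be an equally plausible reading of "based at least in part on the determined quantities of text".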
14. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device:
to configure one or more web crawlers to obtain textual information from a plurality of data sources accessible over at least one network;
to extract terms likely to be associated with indicators of compromise from the obtained textual information;
to filter the extracted terms to identify terms corresponding to respective valid indicators of compromise;
to generate links between the terms corresponding to the respective valid indicators of compromise;
to convert the links and the corresponding terms into an output document in a specified indicator of compromise format;
to transmit the output document to an analyst device;
to receive feedback from the analyst device relating to the output document; and
to adjust at least one filter parameter of the filtering based at least in part on the received feedback.
Dependent claims: 15, 16
17. An apparatus comprising:
at least one processing device comprising a processor coupled to a memory;
said at least one processing device being configured:
to direct one or more web crawlers to obtain textual information from a plurality of data sources accessible over at least one network;
to extract terms likely to be associated with indicators of compromise from the obtained textual information;
to filter the extracted terms to identify terms corresponding to respective valid indicators of compromise;
to generate links between the terms corresponding to the respective valid indicators of compromise;
to convert the links and the corresponding terms into an output document in a specified indicator of compromise format;
to transmit the output document to an analyst device;
to receive feedback from the analyst device relating to the output document; and
to adjust at least one filter parameter of the filtering based at least in part on the received feedback.
Dependent claims: 18, 19, 20
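The feedback step shared by claims 1, 14, and 17 (adjust at least one filter parameter based on analyst feedback) can be pictured with a small sketch. The claims do not say which parameter is adjusted or how, so everything below is hypothetical: a single score `threshold` as the filter parameter, a fixed `step`, and feedback expressed as counts of false positives and missed indicators reported by the analyst.

```python
from dataclasses import dataclass


@dataclass
class TermFilter:
    """Hypothetical filter that accepts a term when its score meets a threshold."""

    threshold: float = 0.5  # the adjustable filter parameter (an assumption)
    step: float = 0.05      # how far one round of feedback moves the threshold

    def accepts(self, score: float) -> bool:
        return score >= self.threshold

    def apply_feedback(self, false_positives: int, missed: int) -> float:
        """Tighten the filter after false positives, loosen it after misses."""
        if false_positives > missed:
            self.threshold = min(1.0, self.threshold + self.step)
        elif missed > false_positives:
            self.threshold = max(0.0, self.threshold - self.step)
        return self.threshold


if __name__ == "__main__":
    term_filter = TermFilter()
    print(term_filter.accepts(0.52))   # accepted under the initial threshold
    term_filter.apply_feedback(false_positives=3, missed=0)
    print(term_filter.accepts(0.52))   # rejected once the threshold is raised
```

The point of the sketch is only the control loop: the output document goes to the analyst, the analyst's response comes back as structured feedback, and the next filtering pass runs with the updated parameter.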
Specification