Methods and apparatus to improve feature engineering efficiency with metadata unit operations
Abstract
Methods, apparatus, systems and articles of manufacture are disclosed to improve feature engineering efficiency. An example method disclosed herein includes: retrieving a log file in a first file format, the log file containing feature occurrence data; generating a first unit operation based on the first file format to extract the feature occurrence data from the log file to a string, the first unit operation associated with a first metadata tag; generating second unit operations to identify respective features from the feature occurrence data, the second unit operations associated with respective second metadata tags; and generating a first sequence of the first metadata tag and the second metadata tags to create a first vector output file of the feature occurrence data.
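The relationship the abstract describes between unit operations, metadata tags, and a sequence of tags can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the disclosure: the `UNIT_OPS` registry, the `unit_op` decorator, the tag names, and the choice to hold the log file as in-memory bytes and the vector as a Python list rather than writing an output file.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a metadata tag to its unit operation.
UNIT_OPS: Dict[str, Callable[[dict], dict]] = {}


def unit_op(tag: str):
    """Associate a unit operation with a metadata tag (illustrative only)."""
    def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        UNIT_OPS[tag] = fn
        return fn
    return register


@unit_op("file_to_string")
def file_to_string(state: dict) -> dict:
    # First unit operation: extract the log content to a string.
    # Assumption: the "log file" is already loaded as UTF-8 bytes.
    state["text"] = state["raw"].decode("utf-8")
    return state


@unit_op("count_error")
def count_error(state: dict) -> dict:
    # Second unit operation: one feature, the number of "ERROR" occurrences.
    state.setdefault("features", []).append(state["text"].count("ERROR"))
    return state


@unit_op("count_warn")
def count_warn(state: dict) -> dict:
    # Second unit operation: another feature, the number of "WARN" occurrences.
    state.setdefault("features", []).append(state["text"].count("WARN"))
    return state


def run_sequence(tags: List[str], state: dict) -> dict:
    """Execute the unit operations named by a sequence of metadata tags."""
    for tag in tags:
        state = UNIT_OPS[tag](state)
    return state


# The sequence is only a list of tags; the operations are looked up at run time.
sequence = ["file_to_string", "count_error", "count_warn"]
result = run_sequence(sequence, {"raw": b"ERROR a\nWARN b\nERROR c\n"})
vector = result["features"]
```

Because the sequence holds only tags, supporting a different log format in this sketch would mean swapping the first tag while leaving the feature tags unchanged.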
20 Claims
1. A computer-implemented method to apply pattern engineering with metadata-driven unit operations to improve an efficiency of analysis of log files having different file formats, the computer-implemented method comprising:
determining, by executing an instruction with a processor, a first file format of a log file from the different file formats, the log file to be converted to a vector output file;
generating, by executing an instruction with the processor, a first sequence of processing tasks based on the first file format, the first sequence including a first metadata tag corresponding to a conversion task to convert the first file format to a string format, the log file including pattern occurrence data;
generating, by executing an instruction with the processor, a second metadata tag within the first sequence of processing tasks, the second metadata tag associated with a second processing task to identify respective features from the pattern occurrence data by comparing the string to a pattern corresponding to malware;
generating, by executing an instruction with the processor, a third metadata tag within the first sequence of processing tasks, the third metadata tag associated with a third processing task to create the vector output file of the respective features from the pattern occurrence data identified by the second metadata tag; and
executing, by executing an instruction with the processor, the first sequence of the first metadata tag, the second metadata tag, and the third metadata tag to create the vector output file associated with the first file format.
Dependent claims: 2, 3, 4, 5, 6, 7
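The steps recited in claim 1 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the format check (first character of the file), the two regular expressions standing in for "a pattern corresponding to malware", the tag names, the `message` field of the JSON records, and the comma-separated vector file layout are all hypothetical.

```python
import csv
import json
import os
import re
import tempfile

# Hypothetical stand-ins for "a pattern corresponding to malware".
MALWARE_PATTERNS = {
    "powershell_encoded": re.compile(r"powershell\s+-enc", re.IGNORECASE),
    "temp_exe": re.compile(r"\\temp\\[\w.-]+\.exe", re.IGNORECASE),
}


def determine_format(path):
    """Determine the file format (assumption: JSON starts with '[' or '{')."""
    with open(path, encoding="utf-8") as fh:
        head = fh.read(1)
    return "json" if head in ("[", "{") else "csv"


def json_to_string(path):
    # Assumption: a JSON array of records, each with a "message" field.
    with open(path, encoding="utf-8") as fh:
        return "\n".join(str(rec.get("message", "")) for rec in json.load(fh))


def csv_to_string(path):
    with open(path, newline="", encoding="utf-8") as fh:
        return "\n".join(" ".join(row) for row in csv.reader(fh))


CONVERTERS = {"json_to_string": json_to_string, "csv_to_string": csv_to_string}


def build_sequence(file_format, out_path):
    """Generate the three metadata tags; only the first depends on the format."""
    return [
        {"tag": f"{file_format}_to_string", "task": "convert"},
        {"tag": "match_malware_patterns", "task": "extract"},
        {"tag": "save_vector", "task": "save", "path": out_path},
    ]


def execute(sequence, log_path):
    """Run the tagged tasks in order and return the feature vector."""
    text, vector = "", []
    for step in sequence:
        if step["task"] == "convert":
            text = CONVERTERS[step["tag"]](log_path)
        elif step["task"] == "extract":
            vector = [len(p.findall(text)) for p in MALWARE_PATTERNS.values()]
        elif step["task"] == "save":
            with open(step["path"], "w", encoding="utf-8") as fh:
                fh.write(",".join(map(str, vector)))
    return vector


# Usage with a small made-up JSON log written to a temporary directory.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "events.log")
with open(log_path, "w", encoding="utf-8") as fh:
    json.dump(
        [
            {"message": "powershell -enc AAAA"},
            {"message": r"ran C:\Temp\drop.exe"},
            {"message": "user login"},
        ],
        fh,
    )
out_path = os.path.join(workdir, "features.vec")
fmt = determine_format(log_path)
sequence = build_sequence(fmt, out_path)
vector = execute(sequence, log_path)
```

In this sketch a CSV log would produce the same second and third tags; only the conversion tag changes, which is the reuse the claim's format-dependent first tag suggests.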
8. An apparatus to apply pattern engineering with metadata-driven metadata tags to improve an efficiency of analysis of log files having different file formats, the apparatus comprising:
a log file retriever to:
  retrieve a log file in a first log file format, the log file including pattern occurrence data; and
  determine a first file format of the log file from the different file formats, the log file to be converted to a vector output file;
a file to string operation builder to generate a first metadata tag of a first sequence of processing tasks that is based on the first file format, the first metadata tag corresponding to a conversion task to convert the first file format to a string format;
an extraction operation builder to generate a second metadata tag within the first sequence of processing tasks, the second metadata tag associated with a second processing task to identify respective features from the pattern occurrence data by comparing the string to a pattern corresponding to malware;
a feature save operation builder to generate a third metadata tag within the first sequence of processing tasks, the third metadata tag associated with a third processing task to save the vector output file; and
an operation flow builder to:
  generate the first sequence of processing tasks, the first sequence including the first metadata tag, the second metadata tag, and the third metadata tag to create the vector output file of the respective features from the pattern occurrence data identified by the second metadata tag; and
  execute the first sequence of the first metadata tag, the second metadata tag, and the third metadata tag to create the vector output file associated with the first file format.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A tangible computer readable storage medium comprising computer readable instructions to improve an efficiency of analysis of log files having different file formats, the instructions which, when executed, cause a processor to at least:
determine a first file format of a log file from the different file formats, the log file to be converted to a vector output file;
generate a first sequence of processing tasks based on the first file format, the first sequence including a first metadata tag corresponding to a conversion task to convert the first file format to a string format, the log file including pattern occurrence data;
generate a second metadata tag within the first sequence of processing tasks, the second metadata tag associated with a second processing task to identify respective features from the pattern occurrence data by comparing the string to a pattern corresponding to malware;
generate a third metadata tag within the first sequence of processing tasks, the third metadata tag associated with a third processing task to create the vector output file of the respective features from the pattern occurrence data identified by the second metadata tag; and
execute the first sequence of the first metadata tag, the second metadata tag, and the third metadata tag to create the vector output file associated with the first file format.
Dependent claims: 16, 17, 18, 19, 20
Specification