Optimizing index file sizes based on indexed data storage conditions
Abstract
Techniques and mechanisms are disclosed to optimize the size of index files to improve use of storage space available to indexers and other components of a data intake and query system. Index files of a data intake and query system may include, among other data, a keyword portion containing mappings between keywords and location references to event data containing the keywords. Optimizing an amount of storage space used by index files may include removing, modifying and/or recreating various components of index files in response to detecting one or more storage conditions related to the event data indexed by the index files. The optimization of index files generally may attempt to manage a tradeoff between an efficiency with which search requests can be processed using the index files and an amount of storage space occupied by the index files.
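The tradeoff described in the abstract can be illustrated with a minimal sketch. It assumes a toy index file held as a Python dictionary with a hypothetical `keywords` entry, whitespace tokenization, and JSON byte length as a stand-in for storage space; none of these names or choices come from the patent itself.

```python
import json


def build_keyword_portion(events):
    """Map each keyword to the positions of the events that contain it."""
    portion = {}
    for position, text in enumerate(events):
        for keyword in set(text.lower().split()):
            portion.setdefault(keyword, []).append(position)
    return portion


def search(events, keyword, keyword_portion=None):
    """Return positions of events that contain the keyword as a token.

    With a keyword portion the answer is a dictionary lookup; without one,
    every stored event has to be scanned.
    """
    keyword = keyword.lower()
    if keyword_portion is not None:
        return list(keyword_portion.get(keyword, []))
    return [i for i, text in enumerate(events) if keyword in text.lower().split()]


def stored_size(index_file):
    """Approximate on-disk size as the length of the JSON encoding (assumption)."""
    return len(json.dumps(index_file).encode("utf-8"))


events = [
    "ERROR disk full on host01",
    "INFO backup finished on host02",
    "ERROR timeout on host02",
]
# Hypothetical index file layout: some metadata plus the keyword portion.
full_index = {"event_count": len(events), "keywords": build_keyword_portion(events)}
# The same index file after the keyword portion has been removed.
slim_index = {"event_count": len(events)}
```

Both search paths return the same events; the slim index occupies less space, but a search against it falls back to scanning the stored events, which is the efficiency cost the abstract refers to.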
23 Claims
1. A method, comprising:
receiving raw machine data at an indexer, the raw machine data generated by at least one component in an information technology environment and reflecting activity in the information technology environment;
processing, by the indexer, the raw machine data to generate one or more sets of events and an index file associated with at least one set of events of the one or more sets of events, the index file including a keyword portion associating a plurality of keywords with location references to events of the at least one set of events, wherein a keyword of the plurality of keywords is associated with at least one particular location reference to an event of the at least one set of events that includes the keyword;
storing, by the indexer, the at least one set of events and the index file in one or more data stores; and
based at least in part on one or more attributes of the at least one set of events satisfying one or more index size optimization conditions, removing at least a part of the keyword portion from the index file to reduce an amount of storage space used by the index file in the one or more data stores.
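The steps of claim 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: the `Bucket` class, the `process` and `optimize` functions, the in-memory `data_store`, and the choice of event age as the optimization condition are all hypothetical, and the claim also covers removing only part of the keyword portion.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """One set of events together with its index file (hypothetical layout)."""
    events: list = field(default_factory=list)
    newest_time: float = float("-inf")  # timestamp of the most recent event
    index_file: dict = field(default_factory=lambda: {"keywords": {}})


def process(raw_records):
    """Turn (timestamp, text) records into events and a keyword portion."""
    bucket = Bucket()
    keywords = bucket.index_file["keywords"]
    for timestamp, text in raw_records:
        position = len(bucket.events)  # location reference for this event
        bucket.events.append(text)
        bucket.newest_time = max(bucket.newest_time, timestamp)
        for keyword in set(text.lower().split()):
            keywords.setdefault(keyword, []).append(position)
    return bucket


def optimize(bucket, now, max_age_seconds):
    """Remove the keyword portion once the newest event exceeds the age limit.

    Event age is an assumed example of an attribute of the event set; the
    events themselves are kept and only the index file shrinks.
    """
    if now - bucket.newest_time > max_age_seconds:
        bucket.index_file["keywords"].clear()
        return True
    return False


# Receive, process, and store (a dict stands in for the data store).
data_store = {}
raw = [(1000.0, "ERROR disk full"), (1060.0, "INFO disk ok")]
data_store["bucket-1"] = process(raw)
```

Calling `optimize(data_store["bucket-1"], now, 3600)` leaves the index untouched while the newest event is less than an hour old and empties the keyword portion afterwards.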
22. Non-transitory, computer readable storage media, storing computer-executable instructions, which, when executed by one or more processors, cause the one or more processors to:
receive raw machine data, the raw machine data generated by at least one component in an information technology environment and reflecting activity in the information technology environment;
process the raw machine data to generate one or more sets of events and an index file associated with at least one set of events of the one or more sets of events, the index file including a keyword portion associating a plurality of keywords with location references to events of the at least one set of events, wherein a keyword of the plurality of keywords is associated with at least one particular location reference to an event of the at least one set of events that includes the keyword;
store the at least one set of events and the index file in one or more data stores; and
based at least in part on a determination that one or more attributes of the at least one set of events satisfies one or more index size optimization conditions, remove at least a part of the keyword portion from the index file to reduce an amount of storage space used by the index file in the one or more data stores.
23. A computing system, comprising:
one or more processing devices coupled to memory and configured to:
receive raw machine data, the raw machine data generated by at least one component in an information technology environment and reflecting activity in the information technology environment;
process the raw machine data to generate one or more sets of events and an index file associated with at least one set of events of the one or more sets of events, the index file including a keyword portion associating a plurality of keywords with location references to events of the at least one set of events, wherein a keyword of the plurality of keywords is associated with at least one particular location reference to an event of the at least one set of events that includes the keyword;
store the at least one set of events and the index file in one or more data stores; and
based at least in part on a determination that one or more attributes of the at least one set of events satisfies one or more index size optimization conditions, remove at least a part of the keyword portion from the index file to reduce an amount of storage space used by the index file in the one or more data stores.
Specification