Refining extraction rules based on selected text within events
Abstract
The technology disclosed relates to formulating and refining field extraction rules that are applied at query time to raw data with a late-binding schema. The field extraction rules identify portions of the raw data, as well as their data types and hierarchical relationships. These extraction rules are executed against very large data sets that are not organized into relational structures and have not been processed by standard extraction or transformation methods. Using sample events, with a focus on primary and secondary example events, the technology helps formulate either a single extraction rule spanning multiple data formats or multiple rules directed to distinct formats. Selection tools mark up the example events to indicate positive examples for the extraction rules and to identify negative examples that avoid mistaken value selection. The extraction rules can be saved for query-time use and can be incorporated into a data model for sets and subsets of event data.
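The role of positive and negative examples, and of applying a rule only at query time, can be illustrated with a minimal Python sketch. It assumes that an extraction rule is a regular expression with one named group; the helper names `extract`, `validate`, and `run_query` and the sample events are hypothetical and do not describe the patented implementation.

```python
import re
from typing import Optional


def extract(rule: str, raw: str, field: str) -> Optional[str]:
    """Apply a regex rule to one raw event; return the named field or None."""
    m = re.search(rule, raw)
    return m.group(field) if m else None


def validate(rule: str, field: str, positives, negatives) -> bool:
    """Hypothetical check of a candidate rule against marked-up examples.

    positives: (raw_text, value_that_must_be_extracted) pairs
    negatives: (raw_text, value_that_must_not_be_extracted) pairs
    """
    pos_ok = all(extract(rule, raw, field) == want for raw, want in positives)
    neg_ok = all(extract(rule, raw, field) != bad for raw, bad in negatives)
    return pos_ok and neg_ok


def run_query(events, rule: str, field: str):
    """Late binding: events stay raw, and the field is extracted only
    when the query runs, not when the data is ingested."""
    pattern = re.compile(rule)
    results = []
    for timestamp, raw in events:
        m = pattern.search(raw)
        if m:
            results.append((timestamp, m.group(field)))
    return results


positives = [("port=443 pid=12", "443")]
negatives = [("pid=12 port=80", "12")]  # "12" is a PID, not a port

naive = r"(?P<port>\d+)"         # grabs the first number it sees
refined = r"port=(?P<port>\d+)"  # anchored on the "port=" context

events = [(1.0, "port=443 pid=12"), (2.0, "pid=12 port=80"), (3.0, "no port here")]
```

Here the negative example is what rejects `naive`: it satisfies the positive example but extracts the PID from the second event, while `refined` passes both checks.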
30 Claims
1. A computer-implemented method comprising:
receiving data indicating selection of a first event from among a plurality of events, wherein each event includes a portion of raw data and is associated with a time stamp;
receiving data indicating a selection of one or more portions of text within the first event to be extracted as one or more fields;
automatically determining at least one field extraction rule that extracts one or more values for the one or more fields from the text within the plurality of events when the extraction rule is applied to the plurality of events;
causing display of an annotated version of the plurality of events, wherein the annotated version indicates the portions of the text within the plurality of events extracted by the field extraction rule, the annotated version of the plurality of events including a second event to be used to refine field extraction; and
based on a selection of at least one portion of text within the second event to be extracted, updating the field extraction rule.
20. A computer-implemented system comprising:
a processor, memory coupled to the processor, and instructions stored in the memory that implement the actions of:
receiving data indicating selection of a first event from among a plurality of events, wherein each event includes a portion of raw data and is associated with a time stamp;
receiving data indicating a selection of one or more portions of text within the first event to be extracted as one or more fields;
automatically determining at least one field extraction rule that extracts one or more values for the one or more fields from the text within the plurality of events when the extraction rule is applied to the plurality of events;
causing display of an annotated version of the plurality of events, wherein the annotated version indicates the portions of the text within the plurality of events extracted by the field extraction rule, the annotated version of the plurality of events including a second event to be used to refine field extraction; and
based on a selection of at least one portion of text within the second event to be extracted, updating the field extraction rule.
26. A tangible computer-readable memory having instructions stored in the memory that implement the actions including:
receiving data indicating selection of a first event from among a plurality of events, wherein each event includes a portion of raw data and is associated with a time stamp;
receiving data indicating a selection of one or more portions of text within the first event to be extracted as one or more fields;
automatically determining at least one field extraction rule that extracts one or more values for the one or more fields from the text within the plurality of events when the extraction rule is applied to the plurality of events;
causing display of an annotated version of the plurality of events, wherein the annotated version indicates the portions of the text within the plurality of events extracted by the field extraction rule, the annotated version of the plurality of events including a second event to be used to refine field extraction; and
based on a selection of at least one portion of text within the second event to be extracted, updating the field extraction rule.
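The steps recited in claim 1 (and mirrored in claims 20 and 26) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the `Event` and `Rule` types, the heuristic of learning the literal text between the previous space and the selected value, and the refinement strategy of adding an alternative prefix (widening the value pattern only when the two selections differ in kind). The claims do not specify how the rule is determined or updated.

```python
import re
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # each event is associated with a time stamp
    raw: str          # and includes a portion of raw data


@dataclass
class Rule:
    field: str
    prefixes: list[str]  # hypothetical: literal context seen before the value
    value_class: str     # hypothetical: regex describing the value itself

    def pattern(self) -> re.Pattern:
        alts = "|".join(re.escape(p) for p in self.prefixes)
        return re.compile(f"(?:{alts})(?P<{self.field}>{self.value_class})")


def value_class(text: str) -> str:
    """Generalize a selected value into a simple character class."""
    if re.fullmatch(r"\d+", text):
        return r"\d+"
    if re.fullmatch(r"[A-Za-z]+", text):
        return r"[A-Za-z]+"
    return r"\S+"


def context_before(raw: str, start: int) -> str:
    # Hypothetical heuristic: the text between the last space and the selection.
    return raw[:start].split(" ")[-1]


def determine_rule(field: str, ev: Event, start: int, end: int) -> Rule:
    """Derive a rule from a selection in the first event."""
    return Rule(field, [context_before(ev.raw, start)], value_class(ev.raw[start:end]))


def annotate(events, rule: Rule) -> list[str]:
    """Mark the extracted text in every event with [brackets]."""
    pat = rule.pattern()
    out = []
    for ev in events:
        m = pat.search(ev.raw)
        if m:
            s, e = m.span(rule.field)
            out.append(ev.raw[:s] + "[" + ev.raw[s:e] + "]" + ev.raw[e:])
        else:
            out.append(ev.raw)  # not matched: a candidate second event
    return out


def update_rule(rule: Rule, ev: Event, start: int, end: int) -> Rule:
    """Refine the rule with a selection in a second event."""
    prefix = context_before(ev.raw, start)
    prefixes = rule.prefixes + ([prefix] if prefix not in rule.prefixes else [])
    cls = value_class(ev.raw[start:end])
    widened = rule.value_class if cls == rule.value_class else r"\S+"
    return Rule(rule.field, prefixes, widened)


events = [
    Event(1.0, "GET /a status=200 user=alice"),
    Event(2.0, "GET /b status=404 user=bob"),
    Event(3.0, "POST /c code:500 user=carol"),
]
s = events[0].raw.index("200")
rule = determine_rule("status", events[0], s, s + 3)
before = annotate(events, rule)   # the third event is left unannotated
s2 = events[2].raw.index("500")
rule = update_rule(rule, events[2], s2, s2 + 3)
after = annotate(events, rule)
```

After the update, the single rule covers both the `status=` and `code:` formats; keeping separate rules per format would be an equally valid design choice under these assumptions.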
Specification