Sampling events for rule creation with process selection

US 9,582,557 B2
Filed: 04/29/2015
Issued: 02/28/2017
Est. Priority Date: 01/22/2013
Status: Active Grant

Abstract

Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including specifying a data source and one or more subset types desired, including one or more of latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether a sufficient number of clusters and/or cluster types have been generated that exceed at least one threshold and when not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.
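
As an illustration of how several of the subset types named in the abstract might be merged into one sample, here is a minimal Python sketch. It is not the patented implementation: the `Event` tuple shape, the selector names, and `representative_sample` are hypothetical, and the diverse and outlier types are omitted because they depend on clustering.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical event shape: (timestamp, raw machine data).
Event = Tuple[float, str]


def latest(events: Sequence[Event], k: int, rng: random.Random) -> List[Event]:
    # The k events with the largest timestamps, newest first.
    return sorted(events, key=lambda e: e[0], reverse=True)[:k]


def earliest(events: Sequence[Event], k: int, rng: random.Random) -> List[Event]:
    # The k events with the smallest timestamps, oldest first.
    return sorted(events, key=lambda e: e[0])[:k]


def random_pick(events: Sequence[Event], k: int, rng: random.Random) -> List[Event]:
    # Up to k events drawn without replacement.
    return rng.sample(list(events), min(k, len(events)))


SELECTORS: Dict[str, Callable[[Sequence[Event], int, random.Random], List[Event]]] = {
    "latest": latest,
    "earliest": earliest,
    "random": random_pick,
}


def representative_sample(
    events: Sequence[Event], subset_types: Sequence[str], k: int, seed: int = 0
) -> List[Event]:
    # Run each requested selector and merge the results, dropping duplicates
    # while keeping the order in which events were first chosen.
    rng = random.Random(seed)
    chosen: List[Event] = []
    seen = set()
    for name in subset_types:
        for event in SELECTORS[name](events, k, rng):
            if event not in seen:
                seen.add(event)
                chosen.append(event)
    return chosen
```

Deduplicating on merge keeps an event from appearing twice when, for example, a random pick also happens to be one of the latest events.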

29 Claims

1. A computer-implemented method, comprising:
- accessing a plurality of events, wherein each event in the plurality of events includes a portion of raw machine data;
  
  receiving, from a user, a selection of one or more processes for identifying which events to include in a set;
  
  wherein the one or more processes selected by the user include at least one of the following:
  
  a diverse event-identification process, an outlier event-identification process, a random event identification process, an earliest event-identification process, and a latest event-identification process;
  
  for each selected process, identifying events for inclusion in the set using the process;
  
  causing display of one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the raw machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field; and
  
  wherein identifying events for inclusion in the set includes using a process to identify diverse events, and wherein the process to identify diverse events includes:
  
  performing a clustering algorithm on a group of events from the plurality of events to form a plurality of clusters, the clustering algorithm placing two events into a same cluster based on similarities in the machine data included in each of the two events; and
  
  selecting events from one or more most populous clusters in the plurality of clusters.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 24, 28):
  - 2. The method of claim 1, wherein identifying events for inclusion in the set includes using a process to identify outlier events.
  - 3. The method of claim 1, wherein identifying events for inclusion in the set includes using a process to identify events associated with earliest events in the plurality of events.
  - 4. The method of claim 1, wherein identifying events for inclusion in the set includes using a process to identify events associated with latest events in the plurality of events.
  - 5. The method of claim 1, wherein identifying events for inclusion in the set includes using a process randomly to identify events in the plurality of events.
  - 6. The method of claim 1, wherein identifying events for inclusion in the set includes using a combination of two or more processes including the process to identify diverse events and one or more processes selected from a process to identify outlier events, a process to earliest events in the plurality of events, a process to identify latest events in the plurality of events, and a process randomly to identify events in the plurality of events.
  - 7. The method of claim 1, wherein receiving from the user the selection of one or more processes for identifying which events to include in the set comprises causing display of a graphical interface that provides a plurality of identifiers for processes that can be selected, and wherein the user's selection of one or more of the identifiers indicates the user's selection of the one or more processes.
  - 8. The method of claim 1, wherein each event in the plurality of events is associated with a time stamp.
  - 9. The method of claim 1, wherein each event in the plurality of events is associated with a time stamp that has been extracted from the portion of raw machine data in that event.
  - 24. A non-transitory, computer-readable medium having computer executable instructions for performing the method of claim 1.
  - 28. A computer system with one or more processors adapted to perform the method of claim 1.
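
The diverse-event process recited in claim 1 (cluster events by similarity of their machine data, then select from the most populous clusters) can be sketched as follows. The claim does not name a clustering algorithm, so this sketch assumes a greedy, single-pass leader-style clustering over Jaccard similarity of word tokens; the function names, the tokenizer, and the `threshold` default are all hypothetical choices made for illustration.

```python
import re
from typing import FrozenSet, List


def tokens(raw: str) -> FrozenSet[str]:
    # Assumption: alphabetic words are enough to characterize an event,
    # so numbers and punctuation are ignored.
    return frozenset(re.findall(r"[A-Za-z]+", raw))


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    # Treat two empty token sets as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_events(raws: List[str], threshold: float = 0.5) -> List[List[str]]:
    # Greedy single pass: an event joins the first cluster whose leader
    # (its first member) is similar enough, otherwise it starts a new cluster.
    clusters: List[List[str]] = []
    for raw in raws:
        for cluster in clusters:
            if jaccard(tokens(raw), tokens(cluster[0])) >= threshold:
                cluster.append(raw)
                break
        else:
            clusters.append([raw])
    return clusters


def diverse_sample(
    raws: List[str], top_clusters: int = 2, per_cluster: int = 1, threshold: float = 0.5
) -> List[str]:
    # Rank clusters by population (stable sort keeps discovery order on ties)
    # and take a few events from each of the most populous ones.
    ranked = sorted(cluster_events(raws, threshold), key=len, reverse=True)
    return [raw for cluster in ranked[:top_clusters] for raw in cluster[:per_cluster]]
```

Taking one event from each of the largest clusters gives the user one example of each common event shape, which is the kind of sample that helps when writing a field-extraction rule.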

10. A computer-implemented method, comprising:
- accessing a plurality of events, wherein each event in the plurality of events includes a portion of raw machine data;
  
  receiving, from a user, a selection of one or more processes for identifying which events to include in a set;
  
  wherein the one or more processes selected by the user include at least one of the following:
  
  a diverse event-identification process, an outlier event-identification process, a random event identification process, an earliest event-identification process and a latest event-identification process;
  
  for each selected process, identifying events for inclusion in the set using the process;
  
  causing display of one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the raw machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field; and
  
  wherein identifying events for inclusion in the set includes using a process to identify diverse events, and wherein the process to identify diverse events includes:
  
  performing a clustering algorithm on a group of events from the plurality of events to form a plurality of clusters, the clustering algorithm placing two events into a same cluster based on similarities in the machine data included in each of the two events; and
  
  selecting events from one or more most populous clusters in the plurality of clusters.
- Dependent claims (11, 12, 13, 25, 29):
  - 11. The method of claim 10, wherein receiving from the user the selection of one or more processes for identifying which events to include in the set comprises causing display of a graphical interface that provides a plurality of identifiers for processes that can be selected, and wherein the user's selection of one or more of the identifiers indicates the user's selection of the one or more processes.
  - 12. The method of claim 10, wherein each event in the plurality of events is associated with a time stamp.
  - 13. The method of claim 10, wherein each event in the plurality of events is associated with a time stamp that has been extracted from the portion of raw machine data in that event.
  - 25. A non-transitory, computer-readable medium having computer executable instructions for performing the method of claim 10.
  - 29. A computer system with one or more processors adapted to perform the method of claim 10.
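
The claims refer to a field-extraction rule that pulls a field value out of raw machine data, to events being searchable by that field, and to a time stamp extracted from the raw data. One plausible reading, sketched here, models the rule as a regular expression with a named group. The claims do not say the rule is a regular expression; that choice, the ISO-like time stamp format, and the helper names are assumptions for illustration.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional

# Assumed time stamp layout, e.g. 2015-04-29T10:00:00.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def extract_timestamp(raw: str) -> Optional[datetime]:
    # Return the first time stamp found in the raw data, or None.
    match = TIMESTAMP.search(raw)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y-%m-%dT%H:%M:%S")


def apply_rule(rule: re.Pattern, raw: str) -> Dict[str, str]:
    # A rule is a compiled pattern; each named group becomes a field.
    match = rule.search(raw)
    return match.groupdict() if match else {}


def search(events: List[str], rule: re.Pattern, field: str, value: str) -> List[str]:
    # Extract the field at search time and keep the matching events.
    return [raw for raw in events if apply_rule(rule, raw).get(field) == value]
```

For example, a rule compiled from `r"status=(?P<status>\d{3})"` would define a `status` field, and `search(events, rule, "status", "404")` would return the events whose extracted status is `404`.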

14. A computer-implemented method, comprising:
- accessing a plurality of events, wherein each event in the plurality of events includes a portion of raw machine data;
  
  receiving, from a user, a selection of one or more processes for identifying which events to include in a set;
  
  wherein the one or more processes selected by the user include at least one of the following:
  
  a diverse event-identification process, an outlier event-identification process, a random event identification process, an earliest event-identification process, and a latest event-identification process;
  
  for each selected process, identifying events for inclusion in the set using the process;
  
  causing display of one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the raw machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field; and
  
  wherein identifying events for inclusion in the set includes using a process to identify diverse events or using a process to identify outlier events, and wherein the process includes:
  
  clustering a group of events in the plurality of events to form a plurality of clusters;
  
  determining that a number of clusters in the plurality of clusters is not big enough; and
  
  clustering a larger group of events in the plurality of events than the group of events.
- Dependent claims (15, 16, 17, 22, 26):
  - 15. The method of claim 14, wherein receiving from the user the selection of one or more processes for identifying which events to include in the set comprises causing display of a graphical interface that provides a plurality of identifiers for processes that can be selected, and wherein the user's selection of one or more of the identifiers indicates the user's selection of the one or more processes.
  - 16. The method of claim 14, wherein each event in the plurality of events is associated with a time stamp.
  - 17. The method of claim 14, wherein each event in the plurality of events is associated with a time stamp that has been extracted from the portion of raw machine data in that event.
  - 22. A non-transitory, computer-readable medium having computer executable instructions for performing the method of claim 14.
  - 26. A computer system with one or more processors adapted to perform the method of claim 14.
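
The iteration in claim 14 (cluster a group, find that there are too few clusters, then cluster a larger group) can be sketched as a loop that grows the group until a cluster-count threshold is met. This is a sketch under stated assumptions: grouping by first token is only a stand-in for a real clustering algorithm, and `min_clusters`, `initial_size`, and `growth` are hypothetical parameters not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cluster_by_first_token(raws: List[str]) -> List[List[str]]:
    # Stand-in clustering: events sharing a first token form one cluster.
    clusters: Dict[str, List[str]] = defaultdict(list)
    for raw in raws:
        parts = raw.split()
        clusters[parts[0] if parts else ""].append(raw)
    return list(clusters.values())


def cluster_until_enough(
    events: List[str], min_clusters: int, initial_size: int = 4, growth: int = 2
) -> Tuple[List[List[str]], int]:
    # Cluster a prefix of the events; if too few clusters result, cluster a
    # larger prefix. Stops when the threshold is met or the events run out.
    size = min(max(initial_size, 1), len(events))
    while True:
        clusters = cluster_by_first_token(events[:size])
        if len(clusters) >= min_clusters or size >= len(events):
            return clusters, size
        # Always grow by at least one event so the loop terminates.
        size = min(max(size + 1, size * growth), len(events))
```

The function returns both the clusters and the group size that produced them, so a caller can tell whether the threshold was actually reached or the data was exhausted first.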

18. A computer-implemented method, comprising:
- accessing a plurality of events, wherein each event in the plurality of events includes a portion of raw machine data;
  
  receiving, from a user, a selection of one or more processes for identifying which events to include in a set;
  
  wherein the one or more processes selected by the user include at least one of the following:
  
  a diverse event-identification process, an outlier event-identification process, a random event identification process, an earliest event-identification process, and a latest event-identification process;
  
  for each selected process, identifying events for inclusion in the set using the process;
  
  causing display of one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the raw machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field; and
  
  wherein identifying events for inclusion in the set includes using a process to identify diverse events or using a process to identify outlier events, and wherein the process includes:
  
  clustering a group of events in the plurality of events to form a plurality of clusters;
  
  determining that a number of events in one of the clusters in the plurality of clusters is not big enough; and
  
  clustering a larger group of events in the plurality of events than the group of events.
- Dependent claims (19, 20, 21, 23, 27):
  - 19. The method of claim 18, wherein receiving from the user the selection of one or more processes for identifying which events to include in the set comprises causing display of a graphical interface that provides a plurality of identifiers for processes that can be selected, and wherein the user's selection of one or more of the identifiers indicates the user's selection of the one or more processes.
  - 20. The method of claim 18, wherein each event in the plurality of events is associated with a time stamp.
  - 21. The method of claim 18, wherein each event in the plurality of events is associated with a time stamp that has been extracted from the portion of raw machine data in that event.
  - 23. A non-transitory, computer-readable medium having computer executable instructions for performing the method of claim 18.
  - 27. A computer system with one or more processors adapted to perform the method of claim 18.
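
Claim 18 differs from claim 14 in its stopping test: the group is enlarged when the number of events in one of the clusters is not big enough. The sketch below reads that as "the largest cluster has fewer than `min_events` members", which is only one possible interpretation. The digit-masking template clustering is a stand-in, and all names and parameters are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def cluster_by_template(raws: List[str]) -> List[List[str]]:
    # Stand-in clustering: events that are identical once digit runs are
    # masked with '#' fall into the same cluster.
    clusters: Dict[str, List[str]] = defaultdict(list)
    for raw in raws:
        clusters[re.sub(r"\d+", "#", raw)].append(raw)
    return list(clusters.values())


def cluster_until_populated(
    events: List[str], min_events: int, initial_size: int = 2, growth: int = 2
) -> Tuple[List[List[str]], int]:
    # Cluster a prefix of the events; if the largest cluster is still too
    # small, cluster a larger prefix. Stops when the events run out.
    size = min(max(initial_size, 1), len(events))
    while True:
        clusters = cluster_by_template(events[:size])
        largest = max((len(c) for c in clusters), default=0)
        if largest >= min_events or size >= len(events):
            return clusters, size
        # Always grow by at least one event so the loop terminates.
        size = min(max(size + 1, size * growth), len(events))
```

A stricter reading of the claim would test every cluster, or one specific cluster, against the threshold; swapping `max` for `min` in the `largest` computation would give the every-cluster variant.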

Current Assignee: Splunk Inc. (Cisco Systems, Inc.)
Original Assignee: Splunk Inc. (Cisco Systems, Inc.)
Inventors: Carasso, R. David; Delfino, Micah James
Primary Examiner(s): Ly, Anh
Application Number: US14/700,006
Publication Number: US 20150234905A1
Time in Patent Office: 671 Days
Field of Search: 707/779, 707/741, 707/748, 707/754, 707/770, 707/802, 707/610, 707/716, 707/737, 707/736, 707/711, 707/723, 707/722, 707/706, 707/756, 707/769, 707/602, 707/E17.002, 707/E17.014, 707/E17.032, 707/E17.044, 707/E17.108
US Class Current: 1/1
CPC Class Codes:
- G06F 16/254   Extract, transform and load...
- G06F 16/287   Visualization; Browsing
- G06F 16/35   Clustering; Classification
- G06F 16/904   Browsing; Visualisation the...
- G06F 3/0482   Interaction with lists of s...
- G06F 3/04842   Selection of displayed obje...
- G06F 3/0488   using a touch-screen or dig...
- G06F 7/24   Sorting, i.e. extracting da...
