Techniques for processing queries relating to task-completion times or cross-data-structure interactions
First Claim
1. A computer-implemented method for using machine learning to identify anomaly subsets of iteration data, the method comprising:
accessing a structure including at least part of a definition for a workflow, the workflow including:
a first task of accessing a set of reads based on a material associated with a respective client;
a second task of aligning each read of the set of reads to a portion of a reference data set;
a third task of generating a client data set for the respective client using the aligned set of reads, the client data set including a set of values associated with each of one or more units, each unit of the one or more units corresponding to a set of defined positions within a data structure;
a fourth task of detecting a presence of one or more sparse indicators associated with the respective client by comparing the set of values of the client data set to corresponding values in the reference data set, each sparse indicator of the one or more sparse indicators identifying a distinction between the client data set and the reference data set; and
a fifth task of classifying each sparse indicator of the one or more sparse indicators into a category corresponding to a state transition likelihood variable associated with the sparse indicator representing a numeric likelihood, categorical likelihood or range of likelihoods of the sparse indicator causing a transition into a particular state;
accessing iteration data for the workflow, the iteration data including, for each client of a plurality of clients, a result corresponding to a partial or full performance of the workflow and an iteration identifier;
using a machine-learning technique to process the iteration data to identify an anomaly subset of the iteration data; and
generating a communication that represents the anomaly subset.
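The fourth and fifth tasks recited above (comparing a client data set to a reference data set, then classifying each detected distinction by likelihood) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the claim: the nested `{unit: {position: value}}` layout, the `SparseIndicator` record, the `detect_sparse_indicators` and `classify` helpers, and the likelihood cutoffs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SparseIndicator:
    """Hypothetical record of one distinction between client and reference data."""
    unit: str             # name of the unit (illustrative)
    position: int         # defined position within the unit
    reference_value: str  # value found in the reference data set
    client_value: str     # value found in the client data set


def detect_sparse_indicators(client_data, reference_data):
    """Sketch of the fourth task: report each position whose client value
    differs from the corresponding reference value."""
    indicators = []
    for unit, values in client_data.items():
        ref_values = reference_data.get(unit, {})
        for position, value in sorted(values.items()):
            ref = ref_values.get(position)
            # Positions absent from the reference are skipped, not flagged.
            if ref is not None and value != ref:
                indicators.append(SparseIndicator(unit, position, ref, value))
    return indicators


def classify(likelihood, thresholds=((0.9, "high"), (0.5, "moderate"))):
    """Sketch of the fifth task: map a numeric likelihood to a category.
    The cutoffs are assumed values chosen only for this example."""
    for cutoff, label in thresholds:
        if likelihood >= cutoff:
            return label
    return "low"


# Illustrative data: one unit with three defined positions.
reference = {"unit_a": {1: "A", 2: "C", 3: "G"}}
client = {"unit_a": {1: "A", 2: "T", 3: "G"}}
found = detect_sparse_indicators(client, reference)
```

In this sketch `found` holds a single indicator at position 2 of `unit_a`. How a likelihood is assigned to an indicator is left open by the claim, so `classify` simply takes it as an input.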
Abstract
Methods and systems disclosed herein relate generally to data processing by applying machine learning techniques to iteration data to identify anomaly subsets of iteration data. More specifically, iteration data for individual iterations of a workflow involving a set of tasks may contain a client data set, client-associated sparse indicators and their classifications, and a set of processing times for the set of tasks performed in that iteration of the workflow. These individual iterations of the workflow may also be associated with particular data sources. Using the iteration data, anomaly subsets within the iteration data can be identified, such as data items resulting from systematic error associated with particular data sources, sets of sparse indicators to be validated or double-checked, or tasks that are associated with long processing times. The anomaly subsets can be provided in a generated communication or report in order to optimize future iterations of the workflow.
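One way to picture the identification of tasks with long processing times, and the generated communication, is the sketch below. It is not the disclosed machine-learning technique: a median-absolute-deviation (MAD) outlier rule is swapped in as a simple statistical stand-in. The record layout (`iteration_id`, `source`, `task_seconds`), the function names, the 3.5 threshold, and the sample data are all assumptions made for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.
    Stand-in for the unspecified machine-learning technique."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, so nothing is flagged
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def find_anomaly_subset(iterations):
    """Flag (iteration, task) pairs with unusually long or short task times."""
    tasks = sorted({t for it in iterations for t in it["task_seconds"]})
    anomalies = []
    for task in tasks:
        rows = [it for it in iterations if task in it["task_seconds"]]
        times = [it["task_seconds"][task] for it in rows]
        for i in mad_outliers(times):
            anomalies.append({
                "iteration_id": rows[i]["iteration_id"],
                "source": rows[i]["source"],
                "task": task,
                "seconds": times[i],
            })
    return anomalies


def build_communication(anomalies):
    """Render the anomaly subset as a plain-text report."""
    if not anomalies:
        return "No anomalous iterations found."
    lines = [
        f"{a['iteration_id']} ({a['source']}): task '{a['task']}' "
        f"took {a['seconds']:.1f}s"
        for a in anomalies
    ]
    return "\n".join(["Anomaly subset:"] + lines)


# Illustrative iteration data (hypothetical identifiers and timings).
align_times = [60.0, 62.0, 58.0, 61.0, 59.0, 240.0]
classify_times = [5.0, 5.2, 4.9, 5.1, 5.0, 5.1]
iterations = [
    {
        "iteration_id": f"it-{n + 1}",
        "source": "lab-b" if n == 5 else "lab-a",
        "task_seconds": {"align": a, "classify": c},
    }
    for n, (a, c) in enumerate(zip(align_times, classify_times))
]
anomalies = find_anomaly_subset(iterations)
report = build_communication(anomalies)
```

With this sample data only the 240-second alignment in iteration `it-6` is flagged. A MAD rule was chosen for the sketch because a single extreme value does not distort the median the way it would distort a mean, but any outlier or clustering method could take its place.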
20 Claims
1. A computer-implemented method for using machine learning to identify anomaly subsets of iteration data, the method comprising:
accessing a structure including at least part of a definition for a workflow, the workflow including:
a first task of accessing a set of reads based on a material associated with a respective client;
a second task of aligning each read of the set of reads to a portion of a reference data set;
a third task of generating a client data set for the respective client using the aligned set of reads, the client data set including a set of values associated with each of one or more units, each unit of the one or more units corresponding to a set of defined positions within a data structure;
a fourth task of detecting a presence of one or more sparse indicators associated with the respective client by comparing the set of values of the client data set to corresponding values in the reference data set, each sparse indicator of the one or more sparse indicators identifying a distinction between the client data set and the reference data set; and
a fifth task of classifying each sparse indicator of the one or more sparse indicators into a category corresponding to a state transition likelihood variable associated with the sparse indicator representing a numeric likelihood, categorical likelihood or range of likelihoods of the sparse indicator causing a transition into a particular state;
accessing iteration data for the workflow, the iteration data including, for each client of a plurality of clients, a result corresponding to a partial or full performance of the workflow and an iteration identifier;
using a machine-learning technique to process the iteration data to identify an anomaly subset of the iteration data; and
generating a communication that represents the anomaly subset.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system for using machine learning to identify anomaly subsets of iteration data, the system comprising:
one or more data processors; and
a non-transitory computer readable storage medium containing instructions which when executed on the one or more data processors, cause the one or more data processors to perform actions including:
accessing a structure including at least part of a definition for a workflow, the workflow including:
a first task of accessing a set of reads based on a material associated with a respective client;
a second task of aligning each read of the set of reads to a portion of a reference data set;
a third task of generating a client data set for the respective client using the aligned set of reads, the client data set including a set of values associated with each of one or more units, each unit of the one or more units corresponding to a set of defined positions within a data structure;
a fourth task of detecting a presence of one or more sparse indicators associated with the respective client by comparing the set of values of the client data set to corresponding values in the reference data set, each sparse indicator of the one or more sparse indicators identifying a distinction between the client data set and the reference data set; and
a fifth task of classifying each sparse indicator of the one or more sparse indicators into a category corresponding to a state transition likelihood variable associated with the sparse indicator representing a numeric likelihood, categorical likelihood or range of likelihoods of the sparse indicator causing a transition into a particular state;
accessing iteration data for the workflow, the iteration data including, for each client of a plurality of clients, a result corresponding to a partial or full performance of the workflow and an iteration identifier;
using a machine-learning technique to process the iteration data to identify an anomaly subset of the iteration data; and
generating a communication that represents the anomaly subset.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform actions including:
accessing a structure including at least part of a definition for a workflow, the workflow including:
a first task of accessing a set of reads based on a material associated with a respective client;
a second task of aligning each read of the set of reads to a portion of a reference data set;
a third task of generating a client data set for the respective client using the aligned set of reads, the client data set including a set of values associated with each of one or more units, each unit of the one or more units corresponding to a set of defined positions within a data structure;
a fourth task of detecting a presence of one or more sparse indicators associated with the respective client by comparing the set of values of the client data set to corresponding values in the reference data set, each sparse indicator of the one or more sparse indicators identifying a distinction between the client data set and the reference data set; and
a fifth task of classifying each sparse indicator of the one or more sparse indicators into a category corresponding to a state transition likelihood variable associated with the sparse indicator representing a numeric likelihood, categorical likelihood or range of likelihoods of the sparse indicator causing a transition into a particular state;
accessing iteration data for the workflow, the iteration data including, for each client of a plurality of clients, a result corresponding to a partial or full performance of the workflow and an iteration identifier;
using a machine-learning technique to process the iteration data to identify an anomaly subset of the iteration data; and
generating a communication that represents the anomaly subset.
Dependent claims: 16, 17, 18, 19, 20
Specification