Techniques for processing queries relating to task-completion times or cross-data-structure interactions

US 9,678,794 B1
Filed: 12/01/2016
Issued: 06/13/2017
Est. Priority Date: 12/02/2015
Status: Active Grant

3 Assignments

0 Petitions

Abstract

Methods and systems disclosed herein relate generally to data processing by applying machine learning techniques to iteration data to identify anomaly subsets of iteration data. More specifically, iteration data for individual iterations of a workflow involving a set of tasks may contain a client data set, client-associated sparse indicators and their classifications, and a set of processing times for the set of tasks performed in that iteration of the workflow. These individual iterations of the workflow may also be associated with particular data sources. Using the iteration data, anomaly subsets within the iteration data can be identified, such as data items resulting from systematic error associated with particular data sources, sets of sparse indicators to be validated or double-checked, or tasks that are associated with long processing times. The anomaly subsets can be provided in a generated communication or report in order to optimize future iterations of the workflow.
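The abstract names three kinds of anomaly subset. The last kind is tasks associated with long processing times, which claims 2, 9 and 16 also cover. Below is a minimal Python sketch for that case. It assumes a hypothetical `IterationRecord` schema that stores per-task durations in minutes. The anomaly rule is a swapped-in robust z-score, computed from the median and the median absolute deviation (MAD) per task. The patent text shown here does not say which machine-learning technique is used. The 3.5 cut-off is a common rule of thumb and does not come from the source.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class IterationRecord:
    """One workflow iteration for one client (hypothetical schema)."""
    iteration_id: str
    source_id: str
    # Per-task durations in minutes, keyed by a hypothetical task name.
    task_durations: dict[str, float] = field(default_factory=dict)


def flag_slow_tasks(records: list[IterationRecord],
                    threshold: float = 3.5) -> list[tuple[str, str]]:
    """Return (iteration_id, task) pairs with unusually long durations.

    Robust z-score per task: 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation of that task's durations across iterations.
    """
    by_task: dict[str, list[float]] = {}
    for rec in records:
        for task, dur in rec.task_durations.items():
            by_task.setdefault(task, []).append(dur)

    # Median and MAD for each task, computed over all iterations.
    stats = {}
    for task, durs in by_task.items():
        med = median(durs)
        stats[task] = (med, median(abs(d - med) for d in durs))

    flagged = []
    for rec in records:
        for task, dur in rec.task_durations.items():
            med, mad = stats[task]
            if mad == 0:
                continue  # no spread observed for this task; skip it
            # One-sided: only unusually long durations count as anomalies.
            if 0.6745 * (dur - med) / mad > threshold:
                flagged.append((rec.iteration_id, task))
    return flagged


# Usage: nine ordinary iterations plus one with a very slow alignment step.
records = [IterationRecord(f"it{i}", "labA",
                           {"align": 30.0 + i % 3, "classify": 10.0})
           for i in range(9)]
records.append(IterationRecord("it9", "labB", {"align": 95.0, "classify": 10.0}))
print(flag_slow_tasks(records))  # [('it9', 'align')]
```

The median and MAD are used instead of the mean and standard deviation so that the slow iteration itself does not inflate the baseline it is compared against.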

139 Citations

20 Claims

1. A computer-implemented method for using machine learning to identify anomaly subsets of iteration data, the method comprising:
- accessing a structure including at least part of a definition for a workflow, the workflow including:
  
  a first task of accessing a set of reads based on a material associated with a respective client;
  
  a second task of aligning each read of the set of reads to a portion of a reference data set;
  
  a third task of generating a client data set for the respective client using the aligned set of reads, the client data set including a set of values associated with each of one or more units, each unit of the one or more units corresponding to a set of defined positions within a data structure;
  
  a fourth task of detecting a presence of one or more sparse indicators associated with the respective client by comparing the set of values of the client data set to corresponding values in the reference data set, each sparse indicator of the one or more sparse indicators identifying a distinction between the client data set and the reference data set; and
  
  a fifth task of classifying each sparse indicator of the one or more sparse indicators into a category corresponding to a state transition likelihood variable associated with the sparse indicator representing a numeric likelihood, categorical likelihood or range of likelihoods of the sparse indicator causing a transition into a particular state;
  
  accessing iteration data for the workflow, the iteration data including, for each client of a plurality of clients, a result corresponding to a partial or full performance of the workflow and an iteration identifier;
  
  using a machine-learning technique to process the iteration data to identify an anomaly subset of the iteration data; and
  
  generating a communication that represents the anomaly subset.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The computer-implemented method for using machine learning to identify anomaly subsets of iteration data as recited in claim 1, wherein:
    - the result corresponding to the partial or full performance of the workflow includes, for each task of a plurality of tasks of the workflow, a processing-time variable that indicates when a performance of the task was completed or a duration of performance of the task;
      
      the anomaly subset of the iteration data identified using the machine-learning technique identifies a task of the plurality of tasks associated with long processing times relative to past processing times or normalized or unnormalized processing times of one or more other tasks of the plurality of tasks.
  - 3. The computer-implemented method for using machine learning to identify anomaly subsets of iteration data as recited in claim 1, wherein:
    - the result corresponding to the partial or full performance of the workflow identifies one or more sparse indicators associated with the client, such that the iteration data identifies a plurality of sparse indicators;
      
      the anomaly subset of the iteration data identified using the machine-learning technique identifies a subset of the plurality of sparse indicators; and
      
      the communication facilitates selective confirmatory processing to be performed to determine whether data corresponding to the subset of the plurality of sparse indicators is validated.
  - 4. The computer-implemented method for using machine learning to identify anomaly subsets of iteration data as recited in claim 1, wherein:
    - the iteration data further includes, for each client of the plurality of clients, an origination identifier associated with a source of the set of reads and a timestamp;
      
      using the machine-learning technique to process the iteration data includes determining whether results corresponding to a first origination identifier are statistically different than results corresponding to one or more second origination identifiers or than results corresponding to a prior time period and the first origination identifier; and
      
      the communication identifies the source associated with the first origination identifier.
  - 5. The computer-implemented method for using machine learning to identify anomaly subsets of iteration data as recited in claim 1, wherein:
    - the iteration data further includes, for each client of a plurality of clients, one or more data-source variables that identify or characterize a source of the iteration data; and
      
      using the machine-learning technique includes updating or generating a model to identify data-source variables predictive of the results.
  - 6. The computer-implemented method for using machine learning to identify anomaly subsets of iteration data as recited in claim 1, wherein using the machine-learning technique comprises:
    - retrieving a parameter for a machine-learning model trained on other iteration data, the parameter reflecting a degree of variability observed across clients, iterations or alignment positions;
      
      determining whether, for each portion of multiple portions of the iteration data, an observed variability for the portion corresponds with the parameter; and
      
      for each portion of the multiple portions for which it is determined that the observed variability for the portion does not correspond with the parameter, identifying the portion as an anomaly subset.
  - 7. The computer-implemented method for using machine learning to identify anomaly subsets of iteration data as recited in claim 1, further comprising:
    - receiving, from a source, a request to perform an anomaly-detection assessment, wherein the iteration data is accessed and processed in response to receiving the request; and
      
      availing the communication to the source.
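The five tasks of claim 1 can be pictured as a small pipeline. The toy sketch below illustrates them under heavy assumptions:

- Reads and the reference data set are plain strings over the alphabet A, C, G, T.
- Aligning is a brute-force search for the offset with the fewest mismatches.
- The client data set is a per-position count of observed characters.
- A sparse indicator is a position where the majority character differs from the reference.
- Classification is a lookup in a hypothetical likelihood table.

All function names and thresholds (`max_mismatches`, `min_depth`) are hypothetical. Real pipelines would use dedicated alignment and comparison tools.

```python
from collections import Counter, defaultdict
from typing import Optional


def align_read(read: str, reference: str, max_mismatches: int = 1) -> Optional[int]:
    """Task 2 (toy): offset with the fewest mismatches, or None if too many.

    Brute-force Hamming-distance search; ties go to the leftmost offset.
    """
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(a != b for a, b in zip(read, window))
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos


def build_client_data(reads: list[str], reference: str) -> dict[int, Counter]:
    """Task 3: per-position character counts (the 'set of values' per unit)."""
    pileup: dict[int, Counter] = defaultdict(Counter)
    for read in reads:  # Task 1 is assumed done: reads are already accessed
        pos = align_read(read, reference)
        if pos is None:
            continue  # unaligned reads are simply dropped in this sketch
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    return pileup


def detect_indicators(pileup: dict[int, Counter], reference: str,
                      min_depth: int = 3) -> list[tuple[int, str, str]]:
    """Task 4: (position, reference char, majority char) where they differ."""
    found = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        base, _ = counts.most_common(1)[0]
        if sum(counts.values()) >= min_depth and base != reference[pos]:
            found.append((pos, reference[pos], base))
    return found


def classify(indicators, likelihood_table):
    """Task 5: map each indicator to a likelihood category (default 'uncertain')."""
    return {ind: likelihood_table.get((ind[0], ind[2]), "uncertain")
            for ind in indicators}


reference = "GATTACAGCT"
reads = ["TTGCA", "TTGCA", "TTGCA", "GATTG", "ACAGCT"]
indicators = detect_indicators(build_client_data(reads, reference), reference)
print(indicators)                                 # [(4, 'A', 'G')]
print(classify(indicators, {(4, "G"): "high"}))   # {(4, 'A', 'G'): 'high'}
```

Each function corresponds to one task, so per-task processing times (as in claim 2) could be recorded by timing each call.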

8. A system for using machine learning to identify anomaly subsets of iteration data, the system comprising:
- one or more data processors; and
  
  a non-transitory computer readable storage medium containing instructions which when executed on the one or more data processors, cause the one or more data processors to perform actions including:
  
  accessing a structure including at least part of a definition for a workflow, the workflow including:
  
  a first task of accessing a set of reads based on a material associated with a respective client;
  
  a second task of aligning each read of the set of reads to a portion of a reference data set;
  
  a third task of generating a client data set for the respective client using the aligned set of reads, the client data set including a set of values associated with each of one or more units, each unit of the one or more units corresponding to a set of defined positions within a data structure;
  
  a fourth task of detecting a presence of one or more sparse indicators associated with the respective client by comparing the set of values of the client data set to corresponding values in the reference data set, each sparse indicator of the one or more sparse indicators identifying a distinction between the client data set and the reference data set; and
  
  a fifth task of classifying each sparse indicator of the one or more sparse indicators into a category corresponding to a state transition likelihood variable associated with the sparse indicator representing a numeric likelihood, categorical likelihood or range of likelihoods of the sparse indicator causing a transition into a particular state;
  
  accessing iteration data for the workflow, the iteration data including, for each client of a plurality of clients, a result corresponding to a partial or full performance of the workflow and an iteration identifier;
  
  using a machine-learning technique to process the iteration data to identify an anomaly subset of the iteration data; and
  
  generating a communication that represents the anomaly subset.
- Dependent Claims (9, 10, 11, 12, 13, 14)
  - 9. The system for using machine learning to identify anomaly subsets of iteration data as recited in claim 8, wherein:
    - the result corresponding to the partial or full performance of the workflow includes, for each task of a plurality of tasks of the workflow, a processing-time variable that indicates when a performance of the task was completed or a duration of performance of the task;
      
      the anomaly subset of the iteration data identified using the machine-learning technique identifies a task of the plurality of tasks associated with long processing times relative to past processing times or normalized or unnormalized processing times of one or more other tasks of the plurality of tasks.
  - 10. The system for using machine learning to identify anomaly subsets of iteration data as recited in claim 8, wherein:
    - the result corresponding to the partial or full performance of the workflow identifies one or more sparse indicators associated with the client, such that the iteration data identifies a plurality of sparse indicators;
      
      the anomaly subset of the iteration data identified using the machine-learning technique identifies a subset of the plurality of sparse indicators; and
      
      the communication facilitates selective confirmatory processing to be performed to determine whether data corresponding to the subset of the plurality of sparse indicators is validated.
  - 11. The system for using machine learning to identify anomaly subsets of iteration data as recited in claim 8, wherein:
    - the iteration data further includes, for each client of the plurality of clients, an origination identifier associated with a source of the set of reads and a timestamp;
      
      using the machine-learning technique to process the iteration data includes determining whether results corresponding to a first origination identifier are statistically different than results corresponding to one or more second origination identifiers or than results corresponding to a prior time period and the first origination identifier; and
      
      the communication identifies the source associated with the first origination identifier.
  - 12. The system for using machine learning to identify anomaly subsets of iteration data as recited in claim 8, wherein:
    - the iteration data further includes, for each client of a plurality of clients, one or more data-source variables that identify or characterize a source of the iteration data; and
      
      using the machine-learning technique includes updating or generating a model to identify data-source variables predictive of the results.
  - 13. The system for using machine learning to identify anomaly subsets of iteration data as recited in claim 8, wherein using the machine-learning technique comprises:
    - retrieving a parameter for a machine-learning model trained on other iteration data, the parameter reflecting a degree of variability observed across clients, iterations or alignment positions;
      
      determining whether, for each portion of multiple portions of the iteration data, an observed variability for the portion corresponds with the parameter; and
      
      for each portion of the multiple portions for which it is determined that the observed variability for the portion does not correspond with the parameter, identifying the portion as an anomaly subset.
  - 14. The system for using machine learning to identify anomaly subsets of iteration data as recited in claim 8, wherein the instructions further cause the one or more data processors to perform actions including:
    - receiving, from a source, a request to perform an anomaly-detection assessment, wherein the iteration data is accessed and processed in response to receiving the request; and
      
      availing the communication to the source.
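Claims 4, 11 and 18 leave "statistically different" open. One possible reading is a one-sided two-proportion z-test, sketched below as an assumption. For each origination identifier, the test compares the fraction of its iterations with some flagged result (for example, a hypothetical failed quality check) against the fraction for all other sources. It reports sources whose rate is significantly higher. The names `upper_tail_p` and `flag_sources` and the default `alpha` are hypothetical. A comparison against the same source's prior time period would use the same test with different groupings.

```python
import math
from collections import defaultdict


def upper_tail_p(k1: int, n1: int, k2: int, n2: int) -> float:
    """One-sided two-proportion z-test p-value for 'rate 1 > rate 2'."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # every result flagged or none flagged: no evidence
    z = (p1 - p2) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal


def flag_sources(results: list[tuple[str, bool]], alpha: float = 0.01) -> list[str]:
    """Sources whose flagged-result rate is significantly above all others'."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [flagged, total]
    for source, is_flagged in results:
        counts[source][0] += int(is_flagged)
        counts[source][1] += 1
    total_k = sum(c[0] for c in counts.values())
    total_n = sum(c[1] for c in counts.values())

    suspicious = []
    for source, (k, n) in sorted(counts.items()):
        rest_n = total_n - n
        if rest_n == 0:
            continue  # only one source: nothing to compare against
        if upper_tail_p(k, n, total_k - k, rest_n) < alpha:
            suspicious.append(source)
    return suspicious


# Source A flags 30% of its iterations; B and C flag 5% and 6%.
results = ([("A", i < 60) for i in range(200)]
           + [("B", i < 10) for i in range(200)]
           + [("C", i < 12) for i in range(200)])
print(flag_sources(results))  # ['A']
```

The test is one-sided on purpose. In a two-sided one-versus-rest comparison, a single bad source inflates the "rest" rate and can make healthy sources look significantly low.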

15. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform actions including:
- accessing a structure including at least part of a definition for a workflow, the workflow including:
  
  a first task of accessing a set of reads based on a material associated with a respective client;
  
  a second task of aligning each read of the set of reads to a portion of a reference data set;
  
  a third task of generating a client data set for the respective client using the aligned set of reads, the client data set including a set of values associated with each of one or more units, each unit of the one or more units corresponding to a set of defined positions within a data structure;
  
  a fourth task of detecting a presence of one or more sparse indicators associated with the respective client by comparing the set of values of the client data set to corresponding values in the reference data set, each sparse indicator of the one or more sparse indicators identifying a distinction between the client data set and the reference data set; and
  
  a fifth task of classifying each sparse indicator of the one or more sparse indicators into a category corresponding to a state transition likelihood variable associated with the sparse indicator representing a numeric likelihood, categorical likelihood or range of likelihoods of the sparse indicator causing a transition into a particular state;
  
  accessing iteration data for the workflow, the iteration data including, for each client of a plurality of clients, a result corresponding to a partial or full performance of the workflow and an iteration identifier;
  
  using a machine-learning technique to process the iteration data to identify an anomaly subset of the iteration data; and
  
  generating a communication that represents the anomaly subset.
- Dependent Claims (16, 17, 18, 19, 20)
  - 16. The computer-program product as recited in claim 15, wherein:
    - the result corresponding to the partial or full performance of the workflow includes, for each task of a plurality of tasks of the workflow, a processing-time variable that indicates when a performance of the task was completed or a duration of performance of the task;
      
      the anomaly subset of the iteration data identified using the machine-learning technique identifies a task of the plurality of tasks associated with long processing times relative to past processing times or normalized or unnormalized processing times of one or more other tasks of the plurality of tasks.
  - 17. The computer-program product as recited in claim 15, wherein:
    - the result corresponding to the partial or full performance of the workflow identifies one or more sparse indicators associated with the client, such that the iteration data identifies a plurality of sparse indicators;
      
      the anomaly subset of the iteration data identified using the machine-learning technique identifies a subset of the plurality of sparse indicators; and
      
      the communication facilitates selective confirmatory processing to be performed to determine whether data corresponding to the subset of the plurality of sparse indicators is validated.
  - 18. The computer-program product as recited in claim 15, wherein:
    - the iteration data further includes, for each client of the plurality of clients, an origination identifier associated with a source of the set of reads and a timestamp;
      
      using the machine-learning technique to process the iteration data includes determining whether results corresponding to a first origination identifier are statistically different than results corresponding to one or more second origination identifiers or than results corresponding to a prior time period and the first origination identifier; and
      
      the communication identifies the source associated with the first origination identifier.
  - 19. The computer-program product as recited in claim 15, wherein:
    - the iteration data further includes, for each client of a plurality of clients, one or more data-source variables that identify or characterize a source of the iteration data; and
      
      using the machine-learning technique includes updating or generating a model to identify data-source variables predictive of the results.
  - 20. The computer-program product as recited in claim 15, wherein using the machine-learning technique comprises:
    - retrieving a parameter for a machine-learning model trained on other iteration data, the parameter reflecting a degree of variability observed across clients, iterations or alignment positions;
      
      determining whether, for each portion of multiple portions of the iteration data, an observed variability for the portion corresponds with the parameter; and
      
      for each portion of the multiple portions for which it is determined that the observed variability for the portion does not correspond with the parameter, identifying the portion as an anomaly subset.
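Claims 6, 13 and 20 describe a trained "parameter reflecting a degree of variability." A deliberately simple stand-in for that parameter is a per-portion population standard deviation learned from earlier iteration data. The sketch below makes these assumptions:

- Each portion is keyed by a hypothetical label, such as an alignment position.
- Each portion holds one numeric value per client.
- A portion is anomalous when its current spread differs from the learned spread by more than a hypothetical `tolerance` factor, in either direction.

A real system might instead fit a statistical or machine-learning model per portion.

```python
from statistics import pstdev


def fit_variability(history: dict[str, list[float]]) -> dict[str, float]:
    """'Train' one parameter per portion: the population std dev of past values."""
    return {portion: pstdev(values) for portion, values in history.items()}


def find_anomalous_portions(params: dict[str, float],
                            current: dict[str, list[float]],
                            tolerance: float = 2.0) -> list[str]:
    """Portions whose observed spread does not correspond with the parameter."""
    anomalies = []
    for portion, values in current.items():
        expected = params.get(portion)
        if expected is None or len(values) < 2:
            continue  # no learned parameter, or too few values to measure spread
        observed = pstdev(values)
        if expected == 0:
            if observed > 0:  # any spread where none was seen before
                anomalies.append(portion)
            continue
        ratio = observed / expected
        # Too much or too little variability both count as not corresponding.
        if ratio > tolerance or ratio < 1 / tolerance:
            anomalies.append(portion)
    return anomalies


history = {"pos100": [30, 32, 28, 31, 29], "pos200": [40, 41, 39, 40, 40]}
current = {"pos100": [29, 31, 30, 32, 28], "pos200": [20, 60, 35, 45, 40]}
params = fit_variability(history)
print(find_anomalous_portions(params, current))  # ['pos200']
```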

Current Assignee
Color Health Incorporated
Original Assignee
Color Genomics, Inc.
Inventors
Barrett, Ryan; Noguchi, Katsuya; Bhat, Nishant; Li, Zhengua; Smith, Kurt
Primary Examiner(s)
Kim, Sisley

Application Number

US15/366,409
Publication Number

US 20170161105A1
Time in Patent Office

194 Days
Field of Search

None
US Class Current
CPC Class Codes

G06F 11/3024   where the computing system ...

G06F 11/3419   by assessing time

G06F 9/4881   Scheduling strategies for d...

G06F 9/4887   involving deadlines, e.g. r...

G06N 20/00   Machine learning

G16H 50/20   for computer-aided diagnosi...

G16Z 99/00   Subject matter not provided...
