Systems and method for automatically configuring machine learning models

US 10,296,848 B1
Filed: 03/05/2018
Issued: 05/21/2019
Est. Priority Date: 03/05/2018
Status: Active Grant

Abstract

Systems and methods for intelligently training a machine learning model include: configuring a machine learning (ML) training data request for a pre-existing machine learning classification model; transmitting the machine learning training data request to each of a plurality of external training data sources, wherein each of the plurality of external training data sources is different; collecting and storing the machine learning training data from each of the plurality of external training data sources; processing the collected machine learning training data using a predefined training data processing algorithm; and in response to processing the collected machine learning training data, deploying a subset of the collected machine learning training data into a live machine learning model.
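
The transmit-and-collect steps of the abstract can be illustrated with a minimal sketch. Every name here is hypothetical: `TrainingDataRequest`, the toy source functions, and the in-memory dictionary standing in for storage are assumptions made for illustration, and the patent text does not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TrainingDataRequest:
    """Hypothetical request shape: a target model plus seed samples."""
    model_name: str
    seeds: List[str]


# A "source" is modeled as any callable that turns a request into samples.
Source = Callable[[TrainingDataRequest], List[str]]


def collect_training_data(
    request: TrainingDataRequest, sources: Dict[str, Source]
) -> Dict[str, List[str]]:
    """Send the same request to every source and keep results per source."""
    seed_set = set(request.seeds)
    store: Dict[str, List[str]] = {}
    for name, source in sources.items():
        samples = source(request)
        # Keep only samples that differ from the seeds themselves.
        store[name] = [s for s in samples if s not in seed_set]
    return store


# Toy stand-ins for external sources (assumptions, not real services).
def crowd_source(request: TrainingDataRequest) -> List[str]:
    return [f"please {seed}" for seed in request.seeds]


def paraphrase_source(request: TrainingDataRequest) -> List[str]:
    return [seed.upper() for seed in request.seeds]
```

Keeping one list per source name is a design choice of this sketch; it makes it easy to inspect or discard the output of a single source before any further processing.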

13 Claims

1. A system that rapidly improves a classification accuracy of a machine learning classification model of an artificially intelligent conversational system, the system comprising:
- a machine learning configuration and management console that enables an administrator of an artificially intelligent conversational service to configure updates to the machine learning classification model of the artificially intelligent conversational system, wherein the machine learning configuration and management console comprises one or more computer processors and a non-transitory computer-readable medium storing computer instructions that, when executed by the one or more computer processors, perform the steps of:
  
  detecting that an accuracy level of the machine learning classification model does not satisfy a predetermined threshold;
  
  in response to detecting that the accuracy level does not satisfy the predetermined threshold, automatically generating a notification requiring an update for improving a classification accuracy of the machine learning classification model;
  
  implementing one or more user interfaces that receive input for configuring a machine learning training data request based on the notification, wherein a machine learning training data request includes a plurality of seed machine learning data samples and a request, from the artificially intelligent conversational service, to a plurality of remote third-party training data sources to generate machine learning training data using the plurality of seed machine learning data samples, wherein at least one of the plurality of remote third-party training data sources includes a remote crowdsourcing platform;
  
  transmitting, by the artificially intelligent conversational service, via a network, the machine learning training data request to each of the plurality of remote third-party training data sources, wherein each of the plurality of remote third-party training data sources is different from each other;
  
  collecting and storing the machine learning training data produced by each of the plurality of remote third-party training data sources, wherein the machine learning training data comprises a plurality of training data samples proliferated based on the seed machine learning data samples of the machine learning data request, and wherein each of the plurality of training data samples of the machine learning training data is distinct from each of the plurality of seed machine learning data samples;
  
  processing the machine learning training data collected from the plurality of remote training data sources using a predefined training data processing algorithm, wherein the processing the machine learning training data includes:
  
  [i] calculating a fit score value for each of the plurality of training data samples, wherein the fit score value relates to how well each of the plurality of training data samples fits one or more of the plurality of seed training data samples of the training data request,
  
  [ii] after the fit score value is calculated for each training data sample, applying a pruning threshold to each of the plurality of training data samples, wherein the pruning threshold comprises a minimum required fit score value for a given training data sample, and
  
  [iii] pruning from the plurality of training data samples any training data sample that does not satisfy the pruning threshold; and
  
  in response to processing the collected machine learning training data:
  
  [a] simulating a performance of the machine learning classification model using the plurality of training data samples remaining after the pruning;
  
  [b] identifying a simulated accuracy level of the machine learning classification model;
  
  updating the machine learning classification model based on the simulated accuracy level by training the machine learning classification model with the plurality of training data samples remaining after the pruning; and
  
  after the updating, deploying the machine learning classification model into a live use by the artificially intelligent conversational system.
  - 2. The system of claim 1, wherein:
    - the transmitting the machine learning training data request includes:
      
      identifying an input template for each of the plurality of external training data sources, wherein the input template comprises input fields for receiving parameters for generating the machine learning training data at each of the plurality of remote third-party training data sources, wherein the input template for each of the plurality of remote third-party training data sources is different;
      
      converting input data of the machine learning training data request to template input for the input template for each of the plurality of remote third-party training data sources; and
      
      feeding a respective input template having the converted input data of the machine learning training data request to a respective one of the plurality of remote third-party training data sources.
  - 3. The system of claim 1, wherein:
    - the plurality of training data samples comprises a plurality of labeled training data samples, and the processing the collected machine learning training data includes:
      
      selectively pruning, by an administrator of the artificially intelligent conversational service, one or more of the plurality of training data samples from the plurality of labeled training data samples.
  - 4. The system of claim 1, wherein candidate training data samples of the plurality of training data samples that have been identified for pruning may be automatically pruned from the plurality of training data samples after an expiry of a predetermined time period.
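
Steps [i] to [iii] of claim 1 can be sketched as follows. The claim does not say how the fit score is computed, so this example assumes a simple token-overlap (Jaccard) measure against the best-matching seed; the function names and the threshold value are likewise hypothetical.

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens of a sample."""
    return set(text.lower().split())


def fit_score(sample: str, seeds: List[str]) -> float:
    """Best Jaccard token overlap between the sample and any seed.

    Jaccard overlap is an assumed stand-in for the unspecified fit metric.
    """
    sample_tokens = tokens(sample)
    best = 0.0
    for seed in seeds:
        seed_tokens = tokens(seed)
        union = sample_tokens | seed_tokens
        if union:
            best = max(best, len(sample_tokens & seed_tokens) / len(union))
    return best


def prune(samples: List[str], seeds: List[str], pruning_threshold: float) -> List[str]:
    """Score every sample first, then drop those below the threshold."""
    scored = [(s, fit_score(s, seeds)) for s in samples]        # step [i]
    return [s for s, score in scored if score >= pruning_threshold]  # steps [ii], [iii]
```

For example, with the seed "what is my balance" and a threshold of 0.5, the sample "what is my account balance" shares four of five distinct tokens with the seed and is kept, while "order a pizza" shares none and is pruned.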

5. A method implemented by an artificially intelligent conversational implement service that rapidly improves a classification accuracy of a machine learning classification model, the method comprising:
- detecting that an accuracy level of the machine learning classification model does not satisfy a predetermined threshold;
  
  in response to detecting that the accuracy level does not satisfy the predetermined threshold, automatically generating a notification requiring an update for improving a classification accuracy of the machine learning classification model;
  
  configuring a machine learning (ML) training data request based on the notification, wherein a machine learning training data request includes a plurality of seed machine learning data samples and a request, from the artificially intelligent conversational service, to a plurality of remote third-party training data sources to generate machine learning training data using the plurality of seed machine learning data samples, wherein at least one of the plurality of remote third-party training data sources includes a remote crowdsourcing platform;
  
  transmitting, by the artificially intelligent conversational implement service, via a network, the machine learning training data request to each of the plurality of remote third-party training data sources, wherein each of the plurality of remote third-party training data sources is different;
  
  collecting and storing the machine learning training data produced by each of the plurality of remote third-party training data sources, wherein the machine learning training data comprises a plurality of training data samples proliferated based on the seed machine learning data samples of the machine learning data request, and wherein each of the plurality of training data samples of the machine learning training data is distinct from each of the plurality of seed machine learning data samples;
  
  processing the machine learning training data collected from the plurality of remote training data sources using a predefined training data processing algorithm, wherein the processing the machine learning training data includes:
  
  [i] calculating a fit score value for each of the plurality of training data samples, wherein the fit score value relates to how well each of the plurality of training data samples fits one or more of the plurality of seed training data samples of the training data request,
  
  [ii] after the fit score value is calculated for each training data sample, applying a pruning threshold to each of the plurality of training data samples, wherein the pruning threshold comprises a minimum required fit score value for a given training data sample, and
  
  [iii] pruning from the plurality of training data samples any training data sample that does not satisfy the pruning threshold; and
  
  in response to processing the collected machine learning training data:
  
  [a] simulating a performance of the machine learning classification model using the plurality of training data samples remaining after the pruning;
  
  [b] identifying a simulated accuracy level of the machine learning classification model;
  
  updating the machine learning classification model based on the simulated accuracy level by training the machine learning classification model with the plurality of training data samples remaining after the pruning; and
  
  after the updating, deploying the machine learning classification model into a live use by the artificially intelligent conversational system.
  - 6. The method of claim 5, wherein:
    - configuring the training data request includes:
      
      a selection of the machine learning classification model from a pool of pre-existing machine learning classification models of the artificially intelligent conversation system; and
      
      one or more of the plurality of seed machine learning data samples comprising one or more example user queries and/or one or more example user prompts.
  - 7. The method of claim 6, further comprising:
    - prior to the transmitting the machine learning training data request, identifying an input template for each of the plurality of remote third-party training data sources; and
      
      reformatting input of the machine learning training data request to input for the input template for each of the plurality of remote third-party training data sources.
  - 8. The method of claim 5, wherein:
    - configuring the machine learning training data request includes:
      
      an identification of a new machine learning classification task desired for the machine learning classification model; and
      
      generating one or more seed examples comprising one or more example user queries and/or one or more example user prompts for the new machine learning classification task.
  - 9. The method of claim 5, wherein:
    - the collecting the machine learning training data from each of the plurality of remote third-party training data sources is performed synchronously.
  - 10. The method of claim 5, wherein:
    - storing the machine learning training data includes storing a subset of the machine learning training data from each of the plurality of external training data sources into a distinct datastore for each respective one of the plurality of remote third-party training data sources.
  - 11. The method of claim 5, wherein:
    - the machine learning classification model comprises a competency classification machine learning model,
      
      wherein the competency classification machine learning model is configured to generate a plurality of distinct competency classification labels,
      
      each of the plurality of distinct competency classification labels corresponds to one competency of a plurality of areas of competencies of an artificially intelligent virtual assistant, and
      
      a competency relates to a subject area of comprehension or aptitude of the artificially intelligent conversational system for which the artificially intelligent conversational system can interact with or provide a response to user input data.
  - 12. The method of claim 11, wherein:
    - the competency classification machine learning model comprises a single competency classification deep machine learning algorithm that is trained to detect each of the plurality of distinct competency classification labels, and
      
      generating the competency classification label for the user input data includes selecting the competency classification label having a highest probability of matching an intent of the user input data.
  - 13. The method of claim 11, wherein:
    - the competency classification machine learning model comprises an ensemble of competency classification deep machine learning algorithms, wherein each competency classification deep machine learning algorithm of the ensemble is trained to detect a distinct competency classification label of the plurality of distinct competency classification labels, and
      
      generating the competency classification label for the user input data includes selecting the competency classification label having a highest probability of matching an intent of the user input query.
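
The difference between the single-model arrangement of claim 12 and the ensemble of claim 13 can be sketched as below. The models are modeled as plain callables, and the toy keyword-based scorers are placeholders rather than deep learning models; all names and probability values are assumptions for illustration only.

```python
from typing import Callable, Dict

# Single model: one call returns a probability for every competency label.
SingleModel = Callable[[str], Dict[str, float]]
# Ensemble member: one detector per label, each returning one probability.
Detector = Callable[[str], float]


def classify_single(model: SingleModel, query: str) -> str:
    """Pick the label with the highest probability from one model's output."""
    probs = model(query)
    return max(probs, key=probs.get)


def classify_ensemble(detectors: Dict[str, Detector], query: str) -> str:
    """Ask each per-label detector, then pick the highest-probability label."""
    probs = {label: detect(query) for label, detect in detectors.items()}
    return max(probs, key=probs.get)


# Toy stand-ins (hypothetical): keyword rules in place of trained models.
def toy_model(query: str) -> Dict[str, float]:
    if "balance" in query:
        return {"banking": 0.7, "weather": 0.3}
    return {"banking": 0.2, "weather": 0.8}


toy_detectors: Dict[str, Detector] = {
    "banking": lambda q: 0.9 if "balance" in q else 0.1,
    "weather": lambda q: 0.8 if "rain" in q else 0.2,
}
```

In both arrangements the final selection step is the same argmax over label probabilities; only where the probabilities come from differs.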

Current Assignee: Clinc, Inc.
Original Assignee: Clinc, Inc.
Inventors: Mars, Jason; Tang, Lingjia; Laurenzano, Michael; Hauswald, Johann
Primary Examiner(s): Waldron, Scott A.
Assistant Examiner(s): Alabi, Oluwatosin O
Application Number: US15/911,491
Time in Patent Office: 442 Days
CPC Class Codes

G06F 18/214   Generating training pattern...

G06F 18/217   Validation; Performance eva...

G06F 18/24   Classification techniques

G06N 20/00   Machine learning

G06N 20/10   using kernel methods, e.g. ...

G06N 20/20   Ensemble learning

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/047   Probabilistic or stochastic...

G06N 3/084   Backpropagation, e.g. using...

G06N 3/088   Non-supervised learning, e....

G06N 5/01   Dynamic search techniques; ...

G06N 5/025   Extracting rules from data

G06N 7/01   Probabilistic graphical mod...
