
System and method for selecting data sample groups for machine learning of context of data fields for various document types and/or for test data generation for quality assurance systems

  • US 10,140,277 B2
  • Filed: 10/13/2016
  • Issued: 11/27/2018
  • Est. Priority Date: 07/15/2016
  • Status: Active Grant
First Claim

1. A computing system implemented method for efficiently learning new forms in an electronic document preparation system, the method comprising:

    receiving form data related to a new form having a plurality of data fields;

    gathering training set data related to previously filled forms, each previously filled form having one or more completed data fields that correspond to a respective data field of the new form;

    deleting from the training set data one or more sets of data of a previously filled form where a first set of data of the previously filled form matched a second set of data of the previously filled form and the deleted training set data includes the second set of data;

    generating, for a first selected data field, dependency data indicating one or more possible dependencies for an acceptable function, the possible dependencies including one or more data fields of the new form other than the first selected data field, the possible dependencies further including one or more constants of the first selected data field, the possible dependencies further including one or more values of data fields from a form other than the new form;

    generating, for a first selected data field of the plurality of data fields of the new form and based on the dependency data, candidate function data including a plurality of candidate functions;

    generating, for the first selected data field and based on the dependency data, grouping data by forming a plurality of groups from the training set data based on respective categories and assigning each of a plurality of the previously filled forms to a respective one of the groups based on the categories;

    generating, for the first selected data field, sampling data by selecting one or more previously filled forms from each group;

    generating, for each candidate function, test data by applying the candidate function to a portion of the training set data corresponding to the sampling data related to the candidate function;

    identifying one or more candidate functions of the plurality of candidate functions that have associated test data that are a best match to the training set data as compared with other candidate functions of the plurality of candidate functions;

    generating one or more additional candidate functions, the additional candidate functions being based on the identified one or more candidate functions that have associated test data that are a best match;

    repeatedly identifying generated candidate functions that have associated test data that are a best match to the training set data and generating one or more additional candidate functions, the additional candidate functions being based on the identified one or more candidate functions that have associated test data that are a best match until one or more candidate functions are determined to have associated test data that matches the training set data with a predetermined tolerance;

    identifying, from the plurality of candidate functions, an acceptable function for the first selected data field by comparing the test data to the training set data and identifying test data that matches the training set data within a predetermined tolerance, the identified acceptable function being a candidate function associated with the matching test data; and

    generating and outputting results data indicating the acceptable function for the first data field of the new form.
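The deletion step of claim 1 removes a set of data from a previously filled form when it matches an earlier set in the same form. The claim does not say how forms or "sets of data" are stored. This Python sketch therefore makes a hypothetical assumption: each form is a list of field-name-to-value mappings, and "matched" means exact equality. `drop_duplicate_sets` and `deduplicate_training_set` are illustrative names and do not come from the patent.

```python
from typing import Dict, List

# Hypothetical representation (not from the patent): a previously filled form
# is a list of "sets of data", each mapping a field name to its value.
DataSet = Dict[str, float]
Form = List[DataSet]


def drop_duplicate_sets(form: Form) -> Form:
    """Keep the first set of data; delete any later set that matches it exactly."""
    seen = set()
    kept: Form = []
    for data_set in form:
        key = tuple(sorted(data_set.items()))  # identity independent of key order
        if key in seen:
            continue  # the matching (second) set is deleted from the training data
        seen.add(key)
        kept.append(data_set)
    return kept


def deduplicate_training_set(training_set: List[Form]) -> List[Form]:
    """Apply the per-form deletion to every previously filled form."""
    return [drop_duplicate_sets(form) for form in training_set]
```

Duplicates are detected only within a single form, which matches the claim's wording. Two different forms that happen to hold identical values both stay in the training set.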
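Claim 1 also forms groups from the training set data by category, assigns each previously filled form to one group, and selects one or more forms from each group. As a result, the sample used for testing covers every category, not only the most common one. The sketch below makes two assumptions. First, the caller supplies a `category_of` function (hypothetical, since the claim does not say how categories are chosen). Second, seeded random sampling serves as one possible selection rule.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Mapping

Form = Mapping[str, float]  # hypothetical: field name -> value


def group_forms(forms: List[Form], category_of: Callable[[Form], Hashable]) -> Dict[Hashable, List[Form]]:
    """Assign every previously filled form to exactly one category group."""
    groups: Dict[Hashable, List[Form]] = defaultdict(list)
    for form in forms:
        groups[category_of(form)].append(form)
    return dict(groups)


def sample_groups(groups: Dict[Hashable, List[Form]], per_group: int = 1, seed: int = 0) -> List[Form]:
    """Select up to `per_group` forms from each group (seeded for repeatability)."""
    rng = random.Random(seed)
    sample: List[Form] = []
    for category in sorted(groups, key=repr):  # fixed order so the seed is meaningful
        members = groups[category]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample
```

Take a hypothetical rule such as `lambda f: "low" if f["wages"] < 50_000 else "high"`. With it, the sample contains at least one low-income form and one high-income form, even when high-income forms are rare in the training set.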
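Next, claim 1 builds dependency data: other fields of the new form, constants, and values from other forms. It generates candidate functions from that data and applies each candidate to the sampled forms to produce test data. A candidate is accepted when its test data matches the recorded values within a predetermined tolerance. The patent does not fix a function grammar, so the sketch below assumes the following, purely for illustration:

- A candidate is a single arithmetic operation on two dependencies.
- A dependency is either a field name, looked up in the form (which could also carry values copied from another form), or a numeric constant.
- "Within tolerance" means every sampled form is off by at most `tol`.

```python
import itertools
import operator
from typing import Callable, List, Mapping, Sequence, Tuple, Union

Form = Mapping[str, float]
Operand = Union[str, float]  # hypothetical: a field name or a constant
Candidate = Tuple[str, Callable[[Form], float]]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def resolve(operand: Operand, form: Form) -> float:
    """Look up a field value, or pass a constant through unchanged."""
    return form[operand] if isinstance(operand, str) else operand


def make_candidate(a: Operand, sym: str, b: Operand) -> Candidate:
    op = OPS[sym]
    return f"{a} {sym} {b}", lambda form: op(resolve(a, form), resolve(b, form))


def candidate_functions(dependencies: Sequence[Operand]) -> List[Candidate]:
    """Every ordered pair of distinct dependencies combined with one operator."""
    return [make_candidate(a, sym, b)
            for a, b in itertools.permutations(dependencies, 2)
            for sym in OPS]


def acceptable_functions(target: str, dependencies: Sequence[Operand],
                         sample: Sequence[Form], tol: float = 0.01) -> List[str]:
    """Return the candidates whose test data matches the target field within tol."""
    accepted = []
    for text, fn in candidate_functions(dependencies):
        test_data = [fn(form) for form in sample]  # candidate applied to sampled forms
        if all(abs(t - form[target]) <= tol for t, form in zip(test_data, sample)):
            accepted.append(text)
    return accepted
```

The two-operand restriction only keeps the enumeration small. A production system would presumably search a richer space of functions.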
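The repeated steps of claim 1 work as a loop. The system keeps the candidates whose test data best matches the training set data, derives additional candidates from them, and stops once one candidate matches within the predetermined tolerance. The claim does not say how the additional candidates are derived. The sketch below assumes a simple beam search: each retained expression is extended by one more operator and field, and the match score is the largest absolute error across forms. `refine`, `beam`, and `max_rounds` are hypothetical names. An evolutionary or genetic-programming search would be an equally plausible reading of the claim.

```python
import operator
from typing import Callable, List, Mapping, Optional, Sequence, Tuple

Form = Mapping[str, float]
Expr = Tuple[str, Callable[[Form], float]]  # (readable text, evaluator)

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def leaf(name: str) -> Expr:
    return name, lambda form: form[name]


def combine(left: Expr, sym: str, right: Expr) -> Expr:
    op, lf, rf = OPS[sym], left[1], right[1]
    return f"({left[0]} {sym} {right[0]})", lambda form: op(lf(form), rf(form))


def max_error(expr: Expr, forms: Sequence[Form], target: str) -> float:
    """Largest absolute gap between the expression's output and the recorded value."""
    return max(abs(expr[1](form) - form[target]) for form in forms)


def refine(forms: Sequence[Form], target: str, dependencies: Sequence[str],
           tol: float = 0.01, beam: int = 3, max_rounds: int = 4) -> Optional[Expr]:
    leaves = [leaf(name) for name in dependencies]
    candidates: List[Expr] = list(leaves)
    for _ in range(max_rounds):  # each round scores the current candidates
        ranked = sorted(candidates, key=lambda e: max_error(e, forms, target))
        if max_error(ranked[0], forms, target) <= tol:
            return ranked[0]  # acceptable function found
        # Derive additional candidates from the best-matching ones only.
        candidates = [combine(best, sym, extra) for best in ranked[:beam]
                      for sym in OPS for extra in leaves]
    return None  # no candidate matched within tolerance
```

The `beam` width trades coverage for cost. A narrow beam can prune the path to the correct function, while a wide beam approaches exhaustive enumeration.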
