Template-based structured document classification and extraction

  • US 10,657,158 B2
  • Filed: 11/23/2016
  • Issued: 05/19/2020
  • Est. Priority Date: 11/23/2016
  • Status: Active Grant
First Claim

1. A computer-implemented method, comprising:

    identifying a data extraction template generated from a cluster of electronic messages that share at least some underlying structural and textual similarities;

    applying features of the cluster of electronic messages as input to one or more category machine learning models, wherein the one or more category machine learning models are trained to classify electronic messages into one or more of a plurality of document categories;

    determining a document category associated with the data extraction template based on output generated over the one or more category machine learning models based on the input provided to the one or more category machine learning models;

    applying the same features or different features of the cluster of electronic messages as input to one or more extraction machine learning models, wherein the one or more extraction machine learning models are trained to provide one or more locations of one or more transient fields in electronic messages, and wherein the one or more extraction machine learning models are selected from a plurality of extraction machine learning models based on the determined document category;

    determining one or more locations of one or more transient fields in the cluster of electronic messages based on output generated from the one or more extraction machine learning models based on the input provided to the one or more extraction machine learning models;

    storing, in computer memory, a first association between the data extraction template and the determined one or more transient field locations in the cluster of electronic messages;

    extracting at least two data points from a given electronic message of a user that shares at least some structural and textual similarities with the cluster of electronic messages, wherein the extracting is based on the first association; and

    providing the at least two extracted data points for surfacing to the user via one or more computing devices operated by the user.
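The steps recited in claim 1 can be read as a pipeline: template, category, category-specific extractor, stored field locations, extraction from a new message. The following is a minimal sketch of that flow, not the patented implementation. Everything in it is an assumption made for illustration: messages are modeled as whitespace-separated token lists, the "template" is the set of token positions shared by the whole cluster, and `category_model` and `EXTRACTION_MODELS` are hypothetical rule-based stand-ins for the trained machine learning models the claim describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Message = List[str]  # toy representation: a message is a list of tokens


@dataclass
class Template:
    template_id: str
    fixed: Dict[int, str]  # token position -> text shared by every message in the cluster


def build_template(template_id: str, cluster: List[Message]) -> Template:
    """Keep the positions whose token is identical across the whole cluster."""
    length = min(len(m) for m in cluster)
    first = cluster[0]
    fixed = {i: first[i] for i in range(length) if all(m[i] == first[i] for m in cluster)}
    return Template(template_id, fixed)


def category_model(features: List[str]) -> str:
    """Hypothetical stand-in for a trained category classifier (keyword rule, not ML)."""
    words = {f.lower().rstrip(":") for f in features}
    if "flight" in words:
        return "travel"
    if "order" in words:
        return "purchase"
    return "other"


def make_extractor(labels: Dict[str, str]) -> Callable[[Template], Dict[str, int]]:
    """Hypothetical stand-in for a trained extraction model.

    It places a transient field at the position right after a known label
    token, provided that position is not itself part of the fixed text.
    """
    def extractor(template: Template) -> Dict[str, int]:
        locations: Dict[str, int] = {}
        for pos, token in template.fixed.items():
            field_name = labels.get(token)
            if field_name is not None and (pos + 1) not in template.fixed:
                locations[field_name] = pos + 1
        return locations
    return extractor


# One extractor per document category; the category decides which one runs.
EXTRACTION_MODELS = {
    "travel": make_extractor({"Flight:": "flight_number", "Departs:": "departure_time"}),
    "purchase": make_extractor({"Order:": "order_id", "Total:": "total"}),
}


def matches(template: Template, message: Message) -> bool:
    """A message fits the template if it carries the same fixed tokens."""
    return all(p < len(message) and message[p] == t for p, t in template.fixed.items())


def extract(message: Message, locations: Dict[str, int]) -> Dict[str, str]:
    """Read the values found at the stored transient field locations."""
    return {name: message[pos] for name, pos in locations.items() if pos < len(message)}


# Cluster of structurally similar messages, and the template generated from it.
cluster = [
    "Your Flight: UA123 Departs: 09:15".split(),
    "Your Flight: DL456 Departs: 17:40".split(),
]
template = build_template("t1", cluster)

# Classify the template, then pick the extractor for that category.
category = category_model(list(template.fixed.values()))
locations = EXTRACTION_MODELS[category](template)

# Store the association between the template and its transient field locations.
associations: Dict[str, Dict[str, int]] = {template.template_id: locations}

# Extract data points from a new message that shares the cluster's structure.
new_message = "Your Flight: AA789 Departs: 06:05".split()
extracted: Dict[str, str] = {}
if matches(template, new_message):
    extracted = extract(new_message, associations[template.template_id])
print(extracted)
```

In this toy run the new message yields two data points (a flight number and a departure time), mirroring the "at least two data points" limitation. A real system would replace the keyword rule and label lookup with trained classifiers and would operate on richer structural features than token positions.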

  • 2 Assignments