
Generating and applying data extraction templates

  • US 10,216,838 B1
  • Filed: 12/29/2016
  • Issued: 02/26/2019
  • Est. Priority Date: 08/27/2014
  • Status: Active Grant
First Claim

1. A computer-implemented method for generating and applying data extraction templates to extract transient content from structured communications created automatically using templates, comprising:

    grouping a corpus of structured communications into a plurality of clusters based on one or more patterns shared among one or more structured communications within the corpus;

    identifying, from structured communications of a particular cluster, a set of structural paths;

    classifying a first structural path of the set of structural paths, associated with a first segment of text, as a first transient structural path in response to a determination that a count of occurrences of the first segment of text across the particular cluster satisfies a criterion;

    classifying the first transient structural path as a first semantic data type based on one or more signals related to the structured communications of the particular cluster;

    classifying a second structural path of the set of structural paths, associated with a second segment of text, as a second transient structural path in response to a determination that a count of occurrences of the second segment of text across the particular cluster satisfies the same criterion or a different criterion;

    classifying the second transient structural path as a second semantic data type based at least in part on the first semantic data type;

    generating a data extraction template to extract, from one or more subsequent structured communications, one or more segments of text associated with the first transient structural path;

    associating a subsequent structured communication with the particular cluster based on one or more patterns shared between the subsequent structured communication and one or more structured communications of the corpus; and

    applying the data extraction template associated with the particular cluster to the subsequent structured communication to extract one or more segments of text associated with the first transient structural path.
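
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything in it is an assumption made for illustration: a communication is modeled as a dict holding a "sender" and a "paths" mapping from structural path (an XPath-like string) to text, the shared "pattern" used for clustering is the sender plus the set of paths, and the "criterion" is that no single text segment at a path occurs in more than half of the cluster. The function names and the max_fraction parameter are hypothetical, and the semantic data type classification steps are omitted.

```python
from collections import Counter, defaultdict


def cluster_key(comm):
    # Assumed clustering pattern: same sender and same set of structural paths.
    return (comm["sender"], tuple(sorted(comm["paths"])))


def group_into_clusters(corpus):
    # Group communications that share the assumed pattern.
    clusters = defaultdict(list)
    for comm in corpus:
        clusters[cluster_key(comm)].append(comm)
    return clusters


def find_transient_paths(cluster, max_fraction=0.5):
    # A path is treated as transient when its most frequent text segment
    # appears in at most max_fraction of the cluster (assumed criterion).
    transient = []
    all_paths = sorted({p for comm in cluster for p in comm["paths"]})
    for path in all_paths:
        counts = Counter(
            comm["paths"][path] for comm in cluster if path in comm["paths"]
        )
        top_count = counts.most_common(1)[0][1]
        if top_count / len(cluster) <= max_fraction:
            transient.append(path)
    return transient


def build_templates(corpus, max_fraction=0.5):
    # One extraction template (a list of transient paths) per cluster.
    clusters = group_into_clusters(corpus)
    return {
        key: find_transient_paths(members, max_fraction)
        for key, members in clusters.items()
    }


def apply_template(templates, comm):
    # Associate the new communication with a cluster, then extract the
    # text at that cluster's transient paths. Returns None if no cluster matches.
    paths = templates.get(cluster_key(comm))
    if paths is None:
        return None
    return {p: comm["paths"][p] for p in paths}


if __name__ == "__main__":
    corpus = [
        {"sender": "shop@example.com",
         "paths": {"/html/body/h1": "Your order has shipped",
                   "/html/body/p[1]": f"Order #{n}"}}
        for n in (101, 102, 103)
    ]
    templates = build_templates(corpus)
    new_message = {"sender": "shop@example.com",
                   "paths": {"/html/body/h1": "Your order has shipped",
                             "/html/body/p[1]": "Order #999"}}
    print(apply_template(templates, new_message))
```

In this sketch the heading text repeats in every message of the cluster, so its path is treated as fixed boilerplate, while the order number differs in every message, so its path is treated as transient and is the only one extracted from the new message.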
