
Generating and applying data extraction templates

  • US 9,563,689 B1
  • Filed: 08/27/2014
  • Issued: 02/07/2017
  • Est. Priority Date: 08/27/2014
  • Status: Active Grant
First Claim

1. A computer-implemented method for generating and applying data extraction templates to extract transient content from structured communications created automatically using templates, comprising:

    grouping a corpus of structured communications into a plurality of clusters based on metadata associated with each structured communication;

    identifying, from structured communications of a particular cluster, a set of structural paths;

    classifying a first structural path of the set of structural paths, associated with a first segment of text, as transient in response to a determination that a count of occurrences of the first segment of text across the particular cluster satisfies a criterion;

    classifying a second structural path of the set of structural paths, associated with a second segment of text, as fixed in response to a determination that a frequency of occurrences of the second segment of text across the particular cluster does not satisfy the criterion;

    generating a data extraction template to extract, from one or more structured communications, one or more segments of text associated with the transient structural path;

    configuring the data extraction template so that one or more segments of text associated with the fixed structural path are ignored in one or more subsequent structured communications;

    associating a subsequent structured communication with the particular cluster based on metadata associated with the subsequent structured communication; and

    applying the data extraction template associated with the particular cluster to the subsequent structured communication to extract one or more segments of text associated with the transient structural path.
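The claimed steps can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the patentee's implementation. It assumes that each communication has already been parsed into a mapping from structural path (an XPath-like string) to the text found there. It also assumes that clusters are keyed on the sender plus a digit-masked subject line. The claim leaves "a criterion" unspecified, so the sketch assumes a simple count threshold, `min_count`. Under that threshold, a path is fixed when some text recurs at it in at least `min_count` communications of the cluster, and transient otherwise. The names `Communication`, `cluster_key`, `group_into_clusters`, `build_template`, and `apply_template`, along with the sample addresses and paths, are hypothetical.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Communication:
    # Hypothetical representation: metadata plus a mapping from
    # structural path (XPath-like string) to the text segment found there.
    sender: str
    subject: str
    segments: dict[str, str]


def cluster_key(comm: Communication) -> tuple[str, str]:
    # Assumed metadata key: sender plus subject with digit runs masked,
    # so "Order 1001 shipped" and "Order 1002 shipped" share a cluster.
    return (comm.sender, re.sub(r"\d+", "#", comm.subject))


def group_into_clusters(corpus):
    # Group the corpus into clusters using metadata only.
    clusters = defaultdict(list)
    for comm in corpus:
        clusters[cluster_key(comm)].append(comm)
    return clusters


def build_template(cluster, min_count=2):
    # Count how many communications carry each (path, text) pair.
    pair_counts = Counter(
        (path, text) for comm in cluster for path, text in comm.segments.items()
    )
    paths = {path for comm in cluster for path in comm.segments}
    fixed, transient = set(), set()
    for path in paths:
        # Assumed criterion: the most frequent text at this path occurs in
        # fewer than min_count communications -> transient; else fixed.
        top = max(n for (p, _), n in pair_counts.items() if p == path)
        (fixed if top >= min_count else transient).add(path)
    return {"transient": transient, "fixed": fixed}


def apply_template(template, comm):
    # Extract text at transient paths only; fixed paths are ignored.
    return {p: t for p, t in comm.segments.items() if p in template["transient"]}


corpus = [
    Communication("orders@shop.example", "Order 1001 shipped",
                  {"/html/body/h1": "Your order has shipped",
                   "/html/body/p[1]": "Order #1001",
                   "/html/body/p[2]": "Tracking: ZX1"}),
    Communication("orders@shop.example", "Order 1002 shipped",
                  {"/html/body/h1": "Your order has shipped",
                   "/html/body/p[1]": "Order #1002",
                   "/html/body/p[2]": "Tracking: ZX2"}),
]
templates = {key: build_template(c) for key, c in group_into_clusters(corpus).items()}

# A subsequent communication is routed to its cluster by metadata alone.
new = Communication("orders@shop.example", "Order 1003 shipped",
                    {"/html/body/h1": "Your order has shipped",
                     "/html/body/p[1]": "Order #1003",
                     "/html/body/p[2]": "Tracking: ZX3"})
print(apply_template(templates[cluster_key(new)], new))
# {'/html/body/p[1]': 'Order #1003', '/html/body/p[2]': 'Tracking: ZX3'}
```

The boilerplate heading recurs in both sample messages, so it is classified as fixed and skipped. The order number and tracking line vary, so they are extracted. A count threshold is only one possible reading of the claim's unspecified criterion.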

  • 2 Assignments