
Generating and applying data extraction templates

  • US 9,785,705 B1
  • Filed: 10/16/2014
  • Issued: 10/10/2017
  • Est. Priority Date: 10/16/2014
  • Status: Active Grant
First Claim

1. A computer-implemented method for generating and applying data extraction templates to extract transient content from plain text communications created automatically using templates, comprising:

    grouping a corpus of plain text communications into a plurality of clusters based on one or more shared attributes;

    classifying one or more plain text segments of each plain text communication of a particular cluster as fixed in response to a determination that a count of occurrences of the one or more plain text segments across the particular cluster satisfies a criterion;

    classifying one or more remaining plain text segments of each plain text communication of the particular cluster as transient;

    generating a tree to represent sequences of classified plain text segments associated with each plain text communication of the particular cluster, wherein the tree includes at least a first branch to represent a first sequence of classified plain text segments corresponding to a first plain text communication of the particular cluster and a second branch to represent at least part of a second sequence of classified plain text segments corresponding to a second plain text communication of the particular cluster, wherein the second sequence of classified plain text segments is different than the first sequence of classified plain text segments;

    generating, based on the tree, a data extraction template to extract, from one or more subsequent plain text communications, content associated with transient segments;

    extracting content associated with at least one transient segment from a given subsequent plain text communication addressed to a user by applying the data extraction template to the given subsequent plain text communication; and

    rating the extracting performed on the given subsequent plain text communication based on how closely a sequence of classified plain text segments generated for the given subsequent plain text communication traverses a branch of the tree.
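
The steps recited in the claim can be illustrated with a small sketch. This is not the patented implementation, only one possible reading under simplifying assumptions: each non-empty line is treated as a plain text segment, clusters are formed on a single shared attribute (the sender), the count criterion is "appears in at least 60% of the cluster's messages", the tree is a nested dictionary keyed by classified segments, and the rating is the fraction of a message's segment sequence that follows a branch from the root. All function names, the `TRANSIENT` token, the threshold, and the sample messages are hypothetical.

```python
from collections import Counter, defaultdict

TRANSIENT = "<transient>"  # placeholder token standing in for any transient segment


def segments(body):
    """Split a body into segments (assumption: one segment per non-empty line)."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def group_by(messages, attribute):
    """Group messages into clusters that share the value of one attribute."""
    clusters = defaultdict(list)
    for message in messages:
        clusters[message[attribute]].append(message)
    return clusters


def fixed_segments(cluster, min_fraction=0.6):
    """Segments whose per-message occurrence count meets the (assumed) criterion."""
    counts = Counter()
    for message in cluster:
        counts.update(set(segments(message["body"])))
    return {seg for seg, n in counts.items() if n >= min_fraction * len(cluster)}


def classify(body, fixed):
    """Sequence of classified segments: fixed text as-is, anything else TRANSIENT."""
    return [seg if seg in fixed else TRANSIENT for seg in segments(body)]


def build_tree(cluster, fixed):
    """Insert every message's classified sequence into a nested-dict tree."""
    root = {}
    for message in cluster:
        node = root
        for token in classify(message["body"], fixed):
            node = node.setdefault(token, {})
    return root


def extract(body, fixed):
    """Return the content of the segments classified as transient."""
    return [seg for seg in segments(body) if seg not in fixed]


def rate(body, fixed, tree):
    """Fraction of the message's sequence that follows a branch from the root."""
    sequence = classify(body, fixed)
    node, depth = tree, 0
    for token in sequence:
        if token not in node:
            break
        node = node[token]
        depth += 1
    return depth / len(sequence) if sequence else 0.0


corpus = [
    {"sender": "shop@example.com",
     "body": "Your order has shipped\nOrder 1001\nThanks for shopping"},
    {"sender": "shop@example.com",
     "body": "Your order has shipped\nOrder 1002\nThanks for shopping"},
    {"sender": "shop@example.com",
     "body": "Your order has shipped\nOrder 1003\nGift note: Happy birthday\nThanks for shopping"},
    {"sender": "news@example.com", "body": "Weekly digest\nTop story of the week"},
]

cluster = group_by(corpus, "sender")["shop@example.com"]
fixed = fixed_segments(cluster)
tree = build_tree(cluster, fixed)

good = "Your order has shipped\nOrder 2002\nThanks for shopping"
odd = "Your order has shipped\nThanks for shopping"
print(extract(good, fixed), rate(good, fixed, tree))  # ['Order 2002'] 1.0
print(extract(odd, fixed), rate(odd, fixed, tree))    # [] 0.5
```

In this sketch the third sample message produces a second branch that shares its first two nodes with the first branch, which mirrors the claim's "at least part of a second sequence". The message in `odd` leaves the tree after its first segment, so its extraction is rated lower than that of `good`.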
