Techniques for inducing high quality structural templates for electronic documents

  • US 8,046,681 B2
  • Filed: 11/27/2007
  • Issued: 10/25/2011
  • Est. Priority Date: 07/05/2006
  • Status: Active Grant
First Claim

1. A method comprising:

    comparing, one document at a time, a structure of documents in a training set with a structure of an initial template;

    selecting at least one of the documents based on the comparing;

    generalizing the initial template to create a generalized template that has a structure that matches each of the selected documents;

    wherein generalizing the initial template to create the generalized template includes adding one or more operators to the initial template from a set of operators to create the generalized template, wherein the one or more operators includes a first operator that indicates that only one of a plurality of subtrees below the operator is allowed to occur at a position in the selected documents that corresponds to the position of the first operator in the generalized template.
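
The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how structures are compared or how documents are selected, so the tree representation (`Node`), the `Or` operator class, the `mismatch_count` distance, and the `max_mismatches` threshold are all hypothetical choices made for illustration. The sketch compares each training document with the initial template one at a time, selects the documents that differ in at most a fixed number of positions, and generalizes the template by adding an operator under which only one of several alternative subtrees may occur at a given position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A structural node (e.g. an HTML element) in a document or template."""
    tag: str
    children: tuple = ()


@dataclass(frozen=True)
class Or:
    """Hypothetical operator: exactly one of the alternative subtrees
    is allowed to occur at this position in a matching document."""
    alternatives: tuple


def shape_differs(template, doc):
    """True if a template node and a document node cannot be aligned child by child."""
    return template.tag != doc.tag or len(template.children) != len(doc.children)


def matches(template, doc):
    """Return True if the document subtree fits the template subtree."""
    if isinstance(template, Or):
        # The document has a single subtree here; it must fit one alternative.
        return any(matches(alt, doc) for alt in template.alternatives)
    if shape_differs(template, doc):
        return False
    return all(matches(t, d) for t, d in zip(template.children, doc.children))


def mismatch_count(template, doc):
    """Assumed comparison measure: number of positions where doc does not fit."""
    if matches(template, doc):
        return 0
    if isinstance(template, Or) or shape_differs(template, doc):
        return 1
    return sum(mismatch_count(t, d) for t, d in zip(template.children, doc.children))


def generalize(template, doc):
    """Widen the template just enough that it also matches doc."""
    if matches(template, doc):
        return template
    if isinstance(template, Or):
        # Extend an existing operator with one more allowed subtree.
        return Or(template.alternatives + (doc,))
    if shape_differs(template, doc):
        # Add a new operator at the position where the structures diverge.
        return Or((template, doc))
    return Node(template.tag,
                tuple(generalize(t, d) for t, d in zip(template.children, doc.children)))


def induce(initial, training_set, max_mismatches=1):
    """Compare documents one at a time, select the close ones, then generalize."""
    selected = [d for d in training_set if mismatch_count(initial, d) <= max_mismatches]
    template = initial
    for doc in selected:
        template = generalize(template, doc)
    return template, selected


# Illustrative data (invented for this sketch).
initial = Node("html", (Node("body", (Node("h1"), Node("p"))),))
doc_a = Node("html", (Node("body", (Node("h1"), Node("table"))),))
doc_b = Node("html", (Node("body", (Node("h1"), Node("ul"))),))
doc_c = Node("html", (Node("body", (Node("h2"), Node("form"))),))

template, selected = induce(initial, [doc_a, doc_b, doc_c])
```

In this example `doc_a` and `doc_b` each differ from the initial template at one position and are selected, while `doc_c` differs at two positions and is left out. The resulting template carries an `Or` operator in place of the original `p` node, with `p`, `table`, and `ul` as its alternative subtrees, so it matches every selected document.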
