Generating document templates that are robust to structural variations

  • US 7,668,942 B2
  • Filed: 05/02/2008
  • Issued: 02/23/2010
  • Est. Priority Date: 05/02/2008
  • Status: Expired due to Fees
First Claim

1. A network device configured to manage document templates, comprising:

  • a transceiver to send and receive data over a network; and

  • a processor that is operative to enable actions for:

    receiving a tree-based regular expression that represents the template;

    below a given level in the tree-based regular expression, performing:

      forming clusters of sub-trees of the tree-based regular expression via a cost measure;

      generating a nested pattern regular expression based on the clusters;

      merging sub-trees based on the nested pattern regular expression;

      replacing sub-trees in the tree-based regular expression at the given level with the merged sub-trees; and

    repeating, for a next higher level of the tree-based regular expression that is closer to a root of the corresponding tree, the actions of forming clusters, generating a nested pattern regular expression, merging sub-trees, and replacing sub-trees in the tree-based regular expression.
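The claim does not define the cost measure, the clustering method, or the syntax of the nested pattern regular expression. The Python sketch that follows is one hypothetical reading of the bottom-up loop, not the patented implementation. Every name and choice in it is an assumption: the `Node` structure with a `quant` marker ("", "?" or "*"), Jaccard distance between child-label sets as the cost measure, greedy clustering with a threshold of 0.5, a list of (cluster id, run length) pairs as a simplified stand-in for the nested pattern, and label-wise merging that marks children missing from some sub-trees as "?" and repeated ones as "*".

```python
# Hypothetical sketch: names, cost measure and pattern form are assumptions.
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class Node:
    """Node of an assumed tree-based regex; quant is "", "?" or "*"."""
    label: str
    children: list[Node] = field(default_factory=list)
    quant: str = ""

    def to_regex(self) -> str:
        inner = " ".join(c.to_regex() for c in self.children)
        body = f"{self.label}({inner})" if self.children else self.label
        return body + self.quant


def cost(a: Node, b: Node) -> float:
    """Assumed cost measure: Jaccard distance between child-label sets."""
    if a.label != b.label:
        return float("inf")  # never cluster sub-trees with different roots
    sa, sb = {c.label for c in a.children}, {c.label for c in b.children}
    union = sa | sb
    return 0.0 if not union else 1 - len(sa & sb) / len(union)


def cluster(subtrees: list[Node], threshold: float) -> list[list[Node]]:
    """Greedy: join the first cluster whose first member is within threshold."""
    clusters: list[list[Node]] = []
    for t in subtrees:
        for members in clusters:
            if cost(members[0], t) <= threshold:
                members.append(t)
                break
        else:
            clusters.append([t])
    return clusters


def merge(nodes: list[Node]) -> Node:
    """Merge same-label sub-trees into one generalized sub-tree."""
    groups: dict[str, list[Node]] = {}  # child label -> occurrences, first-seen order
    for n in nodes:
        for c in n.children:
            groups.setdefault(c.label, []).append(c)
    children = []
    for label, group in groups.items():
        present = sum(any(c.label == label for c in n.children) for n in nodes)
        child = merge(group)
        if len(group) > present or any(c.quant == "*" for c in group):
            child.quant = "*"  # repeated inside at least one member
        elif present < len(nodes) or any(c.quant == "?" for c in group):
            child.quant = "?"  # missing from at least one member
        children.append(child)
    return Node(nodes[0].label, children)


def reduce_children(parent: Node, threshold: float) -> list[tuple[int, int]]:
    """Cluster the child sub-trees, build a run pattern, merge runs, replace them."""
    clusters = cluster(parent.children, threshold)
    cid = {id(t): i for i, members in enumerate(clusters) for t in members}
    # Simplified stand-in for the nested pattern: (cluster id, run length) pairs.
    pattern = [(k, len(list(g))) for k, g in groupby(cid[id(t)] for t in parent.children)]
    merged, pos = [], 0
    for _, count in pattern:
        run = parent.children[pos:pos + count]
        pos += count
        if count == 1:
            merged.append(run[0])
        else:
            m = merge(run)
            m.quant = "*"  # a run of similar siblings becomes one repeated sub-tree
            merged.append(m)
    parent.children = merged
    return pattern


def generalize(root: Node, threshold: float = 0.5) -> Node:
    """Handle the deepest level first, then repeat one level closer to the root."""
    levels = [[root]]  # levels[d] holds the nodes at depth d
    while levels[-1]:
        levels.append([c for n in levels[-1] for c in n.children])
    for level in reversed(levels):
        for parent in level:
            reduce_children(parent, threshold)
    return root


rows = [Node("tr", [Node(c) for c in cells]) for cells in (("td", "a"), ("td",), ("td", "td"))]
page = Node("body", [Node("table", rows), Node("div")])
print(generalize(page).to_regex())  # body(table(tr(td* a?)*) div)
```

On the sample page, each row is within a Jaccard distance of 0.5 of the first row, so the three rows fall into one cluster and collapse into a single repeated `tr` whose `a` child is marked optional. A tree edit distance would be a natural substitute for the assumed `cost`; the claim itself does not say which measure is used.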

  • 9 Assignments