
Joint optimization of wrapper generation and template detection

  • US 7,660,804 B2
  • Filed: 08/16/2006
  • Issued: 02/09/2010
  • Est. Priority Date: 08/16/2006
  • Status: Expired due to Fees
First Claim

1. A method in a computing device with a processor and a memory for generating wrappers for hierarchically organized documents, each document having a document tree with nodes, the method comprising:

  • generating by the processor, for each of a plurality of clusters of documents, a wrapper by repeating the following until all the documents have been selected:

    selecting a document that has not yet been selected for creation of a wrapper tree having nodes;

    creating the wrapper tree for the document tree of the selected document;

    for each document whose distance from its document tree to the wrapper tree is within a threshold distance, selecting the document; and

    adjusting the wrapper tree based on the document tree of the selected document; and

    establishing the wrapper for the documents selected for creation and adjustment of the wrapper tree based on the adjusted wrapper tree

    wherein a wrapper tree is created and adjusted for each cluster of documents whose document trees are within a threshold distance of the wrapper tree at the time of selection of the document, and

    wherein distance is represented by the following equation:
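
The steps recited in the claim can be read as a greedy loop: pick an unselected document, build a wrapper tree from it, then absorb every remaining document that lies within the threshold of the wrapper as it currently stands. The following is a minimal sketch of that reading, not the patented implementation. The claimed distance equation is not reproduced in this excerpt, so the sketch takes `distance`, `create`, and `adjust` as caller-supplied functions; those names, the `generate_wrappers` function, and the toy set-based stand-ins (documents as sets of tag paths, a Jaccard-style distance, union as the adjustment) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Doc = TypeVar("Doc")          # a document tree (representation is up to the caller)
Wrapper = TypeVar("Wrapper")  # a wrapper tree


def generate_wrappers(
    docs: Sequence[Doc],
    create: Callable[[Doc], Wrapper],
    distance: Callable[[Doc, Wrapper], float],
    adjust: Callable[[Wrapper, Doc], Wrapper],
    threshold: float,
) -> List[Tuple[Wrapper, List[int]]]:
    """Return one (wrapper, member indices) pair per cluster of documents."""
    selected = [False] * len(docs)
    results: List[Tuple[Wrapper, List[int]]] = []
    for seed in range(len(docs)):
        if selected[seed]:
            continue  # already covered by an earlier wrapper
        # Select an unselected document and create a wrapper tree from it.
        selected[seed] = True
        wrapper = create(docs[seed])
        members = [seed]
        # Absorb each remaining document that is within the threshold of the
        # wrapper as it stands at the time that document is considered.
        for i, doc in enumerate(docs):
            if not selected[i] and distance(doc, wrapper) <= threshold:
                selected[i] = True
                wrapper = adjust(wrapper, doc)
                members.append(i)
        results.append((wrapper, members))
    return results


# Hypothetical stand-ins: a "tree" is flattened to a set of tag paths.
def toy_create(doc):
    return set(doc)


def toy_distance(doc, wrapper):
    # Jaccard-style distance; NOT the equation from the claim.
    union = doc | wrapper
    return 1.0 - len(doc & wrapper) / len(union) if union else 0.0


def toy_adjust(wrapper, doc):
    return wrapper | doc


docs = [
    frozenset({"html/body/h1", "html/body/p"}),
    frozenset({"html/body/h1", "html/body/p", "html/body/div"}),
    frozenset({"rss/channel/item"}),
]
clusters = generate_wrappers(docs, toy_create, toy_distance, toy_adjust, 0.5)
print([members for _, members in clusters])  # [[0, 1], [2]]
```

In this toy run the first two documents share most of their paths and end up under one wrapper, while the third starts a cluster of its own. Because each wrapper is adjusted as documents are absorbed, the outcome can depend on the order in which documents are visited.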

  • 2 Assignments