EXTRACTING DATA CONTENT ITEMS USING TEMPLATE MATCHING

  • US 20090063500A1
  • Filed: 08/31/2007
  • Published: 03/05/2009
  • Est. Priority Date: 08/31/2007
  • Status: Active Grant
First Claim

1. One or more computer storage media having computer-executable instructions embodied thereon for performing a method for extracting data content items from web pages, the method comprising:

  • receiving a first web page having one or more data content items associated therewith;

  • receiving an indication to label at least one of the data content items associated with the first web page;

  • generating a Document Object Model (DOM) tree associated with the first web page, the DOM tree having a node associated with each data content item;

  • labeling the node of the DOM tree associated with the at least one indicated data content item to generate a template DOM tree;

  • comparing the template DOM tree with a DOM tree associated with a second web page to determine alignment therebetween; and

  • if it is determined that a node of the DOM tree associated with the second web page aligns with the labeled node associated with the template DOM tree, extracting a data content item from the second web page that is associated with the aligned node of the DOM tree.
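
The steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say how alignment is determined, so the sketch assumes the simplest possible rule, in which two nodes align when they have the same tag name and the same child position under aligned parents. The names `Node`, `DomBuilder`, `build_dom`, `label_node`, and `extract` are hypothetical, and the parser assumes well-formed markup in which every element has an explicit closing tag.

```python
from html.parser import HTMLParser


class Node:
    """One node of a simplified DOM tree (hypothetical structure)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""
        self.label = None  # set only on nodes the user asked to label


class DomBuilder(HTMLParser):
    """Builds the simplified tree; assumes every start tag has an end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_dom(html):
    """Generate a DOM tree for a web page given as an HTML string."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_by_text(node, text):
    """Depth-first search for the first node whose text equals `text`."""
    if node.text == text:
        return node
    for child in node.children:
        found = find_by_text(child, text)
        if found is not None:
            return found
    return None


def label_node(root, item_text, label):
    """Label the node holding the indicated data content item."""
    node = find_by_text(root, item_text)
    if node is None:
        raise ValueError(f"no node contains {item_text!r}")
    node.label = label


def extract(template, target):
    """Walk both trees in lockstep and collect text from aligned labeled nodes.

    Assumed alignment rule: same tag name at the same child position.
    """
    results = {}
    if template.tag != target.tag:
        return results  # subtrees do not align; extract nothing here
    if template.label is not None:
        results[template.label] = target.text
    for t_child, p_child in zip(template.children, target.children):
        results.update(extract(t_child, p_child))
    return results


# First web page: the user indicates which items to label.
page1 = "<html><body><h1>Blue Widget</h1><span>$10</span></body></html>"
template = build_dom(page1)
label_node(template, "Blue Widget", "title")
label_node(template, "$10", "price")

# Second web page with the same layout but different content.
page2 = "<html><body><h1>Red Gadget</h1><span>$25</span></body></html>"
extracted = extract(template, build_dom(page2))
print(extracted)
```

In this sketch the second page yields a title and a price because its tree has the same shape as the template. A page with a different structure yields nothing for the subtrees that fail to align; a practical system would likely need a more tolerant alignment measure than strict positional matching.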
