Data extraction using templates

  • US 9,323,731 B1
  • Filed: 10/15/2013
  • Issued: 04/26/2016
  • Est. Priority Date: 11/01/2007
  • Status: Active Grant
First Claim

1. A computer-implemented data analysis method, comprising:

    assigning one or more labels to one or more nodes in object models of respective web pages to provide a plurality of annotated object models;

    comparing the plurality of annotated object models;

    based on comparing the plurality of annotated object models, forming a plurality of composite object models, the forming including:

    for each composite object model of the plurality of object models, determining that two or more of the plurality of annotated object models have at least a specified level of similarity, and in response, storing data from the respective web pages in a single database to form the composite object model, the composite object model based on the two or more annotated object models and reflecting a structure of the web pages as a group;

    comparing an object model of a web page to each of the plurality of composite object models;

    based on comparing the object model of the web page to each of the plurality of composite object models, identifying a particular composite object model of the plurality of composite object models based on an edit distance between each of the plurality of composite object models and the object model of the web page;

    mapping the object model of the web page to the particular composite object model based on a minimum edit distance between the object model of the web page and the particular composite object model;

    extracting, from the web page, data associated with nodes in the object model of the web page that correspond to labeled nodes in the particular composite object model based on the mapping; and

    providing the extracted data i) for storage in a structured database in a manner associated with the labels and ii) for display by an application executable by a computing device associated with the web page.
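
The selection, mapping, and extraction steps recited above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes, for illustration only, that each object model is flattened to a list of nodes in document order and that the edit distance is a plain Levenshtein distance over tag names; the claim does not specify either choice. The names `align`, `extract`, `templates`, and `page` are hypothetical, and the formation of composite object models from similar annotated pages is not shown.

```python
from typing import Optional

# Assumption: an object model is flattened to its nodes in document order.
PageNode = tuple[str, str]                # (tag, text content)
TemplateNode = tuple[str, Optional[str]]  # (tag, label or None if unlabeled)


def align(template_tags: list[str], page_tags: list[str]) -> tuple[int, list[tuple[int, int]]]:
    """Levenshtein distance between two tag sequences, plus the index pairs
    (template_index, page_index) of nodes whose tags matched exactly."""
    m, n = len(template_tags), len(page_tags)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if template_tags[i - 1] == page_tags[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # node missing from page
                             dist[i][j - 1] + 1,         # extra node on page
                             dist[i - 1][j - 1] + cost)  # match or substitution
    # Walk back through the table to recover which nodes were matched.
    pairs = []
    i, j = m, n
    while i > 0 and j > 0:
        cost = 0 if template_tags[i - 1] == page_tags[j - 1] else 1
        if dist[i][j] == dist[i - 1][j - 1] + cost:
            if cost == 0:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dist[m][n], pairs


def extract(page: list[PageNode],
            templates: dict[str, list[TemplateNode]]) -> tuple[str, dict[str, str]]:
    """Pick the template with the minimum edit distance to the page, then
    return its name and the page text found at each labeled template node."""
    page_tags = [tag for tag, _ in page]
    best_name, best_pairs, best_dist = None, [], None
    for name, template in templates.items():
        d, pairs = align([tag for tag, _ in template], page_tags)
        if best_dist is None or d < best_dist:
            best_name, best_pairs, best_dist = name, pairs, d
    if best_name is None:
        raise ValueError("no templates supplied")
    record = {}
    for t_idx, p_idx in best_pairs:
        label = templates[best_name][t_idx][1]
        if label is not None:
            record[label] = page[p_idx][1]
    return best_name, record


# Hypothetical data: two composite templates and one unseen page.
templates = {
    "product": [("h1", "title"), ("span", "price"), ("p", None)],
    "article": [("h1", "headline"), ("p", None), ("p", None), ("p", None)],
}
page = [("h1", "Blue Widget"), ("div", "Sale!"), ("span", "$9.99"), ("p", "A fine widget.")]

name, record = extract(page, templates)
# name == "product"; record == {"title": "Blue Widget", "price": "$9.99"}
```

In this example the page differs from the "product" template by one inserted node (distance 1) and from the "article" template by two substitutions (distance 2), so "product" is selected and its labels drive the extraction. A tree edit distance over the full object model would be a closer fit to the claim language than this flattened sequence.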
