
METHOD AND SYSTEM FOR AUTOMATICALLY EXTRACTING DATA FROM WEB SITES

  • US 20080114800A1
  • Filed: 01/15/2008
  • Published: 05/15/2008
  • Est. Priority Date: 07/15/2005
  • Status: Active Grant
First Claim

1. A method for automatically extracting and structuring the data from a semi-structured web site, the method comprising:

  • developing a set of experts;

  • analyzing the links and pages on the website by means of the experts;

  • identifying predetermined types of generic structures by means of the experts;

  • clustering pages and text segments within the pages based on the identified structures; and

  • identifying, based on the clustering, the semi-structured data that can be extracted.
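The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say what an "expert" is internally, so the sketch assumes each expert is a small function that inspects a page and reports one generic structure label. The names `Page`, `table_expert`, `list_expert`, `pagination_expert`, `cluster_pages`, and `extractable_clusters`, the regular expressions, and the `min_size` threshold are all hypothetical. For brevity it clusters whole pages only, not text segments within pages.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Page:
    url: str
    html: str


# Assumption: an "expert" is a function returning a structure label or None.
Expert = Callable[[Page], Optional[str]]


def table_expert(page: Page) -> Optional[str]:
    # Reports a table structure if the page contains a <table> tag.
    return "table" if re.search(r"<table\b", page.html, re.I) else None


def list_expert(page: Page) -> Optional[str]:
    # Reports a list structure if the page has at least three <li> items.
    return "list" if len(re.findall(r"<li\b", page.html, re.I)) >= 3 else None


def pagination_expert(page: Page) -> Optional[str]:
    # Reports pagination if any link carries a "page=N" query parameter.
    return "pagination" if re.search(r"[?&]page=\d+", page.html) else None


def analyze(page: Page, experts: List[Expert]) -> FrozenSet[str]:
    # Run every expert on the page and collect the labels they report.
    return frozenset(
        label for expert in experts if (label := expert(page)) is not None
    )


def cluster_pages(
    pages: List[Page], experts: List[Expert]
) -> Dict[FrozenSet[str], List[Page]]:
    # Group pages that share the same set of identified structures.
    clusters: Dict[FrozenSet[str], List[Page]] = defaultdict(list)
    for page in pages:
        clusters[analyze(page, experts)].append(page)
    return dict(clusters)


def extractable_clusters(
    clusters: Dict[FrozenSet[str], List[Page]], min_size: int = 2
) -> Dict[FrozenSet[str], List[Page]]:
    # Keep clusters with at least one structure and enough pages to
    # suggest a repeated template (the threshold is an assumption).
    return {
        sig: members
        for sig, members in clusters.items()
        if sig and len(members) >= min_size
    }


EXPERTS = [table_expert, list_expert, pagination_expert]
pages = [
    Page("/products/1", "<table><tr><td>A</td></tr></table>"),
    Page("/products/2", "<table><tr><td>B</td></tr></table>"),
    Page("/about", "<p>About us</p>"),
]
clusters = cluster_pages(pages, EXPERTS)
candidates = extractable_clusters(clusters)
```

In this toy input the two product pages share the same structure signature and form one candidate cluster, while the "about" page matches no expert and is left out.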
