
Method for learning and combining global and local regularities for information extraction and classification

  • US 6,892,189 B2
  • Filed: 01/26/2001
  • Issued: 05/10/2005
  • Est. Priority Date: 01/26/2001
  • Status: Expired due to Term
First Claim

1. In a data processing system, a method for creating a database from information found on a plurality of web pages, said information comprising global regularities and local regularities, said global regularities being patterns that are expected to be found in all said web pages, and said local regularities being patterns which are not expected to be found in all said web pages, said method comprising:

    a) providing a global classifier using said global regularities;

    b) identifying a candidate subset of the web pages expected to have said local regularities;

    thereafter c) tentatively identifying and tagging, in said candidate subset of the web pages, elements having said global regularities, by using said global classifier to obtain first tentative labels;

    d) training a local classifier using said first tentative labels, said local classifier using said local regularities for its classification;

    e) tentatively identifying elements having specific combinations of said global regularities and said local regularities using said global classifier and said local classifier to obtain second tentative labels for said elements of said candidate subset; and

    thereafter f) outputting said second tentative labels as permanent labels associated with said elements of said candidate subset of web pages.
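
The flow of steps a) through f) can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything concrete in it is an assumption made for illustration: the `Element` type, the keyword list standing in for the global regularities, the use of an enclosing tag (`fmt`) as the local regularity, the weights, and the threshold. Step b) is assumed to have happened already, so the caller passes in the elements of one candidate site.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    text: str  # visible text of a page element
    fmt: str   # hypothetical local cue, e.g. the enclosing HTML tag


# Hypothetical global regularity: words expected in job titles on any site.
TITLE_WORDS = {"engineer", "manager", "analyst"}


def global_label(el: Element) -> bool:
    """Step a: a toy global classifier based on site-independent keywords."""
    return any(word in TITLE_WORDS for word in el.text.lower().split())


def train_local(elements, tentative):
    """Step d: estimate, per formatting cue, the share of tentative positives."""
    positives, totals = Counter(), Counter()
    for el, label in zip(elements, tentative):
        totals[el.fmt] += 1
        positives[el.fmt] += int(label)
    return {fmt: positives[fmt] / totals[fmt] for fmt in totals}


def label_site(elements, w_global=0.3, w_local=0.7, threshold=0.5):
    """Steps c to f for one candidate site (step b is done by the caller)."""
    first = [global_label(el) for el in elements]      # step c: first tentative labels
    local = train_local(elements, first)               # step d: local classifier
    second = [                                         # step e: combine both classifiers
        w_global * float(g) + w_local * local[el.fmt] >= threshold
        for el, g in zip(elements, first)
    ]
    return second                                      # step f: output as permanent labels


# Made-up elements from a single hypothetical site.
DEMO = [
    Element("Software Engineer", "h3"),
    Element("Product Manager", "h3"),
    Element("Data Analyst", "h3"),
    Element("Barista", "h3"),
    Element("Contact our hiring manager", "p"),
    Element("We offer benefits", "p"),
    Element("Apply online today", "p"),
    Element("Founded in 1999", "p"),
]

labels = label_site(DEMO)
```

On this made-up data the global classifier alone misses "Barista" and wrongly accepts "Contact our hiring manager". Once the local classifier has learned that this site puts titles in `h3` elements, the combined score accepts the first and rejects the second. The weighted sum is only one possible way to combine the two classifiers; the claim does not prescribe a particular combination rule.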
