System for providing database functions for multiple internet sources

  • US 6,826,553 B1
  • Filed: 11/16/2000
  • Issued: 11/30/2004
  • Est. Priority Date: 12/18/1998
  • Status: Expired due to Term
First Claim

1. A system for automatically extracting data from at least one electronic document in any of a plurality of formats, said at least one electronic document including a target page being accessible over a computer network, said target page comprising a plurality of elements each having a contents or structural definition, wherein said structural definition interrelates said plurality of elements, said system comprising:

  • a navigation module to record a sequence of actions associated with an initial visit by a user to said target page operable to navigate to said target page of said electronic document;

  • an extraction recording module to receive user inputs from said user defining information of interest to said user to be extracted from said plurality of elements of said target page and generating a target pattern for automatically extracting said information of interest to said user from said target page;

  • a navigation playback module to automatically access said target page according to said recorded sequence for at least one subsequent visit to said target page; and

  • an extraction playback module to automatically identify and scrape select ones of said plurality of elements dependent upon said target pattern for each said at least one subsequent visit to said target page;

  • said extraction recording module remapping said target page by re-identifying any modified structural definitions of said plurality of elements thereby to enable access to an altered target page;

  • said extraction playback module identifies and scrapes said select ones of said plurality of elements dependent upon said target pattern and said re-identified structural definitions to thereby automatically identify and scrape said select ones of said plurality of elements from said altered target page dependent upon said target pattern;

  • wherein information of interest to said user is automatically extracted from said target page for each said at least one subsequent visit to said target page.
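
For illustration only, the record-and-playback arrangement recited in claim 1 might be sketched in Python as below. This is a minimal sketch under stated assumptions, not the implementation disclosed in the patent. Every name in it (Recording, TargetPattern, replay, record_pattern, extract, SITE_V1, SITE_V2) is hypothetical. It assumes pages are well-formed XML held in an in-memory dictionary rather than fetched over a network, that the "structural definition" is a list of child indices from the document root, and that the "contents definition" is the text of the element immediately preceding the value of interest.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TargetPattern:
    path: List[int]        # assumed structural definition: child indices from root
    label: Optional[str]   # assumed contents definition: text of preceding sibling

@dataclass
class Recording:
    actions: List[str] = field(default_factory=list)  # recorded navigation steps
    pattern: Optional[TargetPattern] = None

def replay(actions, fetch):
    """Navigation playback: repeat each recorded step, parse the last page."""
    page = None
    for url in actions:
        page = fetch(url)
    return ET.fromstring(page)

def follow(root, path):
    """Walk an index path from the root; None if the structure no longer fits."""
    node = root
    for i in path:
        if i >= len(node):
            return None
        node = node[i]
    return node

def label_of(parent, i):
    """Text of the sibling just before child i, or None for a first child."""
    return parent[i - 1].text if i > 0 else None

def find_path(node, matches, prefix=()):
    """Depth-first search for the first (parent, index) pair that matches."""
    for i, child in enumerate(node):
        if matches(node, i):
            return list(prefix + (i,))
        deeper = find_path(child, matches, prefix + (i,))
        if deeper is not None:
            return deeper
    return None

def record_pattern(root, wanted_text):
    """Extraction recording: the user points at a value; store where it sits."""
    path = find_path(root, lambda p, i: p[i].text == wanted_text)
    if path is None:
        raise ValueError("value not found on the target page")
    parent = follow(root, path[:-1])
    return TargetPattern(path, label_of(parent, path[-1]))

def extract(root, pattern):
    """Extraction playback, remapping the path if the page layout changed."""
    parent, last = follow(root, pattern.path[:-1]), pattern.path[-1]
    fits = (parent is not None and last < len(parent)
            and label_of(parent, last) == pattern.label)
    if not fits:
        # Re-identify the structural definition from the contents definition.
        new_path = find_path(root, lambda p, i: label_of(p, i) == pattern.label)
        if new_path is None:
            return None
        pattern.path = new_path
        parent, last = follow(root, new_path[:-1]), new_path[-1]
    return parent[last].text

# Hypothetical site: the second version adds a banner and an extra table row.
SITE_V1 = {
    "/home": "<html><body><a href='/quote'>Quote</a></body></html>",
    "/quote": "<html><body><table><tr><td>Price</td><td>41.50</td></tr>"
              "</table></body></html>",
}
SITE_V2 = dict(SITE_V1)
SITE_V2["/quote"] = ("<html><body><div>Ad</div><table>"
                     "<tr><td>Symbol</td><td>XYZ</td></tr>"
                     "<tr><td>Price</td><td>42.75</td></tr>"
                     "</table></body></html>")

# Initial visit: record the navigation steps and the target pattern.
rec = Recording(actions=["/home", "/quote"])
rec.pattern = record_pattern(replay(rec.actions, SITE_V1.__getitem__), "41.50")

# Subsequent visit to the altered page: replay, remap, extract.
later_value = extract(replay(rec.actions, SITE_V2.__getitem__), rec.pattern)
print(later_value)
```

In this sketch the stored index path stops fitting once the banner and the extra row are added, so extract falls back on the recorded label to locate the value again and updates the stored path. A label of None (a value that is the first child of its parent) would match too broadly for remapping, which is one of several simplifications a real system would need to address.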
