
SPECIFIC ONLINE RESOURCE IDENTIFICATION AND EXTRACTION

  • US 20140188882A1
  • Filed: 12/31/2012
  • Published: 07/03/2014
  • Est. Priority Date: 12/31/2012
  • Status: Active Grant
First Claim

1. A method of automatically identifying and extracting distributed online resources, the method comprising:

    locating in a website a candidate entry list page;

    verifying the candidate entry list page as an entry list page using repeated pattern discovery;

    segmenting the entry list page into a plurality of entry items;

    extracting from the plurality of entry items a plurality of candidate target pages;

    verifying at least some of the candidate target pages as target pages including analyzing a visual structure and presentation of the candidate target pages;

    extracting metadata from the target pages; and

    organizing the target pages and/or the metadata in one or more databases.
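
The verification, segmentation, and candidate extraction steps of claim 1 can be illustrated with a minimal sketch. The claim does not say how repeated pattern discovery is performed, so the approach here is an assumption: a page counts as an entry list page when some container has at least `min_repeats` children with an identical tag structure, and those children are taken as the entry items. All names (`Node`, `TreeBuilder`, `find_entry_items`, `candidate_links`, `min_repeats`) are hypothetical and are not taken from the patent. Visual-structure verification, metadata extraction, and database storage are not covered.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """A bare-bones DOM node: tag, attributes, parent, children."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []

    def signature(self):
        # Structural signature: the tag plus the signatures of all children.
        # Text content is ignored so items with different text still match.
        return (self.tag, tuple(c.signature() for c in self.children))

    def walk(self):
        yield self
        for child in self.children:
            yield from child.walk()


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def find_entry_items(html, min_repeats=3):
    """Return the largest group of sibling nodes sharing one structure.

    An empty result means no repeated pattern was found, i.e. the page
    is not verified as an entry list page under this (assumed) heuristic.
    """
    builder = TreeBuilder()
    builder.feed(html)
    best = []
    for node in builder.root.walk():
        counts = Counter(c.signature() for c in node.children)
        if not counts:
            continue
        sig, n = counts.most_common(1)[0]
        if n >= min_repeats and n > len(best):
            best = [c for c in node.children if c.signature() == sig]
    return best


def candidate_links(items):
    """Collect href values from the entry items as candidate target pages."""
    return [n.attrs["href"]
            for item in items
            for n in item.walk()
            if n.tag == "a" and "href" in n.attrs]
```

Calling `find_entry_items` on a page and passing the result to `candidate_links` yields the URLs that the later claim steps would verify as target pages. Exact structural matching is deliberately strict; a practical system would likely tolerate small differences between entry items.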
