Interactive System for Extracting Data from a Website
Abstract
Described is a technology for efficiently labeling a webpage. A wrapper tool labels records of a webpage at the record level. If an existing wrapper is appropriate for labeling a record, the wrapper tool automatically labels that record. For unlabeled records, the tool provides a user interface for labeling them and updates the set of existing wrappers with a new wrapper generated from the labeling operation; the new wrapper is then applied to any remaining unlabeled records for which it is appropriate. As a result, a user typically needs to label only a relatively small number of records, and the wrappers generated for those records are automatically used to label the other unlabeled records of the webpage.
20 Claims
- 1. In a computing environment, a method comprising, generating a wrapper for a record of a webpage based on a label assigned to that record, and adding the wrapper to a set of one or more existing wrappers.
- 9. In a computing environment, a system comprising, a record-level wrapper tool, including logic that uses existing wrappers to automatically assign a label to a record of a webpage when an existing wrapper corresponds to that record, and a user interface for interaction with a display of a webpage to label a record of the webpage, the logic configured to update the existing wrappers with a new wrapper that is generated based upon the user interaction to label the record.
- 17. One or more computer-readable media having computer-executable instructions, which when executed perform steps, comprising:
  (a) for each record of a webpage, determining whether any existing wrapper applies to that record, and if so, using an applicable wrapper to label that record;
  (b) detecting interaction with an element of the webpage to label a record corresponding to that element;
  (c) generating a new wrapper based upon the interaction to label the record;
  (d) applying the new wrapper to any unlabeled record of the webpage; and
  (e) determining whether the webpage contains any unlabeled record, and if so, returning to step (b).
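The loop recited in steps (a) through (e) of claim 17 can be illustrated with a minimal Python sketch. This is not the patented implementation; it only shows the control flow under simplifying assumptions. The names `Record`, `Wrapper`, `generate_wrapper`, `apply_wrappers`, `label_page`, and `ask_user` are hypothetical, as is the idea that a wrapper "applies" to a record when the two share an identical structural signature (here, a tuple of made-up tag paths). The claims do not specify how a wrapper is represented or matched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass
class Record:
    """A page record reduced to a structural signature and its field texts (assumed model)."""
    signature: Tuple[str, ...]
    fields: List[str]
    labels: Optional[Dict[str, str]] = None  # None means "not yet labeled"


@dataclass(frozen=True)
class Wrapper:
    """Hypothetical wrapper: maps records with a given signature to named fields."""
    signature: Tuple[str, ...]
    field_names: Tuple[str, ...]

    def applies_to(self, record: Record) -> bool:
        # Assumption: exact signature equality stands in for "wrapper applies".
        return record.signature == self.signature

    def label(self, record: Record) -> None:
        record.labels = dict(zip(self.field_names, record.fields))


def generate_wrapper(record: Record, field_names: Sequence[str]) -> Wrapper:
    """Build a wrapper from one labeled record (step c)."""
    return Wrapper(record.signature, tuple(field_names))


def apply_wrappers(records: List[Record], wrappers: Sequence[Wrapper]) -> None:
    """Label every still-unlabeled record that some wrapper applies to."""
    for record in records:
        if record.labels is None:
            for wrapper in wrappers:
                if wrapper.applies_to(record):
                    wrapper.label(record)
                    break


def label_page(records: List[Record], wrappers: List[Wrapper],
               ask_user: Callable[[Record], Sequence[str]]) -> int:
    """Run steps (a)-(e); return how many records the user had to label."""
    apply_wrappers(records, wrappers)                      # step (a)
    prompts = 0
    while True:
        unlabeled = next((r for r in records if r.labels is None), None)  # step (e)
        if unlabeled is None:
            return prompts
        field_names = ask_user(unlabeled)                  # step (b): stands in for the UI
        new_wrapper = generate_wrapper(unlabeled, field_names)  # step (c)
        wrappers.append(new_wrapper)                       # add to the existing set
        apply_wrappers(records, [new_wrapper])             # step (d)
        prompts += 1


# Usage: three records, two of which share a layout.
product = ("div.item", "span.title", "span.price")
records = [
    Record(product, ["Widget", "$5"]),
    Record(product, ["Gadget", "$7"]),
    Record(("li.promo", "b"), ["Sale ends Friday"]),
]
wrappers: List[Wrapper] = []
prompts = label_page(
    records, wrappers,
    ask_user=lambda r: ["title", "price"] if len(r.fields) == 2 else ["notice"],
)
```

In this example the simulated user is asked about two records rather than three, because the wrapper generated from the first product record also labels the second. That mirrors the abstract's point that labeling a few records can be enough to label the rest.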
Specification