FRAMEWORK FOR DOCUMENT KNOWLEDGE EXTRACTION
Abstract
A knowledge extraction framework may iteratively enrich an ontology that is used to classify structured knowledge obtained from web pages based on structured knowledge previously acquired from other web pages. The framework may enable a user to define the ontology for extracting structured knowledge from a plurality of web pages. The framework applies the ontology using a supervised extraction algorithm to extract seed information from a set of web pages. The framework further applies an unsupervised extraction algorithm to extract the structured knowledge from an additional set of web pages. The framework subsequently maps the structured knowledge to the ontology based on the seed information to enrich the ontology.
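The flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the page model (a dict of entity name to attribute dict), the function names, and both toy "extractors" are hypothetical stand-ins, not the algorithms the patent discloses.

```python
from typing import Dict, List, Set

# Hypothetical data model (an assumption, not taken from the patent):
# a page is {entity name: {attribute name: attribute value}},
# an ontology is {class name: set of known attribute names}.
Page = Dict[str, Dict[str, str]]
Ontology = Dict[str, Set[str]]


def supervised_extract(ontology: Ontology, pages: List[Page], cls: str) -> Page:
    """Stand-in for a supervised extractor: keep only attributes the ontology already defines."""
    seeds: Page = {}
    for page in pages:
        for entity, attrs in page.items():
            known = {k: v for k, v in attrs.items() if k in ontology[cls]}
            if known:
                seeds.setdefault(entity, {}).update(known)
    return seeds


def unsupervised_extract(pages: List[Page]) -> Page:
    """Stand-in for an unsupervised extractor: keep every attribute that is found."""
    found: Page = {}
    for page in pages:
        for entity, attrs in page.items():
            found.setdefault(entity, {}).update(attrs)
    return found


def enrich(ontology: Ontology, cls: str, seeds: Page, extracted: Page) -> Ontology:
    """Map extracted knowledge to the ontology, using seed entities as the bridge."""
    enriched = {name: set(attrs) for name, attrs in ontology.items()}  # copy, do not mutate input
    for entity, attrs in extracted.items():
        if entity in seeds:  # only entities already known from the seed pages
            enriched[cls] |= set(attrs)
    return enriched


ontology = {"Movie": {"title", "director"}}
seed_pages = [{"Inception": {"title": "Inception", "director": "Christopher Nolan"}}]
more_pages = [
    {"Inception": {"title": "Inception", "runtime": "148 min"}},
    {"Some Blog": {"author": "unknown"}},
]

seeds = supervised_extract(ontology, seed_pages, "Movie")
extracted = unsupervised_extract(more_pages)
enriched = enrich(ontology, "Movie", seeds, extracted)
```

In this toy run the enriched "Movie" class gains "runtime" because "Inception" appears in both page sets, while "author" is ignored because its entity never appeared among the seeds. Feeding the enriched ontology back into the supervised step would give the iterative behavior the abstract mentions.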
20 Claims
1. A computer-implemented method, comprising:
defining an ontology for extracting structured knowledge from a plurality of web pages;
applying the ontology using a supervised extraction algorithm to obtain seed information from a set of web pages;
applying an unsupervised extraction algorithm to extract the structured knowledge from an additional set of web pages; and
mapping the structured knowledge to the ontology based at least on the seed information to produce an enriched ontology.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
12. A computer-readable medium storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising:
defining an ontology for extracting structured knowledge from a plurality of web pages;
applying the ontology using a supervised extraction algorithm to obtain seed entities from a set of web pages;
applying an unsupervised extraction algorithm to obtain extracted entities from an additional set of web pages;
determining a set of overlapping seed entities included in the seed entities that overlaps with the extracted entities;
retrieving at least one attribute of each overlapping seed entity and each of the extracted entities, each attribute including an attribute name and an attribute value; and
mapping attributes of the extracted entities to the ontology to produce an enriched ontology.
Dependent claims: 13, 14, 15, 16, 17.
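The overlap and attribute-mapping acts of claim 12 might be illustrated as below. The value-matching heuristic (aligning an extracted attribute name with an ontology attribute name when both carry the same value on an overlapping entity) is an assumption made for this sketch; the claim itself does not say how the mapping is computed. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Set, Tuple

Entities = Dict[str, Dict[str, str]]  # entity name -> {attribute name: attribute value}


def overlapping_entities(seed: Entities, extracted: Entities) -> Set[str]:
    """Seed entities that also occur among the extracted entities."""
    return set(seed) & set(extracted)


def align_attribute_names(seed: Entities, extracted: Entities) -> Dict[str, str]:
    """Assumed heuristic: vote for (extracted name -> seed name) whenever values match."""
    votes = defaultdict(Counter)
    for entity in overlapping_entities(seed, extracted):
        seed_by_value = {v.strip().lower(): k for k, v in seed[entity].items()}
        for name, value in extracted[entity].items():
            target = seed_by_value.get(value.strip().lower())
            if target is not None:
                votes[name][target] += 1
    # Keep the most frequently matched ontology attribute for each extracted name.
    return {name: counts.most_common(1)[0][0] for name, counts in votes.items()}


def map_to_ontology(
    ontology_attrs: Set[str], seed: Entities, extracted: Entities
) -> Tuple[Set[str], Dict[str, str]]:
    """Rename aligned attributes; add unaligned ones to the ontology as new attributes."""
    alignment = align_attribute_names(seed, extracted)
    enriched = set(ontology_attrs)
    for attrs in extracted.values():
        for name in attrs:
            enriched.add(alignment.get(name, name))
    return enriched, alignment


seed_entities = {"Inception": {"director": "Christopher Nolan", "year": "2010"}}
extracted_entities = {
    "Inception": {"Directed by": "Christopher Nolan", "Runtime": "148 min"},
    "Heat": {"Directed by": "Michael Mann", "Runtime": "170 min"},
}

enriched_attrs, alignment = map_to_ontology(
    {"director", "year"}, seed_entities, extracted_entities
)
```

Here "Directed by" is aligned with the existing "director" attribute through the shared value on the overlapping entity, and "Runtime" has no counterpart, so it is added as a new attribute. A real system would likely need fuzzier value comparison than exact lowercase matching.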
18. A computing device, comprising:
one or more processors; and
a memory that includes a plurality of computer-executable components of a knowledge extraction framework, the plurality of computer-executable components comprising:
a supervised learning module that applies a predefined ontology using a supervised extraction algorithm to extract seed information from a set of web pages;
an unsupervised learning module that applies an unsupervised extraction algorithm to extract structured knowledge from an additional set of web pages;
a mapping module that maps the structured knowledge to the ontology based at least on the seed information to enrich the ontology; and
an annotation module that annotates the additional set of web pages based at least on the structured knowledge.
Dependent claims: 19, 20.
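Claim 18 adds an annotation module on top of the extraction and mapping components. One simple way such a module could work is sketched below, assuming annotation means marking the character spans where extracted attribute values occur in the page text. The `Annotation` type and `annotate_page` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    start: int   # inclusive character offset in the page text
    end: int     # exclusive character offset
    label: str   # ontology attribute the span is tagged with


def annotate_page(text: str, knowledge: Dict[str, str]) -> List[Annotation]:
    """Tag the first occurrence of each extracted value with its ontology attribute."""
    annotations = []
    for label, value in knowledge.items():
        if not value:
            continue  # an empty value would match at offset 0, so skip it
        start = text.find(value)
        if start != -1:
            annotations.append(Annotation(start, start + len(value), label))
    return sorted(annotations, key=lambda a: a.start)


page_text = "Inception was directed by Christopher Nolan."
knowledge = {"director": "Christopher Nolan", "title": "Inception", "runtime": "148 min"}
annotations = annotate_page(page_text, knowledge)
```

Values that do not appear in the text, such as the runtime in this example, simply produce no annotation.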
Specification