System and method for mining data
Abstract
A system and method for extracting data, hereinafter referred to as MitoMine, produces a strongly typed, ontology-defined collection that references (and cross-references) all extracted records. The input to the mining process can be any data source, such as a text file delimited into a set of possibly dissimilar records. MitoMine contains parser routines and post-processing functions known as ‘munchers’. The parser routines can be accessed either through a batch mining process or as part of a running server process connected to a live source. Munchers can be registered on a per-data-source basis to process the records produced, optionally writing them to an external database and/or a set of servers. The present invention also embeds an interpreted, ontology-based language within a compiler/interpreter for the source format, such that statements of the embedded language are executed when the source compiler ‘recognizes’ a given construct within the source and extracts the corresponding source content. As a result, the statements in the embedded program execute in a sequence dictated wholly by the source content. The system and method therefore make it possible to bulk-extract free-form data from sources such as CD-ROMs and the web and to load the resulting structured data into an ontology-based system.
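The per-data-source registration of munchers described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not MitoMine's implementation: the names `MuncherRegistry` and `split_records`, and the blank-line-separated `key: value` record layout, are hypothetical, and a muncher is modelled simply as a callable that receives one extracted record.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, str]
Muncher = Callable[[Record], None]


class MuncherRegistry:
    """Hypothetical registry of post-processing 'munchers' keyed by data source."""

    def __init__(self) -> None:
        self._munchers: Dict[str, List[Muncher]] = defaultdict(list)

    def register(self, source: str, muncher: Muncher) -> None:
        # Munchers are registered per data source.
        self._munchers[source].append(muncher)

    def dispatch(self, source: str, record: Record) -> int:
        # Hand one extracted record to every muncher registered for its source
        # and report how many munchers ran (0 if none are registered).
        handlers = self._munchers.get(source, [])
        for muncher in handlers:
            muncher(record)
        return len(handlers)


def split_records(text: str) -> List[Record]:
    # Assumed input layout: records separated by blank lines, one "key: value"
    # field per line; different records may carry different fields.
    records: List[Record] = []
    for block in text.strip().split("\n\n"):
        record: Record = {}
        for line in block.splitlines():
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        records.append(record)
    return records


if __name__ == "__main__":
    stored: List[Record] = []  # stands in for an external database
    registry = MuncherRegistry()
    registry.register("people.txt", stored.append)
    for rec in split_records("name: Ada\nborn: 1815\n\nname: Bob\ncity: Paris"):
        registry.dispatch("people.txt", rec)
    print(stored)  # [{'name': 'Ada', 'born': '1815'}, {'name': 'Bob', 'city': 'Paris'}]
```

Keeping munchers in a per-source list means a new source, or a new destination such as a database writer, can be added by registration alone, without touching the parser routines.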
Claims (26 in total)
1. A system for the extraction of data from a variety of sources into a single unifying ontology, comprising:
a) an ontology-based environment, such environment including an ontology description language (ODL) and a run-time accessible types system;
b) logically connected thereto, an extensible parsing environment, wherein such parsing environment supports customized reverse-polish plug-in operators;
c) logically connected thereto, a configurable outer parser capable of accepting a BNF (or equivalent) specification describing the source data format; and
d) an embedded inner parser capable of executing statements and performing actions directly on the objects and types described by the system ontology.
Dependent claims: 2 through 14.
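Element (b) of claim 1 calls for a parsing environment that supports customized reverse-polish plug-in operators. The following Python sketch shows one generic way such extensibility can work; the class `RPNEvaluator`, its `register` method, and the sample `upper` operator are hypothetical illustrations, not the operator set or interface of the claimed system. Each operator declares how many operands it pops from the stack, so new operators can be added without changing the evaluator itself.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple


class RPNEvaluator:
    """Hypothetical reverse-polish evaluator with a plug-in operator table."""

    def __init__(self) -> None:
        # Operator name -> (arity, implementation). Plug-ins extend this table.
        self._ops: Dict[str, Tuple[int, Callable[..., Any]]] = {
            "+": (2, operator.add),
            "-": (2, operator.sub),
            "*": (2, operator.mul),
        }

    def register(self, name: str, arity: int, fn: Callable[..., Any]) -> None:
        # Add (or override) an operator that pops `arity` operands.
        self._ops[name] = (arity, fn)

    def evaluate(self, tokens: List[str]) -> Any:
        stack: List[Any] = []
        for tok in tokens:
            if tok in self._ops:
                arity, fn = self._ops[tok]
                if len(stack) < arity:
                    raise ValueError(f"stack underflow at {tok!r}")
                # Pop the top `arity` operands, preserving their order.
                args = stack[len(stack) - arity:]
                del stack[len(stack) - arity:]
                stack.append(fn(*args))
            else:
                stack.append(self._literal(tok))
        if len(stack) != 1:
            raise ValueError("malformed expression")
        return stack[0]

    @staticmethod
    def _literal(tok: str) -> Any:
        # Integers become ints; any other token is pushed as a string.
        try:
            return int(tok)
        except ValueError:
            return tok


if __name__ == "__main__":
    ev = RPNEvaluator()
    print(ev.evaluate("2 3 + 4 *".split()))  # 20
    ev.register("upper", 1, str.upper)  # a custom plug-in operator
    print(ev.evaluate(["ada", "upper"]))  # ADA
```

Reverse-polish form suits plug-ins because the evaluator needs no precedence rules: an operator only has to state its arity.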
15. A method for extracting data from a variety of sources into a single unifying ontology, comprising the steps of:
a) receiving source data;
b) parsing the source format with an outer parser, wherein such outer parser includes an embedded parser for an interpreted ontology description language (ODL);
c) parsing the source data with the outer parser and the embedded parser using the parsed source format;
d) passing statements in an embedded language to the embedded parser; and
e) responsive to one or more actions by the outer parser, executing one or more statements in the embedded language.
Dependent claims: 16 through 26.
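The central idea of claim 15, that embedded statements run only when the outer parser recognizes a construct in the source, can be illustrated with the following Python sketch. It makes several simplifying assumptions: the outer parser is line-based and driven by regular expressions rather than a BNF specification, the "ontology" is a hypothetical table (`TYPES`) mapping type names to permitted fields, and the two-statement embedded language (`new <Type>`, `set <field> $<n>`) and the names `InnerInterpreter` and `run_outer_parser` are invented for illustration; none of this is the patent's ODL.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical ontology: type name -> fields that objects of that type may hold.
TYPES: Dict[str, Set[str]] = {"Person": {"name", "born"}}


class InnerInterpreter:
    """Executes a tiny, invented embedded language against typed objects."""

    def __init__(self, types: Dict[str, Set[str]]) -> None:
        self.types = types
        self.objects: List[Dict[str, str]] = []

    def execute(self, statement: str, groups: Tuple[str, ...]) -> None:
        op, *args = statement.split()
        if op == "new":
            # "new <Type>": create an object of a type known to the ontology.
            if args[0] not in self.types:
                raise TypeError(f"unknown type {args[0]!r}")
            self.objects.append({"@type": args[0]})
        elif op == "set":
            # "set <field> $<n>": copy capture group n into the current object,
            # rejecting fields the object's type does not declare.
            field, ref = args
            if not self.objects:
                raise RuntimeError("'set' executed before any 'new'")
            current = self.objects[-1]
            if field not in self.types[current["@type"]]:
                raise TypeError(f"{current['@type']} has no field {field!r}")
            current[field] = groups[int(ref.lstrip("$")) - 1]
        else:
            raise SyntaxError(f"unknown statement {statement!r}")


Production = Tuple[re.Pattern, List[str]]


def run_outer_parser(source: str, grammar: List[Production],
                     inner: InnerInterpreter) -> None:
    # The outer parser recognizes constructs line by line. Statements attached
    # to a production run only when that production matches, so execution
    # order is dictated by the source content. Unrecognized lines are skipped.
    for line in source.splitlines():
        for pattern, statements in grammar:
            match = pattern.fullmatch(line)
            if match:
                for stmt in statements:
                    inner.execute(stmt, match.groups())
                break


if __name__ == "__main__":
    grammar: List[Production] = [
        (re.compile(r"NAME: (.+)"), ["new Person", "set name $1"]),
        (re.compile(r"BORN: (\d{4})"), ["set born $1"]),
    ]
    inner = InnerInterpreter(TYPES)
    run_outer_parser("NAME: Ada Lovelace\nBORN: 1815\nNAME: Alan Turing",
                     grammar, inner)
    print(inner.objects)
    # [{'@type': 'Person', 'name': 'Ada Lovelace', 'born': '1815'},
    #  {'@type': 'Person', 'name': 'Alan Turing'}]
```

Because statements are attached to grammar productions rather than written as a sequential script, the order in which objects are created and populated follows the order of constructs in the input, which is the behaviour the abstract describes.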
Specification