System and method for mining data
First Claim
1. A data mining method in a data processing system comprising a processor and a memory for storing data mined from a variety of sources into a structured target data model, the method comprising the steps of:
(a) receiving a first source data for mining by the data processing system;
(b) parsing said first source data by the data processing system;
said parsing step further comprising;
(b1) parsing a source format of said first source data with an outer parser running in the processor;
(b2) based on said parsed source format, processing selected data extracted from said first source data with an inner level parser embedded in said outer parser; and
(b3) executing by said inner level parser one or more statements in an order dictated by a content of said first source data as dictated by an evolution of a parsing state of said outer level parser, wherein said one or more statements are expressed using an ontology description language having one or more data types and one or more data fields which are directly manipulated and assigned by said one or more statements executed within said inner level parser through said ontology description language without explicit declarations therein;
(c) creating, as a result of said parsing steps, a first collection of records conformed to the structured target data model as described by said ontology description language, wherein each of said records in said first collection of records are referenced and cross-referenced to each other;
(d) storing said first collection of records conformed to the structured target data model in the memory;
(e) retrieving said first collection of records for further processing by the data processing system.
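The nested parsing in steps (b1) to (b3) can be illustrated with a minimal sketch. It is not the patented implementation. The `key: value` source format, the statement syntax (`Type.field = $`, `@Type`), and the names `STATEMENTS`, `execute`, and `mine` are all assumptions made for illustration. An outer parser recognizes constructs in the source, and each recognized construct triggers the embedded statements, which assign typed fields without declaring them first. For brevity the sketch links records in one direction only (person to organization).

```python
import re

# Hypothetical embedded statements: which ones run depends on what the outer
# parser recognizes. "$" stands for the extracted text; "@Type" is a reference
# to the record of that type currently being built.
STATEMENTS = {
    "name": ["Person.name = $"],
    "employer": ["Organization.name = $", "Person.employer = @Organization"],
}

ASSIGN = re.compile(r"(\w+)\.(\w+)\s*=\s*(\$|@\w+)")


def execute(statement, text, current, records):
    """Inner level parser: run one assignment; no declarations are needed."""
    type_name, field, rhs = ASSIGN.fullmatch(statement).groups()
    if type_name not in current:
        # A record is created the first time a statement touches its type.
        current[type_name] = {"type": type_name}
        records.append(current[type_name])
    # "$" assigns the extracted text; "@Type" stores a cross-reference.
    value = text if rhs == "$" else current[rhs[1:]]
    current[type_name][field] = value


def mine(source):
    """Outer parser: recognize 'key: value' lines; a blank line ends a block."""
    records, current = [], {}
    for line in source.splitlines():
        if not line.strip():
            current = {}  # the parsing state advances to the next block
            continue
        key, _, text = line.partition(":")
        for statement in STATEMENTS.get(key.strip(), []):
            execute(statement, text.strip(), current, records)
    return records


SAMPLE = "name: Alice\nemployer: Acme\n\nname: Bob\nemployer: Initech\n"
store = mine(SAMPLE)                                   # steps (a) to (d)
people = [r for r in store if r["type"] == "Person"]   # step (e)
```

The order in which statements run is never written down in the program itself; it follows from the order of the lines in `SAMPLE`, which is the property step (b3) describes.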
Abstract
A system and method for extracting data, hereinafter referred to as MitoMine, that produces a strongly-typed, ontology-defined collection referencing (and cross-referencing) all extracted records. The input to the mining process can be any data source, such as a text file delimited into a set of possibly dissimilar records. MitoMine contains parser routines and post-processing functions, known as ‘munchers’. The parser routines can be accessed either via a batch mining process or as part of a running server process connected to a live source. Munchers can be registered on a per-data-source basis in order to process the records produced, possibly writing them to an external database and/or a set of servers. The present invention also embeds an interpreted ontology-based language within a compiler/interpreter (for the source format) such that the statements of the embedded language are executed as a result of the source compiler ‘recognizing’ a given construct within the source and extracting the corresponding source content. In this way, the execution of the statements in the embedded program occurs in a sequence that is dictated wholly by the source content. This system and method therefore make it possible to bulk extract free-form data from sources such as CD-ROMs and the web, and to have the resultant structured data loaded into an ontology-based system.
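The per-source registration of munchers described in the abstract could look roughly like the sketch below. The registry, the decorator `register_muncher`, the function `run_munchers`, the source name `cdrom_catalog`, and the list standing in for an external database are all hypothetical; the abstract does not specify an API.

```python
from collections import defaultdict

# Hypothetical registry: data-source name -> ordered post-processing functions.
_munchers = defaultdict(list)

# Stand-in for an external database or a set of servers.
external_db = []


def register_muncher(source_name):
    """Decorator that registers a muncher for one data source."""
    def decorator(func):
        _munchers[source_name].append(func)
        return func
    return decorator


def run_munchers(source_name, records):
    """Pass mined records through each muncher registered for the source."""
    for muncher in _munchers[source_name]:
        records = muncher(records)
    return records


@register_muncher("cdrom_catalog")
def drop_empty(records):
    # Discard records whose 'name' field is missing or empty.
    return [r for r in records if r.get("name")]


@register_muncher("cdrom_catalog")
def write_out(records):
    # Persist the surviving records, then pass them along unchanged.
    external_db.extend(records)
    return records


mined = [{"name": "Alice"}, {"name": ""}, {"name": "Bob"}]
result = run_munchers("cdrom_catalog", mined)
```

Keying the registry by source name means a source with no registered munchers passes its records through untouched, so the same call works for a batch job and for a server attached to a live source.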
167 Citations
21 Claims
1. A data mining method in a data processing system comprising a processor and a memory for storing data mined from a variety of sources into a structured target data model, the method comprising the steps of:
(a) receiving a first source data for mining by the data processing system;
(b) parsing said first source data by the data processing system;
said parsing step further comprising;
(b1) parsing a source format of said first source data with an outer parser running in the processor;
(b2) based on said parsed source format, processing selected data extracted from said first source data with an inner level parser embedded in said outer parser; and
(b3) executing by said inner level parser one or more statements in an order dictated by a content of said first source data as dictated by an evolution of a parsing state of said outer level parser, wherein said one or more statements are expressed using an ontology description language having one or more data types and one or more data fields which are directly manipulated and assigned by said one or more statements executed within said inner level parser through said ontology description language without explicit declarations therein;
(c) creating, as a result of said parsing steps, a first collection of records conformed to the structured target data model as described by said ontology description language, wherein each of said records in said first collection of records are referenced and cross-referenced to each other;
(d) storing said first collection of records conformed to the structured target data model in the memory;
(e) retrieving said first collection of records for further processing by the data processing system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A data processing system for mining data from a variety of sources for storing in a structured target data model, the system comprising:
a processor;
an outer parser running in said processor, wherein said outer parser parses a source format of a first source data received by said processor;
an inner level parser embedded in said outer parser for parsing said first data source based on said parsed source format;
an ontology description language in which the structured target data model is specified and which is also utilized by said inner level parser, wherein said inner level parser executes one or more statements in an order dictated by a content of said first source data, wherein said one or more statements are expressed in said ontology description language and access one or more data types and one or more data fields that are directly manipulated and assigned within said inner level parser using said ontology description language without explicit declarations therein; and
a memory for storing a first collection of records created by said outer parser and said inner level parser, wherein said first collection of records conform to the structured target data model, and each of said records in said first collection of records are referenced and cross-referenced to each other, wherein said first collection of records are retrieved from the memory for further processing by the data processing system.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
Specification