System and method of making unstructured data available to structured data analysis tools
Abstract
A system and method of making unstructured data available to structured data analysis tools. The system includes middleware software that can be used in combination with structured data tools to perform analysis on both structured and unstructured data. Data can be read from a wide variety of unstructured sources. The data may then be transformed with commercial data transformation products that may, for example, extract individual pieces of data and determine relationships between the extracted data. The transformed data and relationships may then be passed through an extraction/transform/load (ETL) layer and placed in a structured schema. The structured schema may then be made available to commercial or proprietary structured data analysis tools.
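The flow described in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: SQLite stands in for both schemas, a regular expression stands in for a commercial transformation product, and every name here (`DOCUMENTS`, `toy_transform`, `capture_document`, `analysis_entity`) is hypothetical.

```python
import re
import sqlite3

# Hypothetical stand-in for a source of unstructured data.
DOCUMENTS = {
    "doc-1": "Alice met Bob in Paris.",
    "doc-2": "Bob flew to Berlin.",
}


def toy_transform(text):
    """Toy 'transformation tool': treat capitalized words as entities."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)


def run_pipeline(documents):
    conn = sqlite3.connect(":memory:")
    # Capture schema: raw extracted text, one row per document.
    conn.execute(
        "CREATE TABLE capture_document (doc_key TEXT PRIMARY KEY, body TEXT)"
    )
    # Analysis schema: structured rows derived from the text.
    conn.execute("CREATE TABLE analysis_entity (doc_key TEXT, entity TEXT)")

    # Extract the documents and write them to the capture schema.
    conn.executemany(
        "INSERT INTO capture_document VALUES (?, ?)", documents.items()
    )

    # Transform each captured document and load the results
    # into the analysis schema.
    rows = conn.execute("SELECT doc_key, body FROM capture_document").fetchall()
    for doc_key, body in rows:
        for entity in toy_transform(body):
            conn.execute(
                "INSERT INTO analysis_entity VALUES (?, ?)", (doc_key, entity)
            )
    conn.commit()
    return conn


conn = run_pipeline(DOCUMENTS)
# Any SQL-speaking tool can now query the analysis schema.
counts = conn.execute(
    "SELECT entity, COUNT(*) FROM analysis_entity GROUP BY entity ORDER BY entity"
).fetchall()
print(counts)
```

The final query plays the role of a structured data analysis tool: it sees only ordinary tables and needs no knowledge of the original documents.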
27 Claims
1. A method of making unstructured data available to structured data tools comprising:
accessing a source of unstructured data;
extracting the unstructured data;
writing the extracted unstructured data to a capture schema;
sending the unstructured data to a transformation tool;
transforming the unstructured data with the transformation tool;
writing the transformed parsed unstructured data in a structured analysis schema;
providing data connectors that allow structured data tools to access the structured analysis schema.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system for making unstructured data available to structured data tools comprising:
code to access a source of unstructured data;
code to extract the unstructured data;
code to write the extracted unstructured data to a capture schema;
code to send the unstructured data to a transformation tool;
code to transform the unstructured data with the transformation tool;
code to write the transformed parsed unstructured data in a structured analysis schema;
code to provide data connectors that allow structured data tools to access the structured analysis schema.
Dependent claims: 9, 10, 11, 12, 13, 14
15. An extraction service for extracting unstructured data from a plurality of unstructured data sources and a plurality of formats comprising:
a plurality of APIs to interface with the plurality of unstructured data sources; and
a single internal API that interfaces with a plurality of software components that allow structured data tools to operate on unstructured data.
Dependent claims: 16
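One way to picture the arrangement in claim 15 is an adapter pattern: one adapter per source API, all feeding a single internal interface. The sketch below is illustrative only; the class names (`SourceAdapter`, `EmailAdapter`, `NotesAdapter`, `ExtractionService`) and the in-memory "sources" are assumptions, not anything named in the patent.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ExtractedDocument:
    """Common internal representation, whatever the source format was."""
    source: str
    doc_id: str
    text: str


class SourceAdapter(ABC):
    """One adapter per external source API (hypothetical interface)."""

    @abstractmethod
    def fetch(self):
        """Yield ExtractedDocument objects from the underlying source."""


class EmailAdapter(SourceAdapter):
    def __init__(self, messages):
        self.messages = messages

    def fetch(self):
        for i, msg in enumerate(self.messages):
            yield ExtractedDocument(
                "email", f"email-{i}", msg["subject"] + "\n" + msg["body"]
            )


class NotesAdapter(SourceAdapter):
    def __init__(self, notes):
        self.notes = notes

    def fetch(self):
        for name, text in self.notes.items():
            yield ExtractedDocument("notes", name, text)


class ExtractionService:
    """Single internal API that downstream components call."""

    def __init__(self, adapters):
        self.adapters = list(adapters)

    def extract_all(self):
        docs = []
        for adapter in self.adapters:
            docs.extend(adapter.fetch())
        return docs


service = ExtractionService([
    EmailAdapter([{"subject": "Hi", "body": "Lunch?"}]),
    NotesAdapter({"note-1": "Call Bob"}),
])
docs = service.extract_all()
print([d.doc_id for d in docs])
```

Downstream components depend only on `extract_all` and `ExtractedDocument`, so adding a new source means writing one more adapter and nothing else.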
17. A transformation connector comprising:
code capable of understanding the format of data provided by a transformation tool; and
code to convert the data provided by a transformation tool to a data format that maps to a data capture schema, the data capture schema comprising:
a table to store data extracted from a plurality of source documents having unstructured data; and
a table to store information about the extracted data, wherein the plurality of documents are assigned a unique key that identifies the document throughout a software system allowing (i) cross-analysis, (ii) linking of results for further analysis, (iii) drill-down from analytical reports back to the source documents or (iv) drill-down from analytical reports back to transformation information stored in the schema.
Dependent claims: 18, 19
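The two-table capture schema and its unique document key can be sketched as follows. This is a simplified illustration under stated assumptions: SQLite stands in for the database, a UUID stands in for the unique key, and the table, column, and function names (`source_document`, `extracted_info`, `capture`, `drill_down`) are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table for text extracted from the source documents.
    CREATE TABLE source_document (
        doc_key TEXT PRIMARY KEY,
        source  TEXT,
        body    TEXT
    );
    -- Table for information about the extracted data.
    CREATE TABLE extracted_info (
        info_id INTEGER PRIMARY KEY,
        doc_key TEXT NOT NULL REFERENCES source_document(doc_key),
        kind    TEXT,
        value   TEXT
    );
""")


def capture(conn, source, body, facts):
    """Store one document and its transformation output under a single key."""
    doc_key = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO source_document VALUES (?, ?, ?)", (doc_key, source, body)
    )
    conn.executemany(
        "INSERT INTO extracted_info (doc_key, kind, value) VALUES (?, ?, ?)",
        [(doc_key, kind, value) for kind, value in facts],
    )
    return doc_key


def drill_down(conn, kind, value):
    """Go from an analytical result back to the source documents behind it."""
    rows = conn.execute(
        "SELECT d.body FROM extracted_info i "
        "JOIN source_document d ON d.doc_key = i.doc_key "
        "WHERE i.kind = ? AND i.value = ? ORDER BY d.body",
        (kind, value),
    )
    return [row[0] for row in rows]


key1 = capture(conn, "email", "Alice met Bob.",
               [("person", "Alice"), ("person", "Bob")])
key2 = capture(conn, "notes", "Bob called.", [("person", "Bob")])
print(drill_down(conn, "person", "Bob"))
```

Because every row of `extracted_info` carries the document's key, a report that counts mentions of "Bob" can always be traced back to the exact documents that produced the count.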
20. A core server comprising code to allow parallel processing of unstructured data on a continuous real-time basis, wherein
the code is adapted to configure unstructured source extractors and treat them as black boxes in a data workflow;
the code is adapted to extract unstructured text from a plurality of data sources and source systems, the extracted unstructured text available for input for further processing;
the code is adapted to configure end-to-end data flow from the plurality of data sources through one or more transformation components into a capture schema and into an analysis schema for analysis by structured data analysis tools;
the code is adapted to retain a single key for each data source, the key being associated with data generated by the transformation components; and
the code is adapted to store all extracted unstructured text, metadata and transformation data in a single schema.
Dependent claims: 21
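A rough sketch of the workflow in claim 20, assuming a thread pool for the parallelism: extractors are opaque callables (black boxes), each source keeps a single key, and the extracted text and transformation output are stored together under that key. The sketch runs one batch rather than a continuous real-time service, and the function names (`process_source`, `run_workflow`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_source(key, extractor, transformers):
    """Run one source end to end: extract, then apply every transformer."""
    text = extractor()  # black box: only its output is visible here
    record = {"key": key, "text": text, "transformations": {}}
    for name, transform in transformers.items():
        record["transformations"][name] = transform(text)
    return record


def run_workflow(extractors, transformers, max_workers=4):
    """Process all sources in parallel and collect results by source key."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            key: pool.submit(process_source, key, extractor, transformers)
            for key, extractor in extractors.items()
        }
        # One store holding text and transformation data under the same key.
        return {key: future.result() for key, future in futures.items()}


# Hypothetical extractors and transformers for illustration.
extractors = {
    "src-a": lambda: "hello world",
    "src-b": lambda: "one two three",
}
transformers = {
    "word_count": lambda text: len(text.split()),
    "upper": str.upper,
}

store = run_workflow(extractors, transformers)
print(store["src-a"]["transformations"])
```

A long-running server would replace the single batch with a loop or queue that feeds new sources to the pool as they arrive; that part is omitted here.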
22. A structured data connector that allows structured data analysis tools to analyze data in an analysis schema comprising:
ODBC code;
JDBC code; and
code to pre-populate metadata of the structured data analysis tools with tables, columns, attributes, data and metrics from an analysis schema without performing tool customization or application specific setup.
Dependent claims: 23, 24, 25, 26, 27
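The metadata pre-population step can be illustrated by reading a schema's own catalog and handing the result to a tool. The sketch below uses SQLite's `sqlite_master` table and `PRAGMA table_info` in place of an ODBC or JDBC connection; `describe_schema` and the `analysis_*` tables are hypothetical names, and a real connector would write this catalog into the analysis tool's metadata store.

```python
import sqlite3


def describe_schema(conn):
    """Return {table: [(column, declared_type), ...]} for every table."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analysis_entity (doc_key TEXT, entity TEXT)")
conn.execute("CREATE TABLE analysis_metric (doc_key TEXT, word_count INTEGER)")

catalog = describe_schema(conn)
print(catalog)
```

Because the catalog is derived from the schema itself, nothing has to be configured by hand when tables or columns are added.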
Specification