Schema and ETL tools for structured and unstructured data
Abstract
A system and method of making unstructured data available to structured data analysis tools. The system includes middleware software that can be used in combination with structured data tools to perform analysis on both structured and unstructured data. Data can be read from a wide variety of unstructured sources. The data may then be transformed with commercial data transformation products that may, for example, extract individual pieces of data and determine relationships between the extracted data. The transformed data and relationships may then be passed through an extraction/transform/load (ETL) layer and placed in a structured schema. The structured schema may then be made available to commercial or proprietary structured data analysis tools.
18 Claims
1. A data capture schema comprising:
a set of tables to store data extracted from a plurality of source documents having unstructured data; and
a table to store information about the extracted data, wherein the plurality of documents are assigned a unique key that identifies the document throughout a software system allowing (i) cross-analysis, (ii) linking of results for further analysis, (iii) drill-down from analytical reports back to the source documents or (iv) drill-down from analytical reports back to transformation information stored in the schema.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
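As an illustration of the capture schema recited in claim 1, the following is a minimal sketch using SQLite from Python. It is not taken from the specification: the table names (`document`, `extracted_entity`, `extraction_info`), the column names, and the use of a UUID as the unique document key are all assumptions made for the example. It shows one way a single key per document could support drill-down from an extracted value back to both the source document and the stored transformation information.

```python
import sqlite3
import uuid

# Hypothetical capture schema: every name below is an assumption for illustration.
CAPTURE_DDL = """
CREATE TABLE document (
    doc_key    TEXT PRIMARY KEY,   -- unique key that follows the document everywhere
    source_uri TEXT NOT NULL
);
CREATE TABLE extracted_entity (    -- data extracted from the unstructured source
    entity_id   INTEGER PRIMARY KEY,
    doc_key     TEXT NOT NULL REFERENCES document(doc_key),
    entity_type TEXT NOT NULL,
    entity_text TEXT NOT NULL
);
CREATE TABLE extraction_info (     -- information about the extraction itself
    info_id   INTEGER PRIMARY KEY,
    doc_key   TEXT NOT NULL REFERENCES document(doc_key),
    tool_name TEXT NOT NULL,
    run_at    TEXT NOT NULL        -- ISO 8601 timestamp
);
"""


def create_capture_schema(conn):
    """Create the three illustrative capture tables."""
    conn.executescript(CAPTURE_DDL)


def register_document(conn, source_uri):
    """Assign a new unique key to a source document and return it."""
    doc_key = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO document (doc_key, source_uri) VALUES (?, ?)",
        (doc_key, source_uri),
    )
    return doc_key


def record_extraction(conn, doc_key, tool_name, run_at, entities):
    """Store extraction metadata plus the (type, text) pairs that were extracted."""
    conn.execute(
        "INSERT INTO extraction_info (doc_key, tool_name, run_at) VALUES (?, ?, ?)",
        (doc_key, tool_name, run_at),
    )
    conn.executemany(
        "INSERT INTO extracted_entity (doc_key, entity_type, entity_text) "
        "VALUES (?, ?, ?)",
        [(doc_key, etype, text) for etype, text in entities],
    )


def drill_down(conn, entity_text):
    """Return (source_uri, tool_name) for every document containing entity_text."""
    return conn.execute(
        """
        SELECT d.source_uri, i.tool_name
        FROM extracted_entity AS e
        JOIN document AS d ON d.doc_key = e.doc_key
        JOIN extraction_info AS i ON i.doc_key = e.doc_key
        WHERE e.entity_text = ?
        ORDER BY d.source_uri
        """,
        (entity_text,),
    ).fetchall()
```

Because every table carries `doc_key`, a report row can be joined back to either the source document or the extraction metadata without any other shared column, which is the cross-linking role the claim gives to the unique key.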
13. A data analysis schema comprising:
a set of tables that provides structure to unstructured data, the table including master entities, the master entities comprising (i) a group of entities that appear in multiple documents that are the same actual entity, (ii) entities that are spelled differently that are the same actual entity, or (iii) entities that have multiple names that are the same actual entity; and
a table including relationships between entities.
14. The data analysis schema of claim 13, wherein the entities are grouped by hierarchy.
15. The data analysis schema of claim 13, wherein dates and numeric amounts are stored in specific columns in standard date and numeric formats.
16. The data analysis schema of claim 13, wherein the analysis schema can include structured data from structured data sources.
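The analysis schema of claims 13 to 16 can be pictured with a similar sketch. Again, nothing here comes from the specification: the tables `master_entity`, `entity_mention`, and `relationship`, the `parent_id` column used for hierarchy, and the choice of ISO 8601 text for dates are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical analysis schema: every name below is an assumption for illustration.
ANALYSIS_DDL = """
CREATE TABLE master_entity (
    master_id      INTEGER PRIMARY KEY,
    canonical_name TEXT NOT NULL UNIQUE,
    parent_id      INTEGER REFERENCES master_entity(master_id)  -- hierarchy grouping
);
CREATE TABLE entity_mention (      -- each spelling or alias seen in a document
    mention_id   INTEGER PRIMARY KEY,
    master_id    INTEGER NOT NULL REFERENCES master_entity(master_id),
    doc_key      TEXT NOT NULL,
    surface_text TEXT NOT NULL
);
CREATE TABLE relationship (        -- relationships between master entities
    rel_id     INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES master_entity(master_id),
    object_id  INTEGER NOT NULL REFERENCES master_entity(master_id),
    rel_type   TEXT NOT NULL,
    event_date TEXT,               -- dedicated date column, ISO 8601 (YYYY-MM-DD)
    amount     REAL                -- dedicated numeric column
);
"""


def create_analysis_schema(conn):
    """Create the three illustrative analysis tables."""
    conn.executescript(ANALYSIS_DDL)


def add_master(conn, canonical_name, parent_id=None):
    """Insert a master entity and return its id."""
    cur = conn.execute(
        "INSERT INTO master_entity (canonical_name, parent_id) VALUES (?, ?)",
        (canonical_name, parent_id),
    )
    return cur.lastrowid


def add_mention(conn, master_id, doc_key, surface_text):
    """Link one spelling found in one document to its master entity."""
    conn.execute(
        "INSERT INTO entity_mention (master_id, doc_key, surface_text) "
        "VALUES (?, ?, ?)",
        (master_id, doc_key, surface_text),
    )


def add_relationship(conn, subject_id, object_id, rel_type, event_date, amount):
    """Record a typed relationship with its date and amount in typed columns."""
    conn.execute(
        "INSERT INTO relationship "
        "(subject_id, object_id, rel_type, event_date, amount) "
        "VALUES (?, ?, ?, ?, ?)",
        (subject_id, object_id, rel_type, event_date, amount),
    )


def documents_mentioning(conn, canonical_name):
    """Return the sorted, de-duplicated document keys that mention a master entity."""
    rows = conn.execute(
        """
        SELECT DISTINCT m.doc_key
        FROM entity_mention AS m
        JOIN master_entity AS e ON e.master_id = m.master_id
        WHERE e.canonical_name = ?
        ORDER BY m.doc_key
        """,
        (canonical_name,),
    ).fetchall()
    return [doc_key for (doc_key,) in rows]
```

In this sketch, differently spelled mentions such as "Acme Corp." and "ACME Corporation" would each be stored as an `entity_mention` row pointing at the same `master_entity`, so a query by canonical name returns every document regardless of spelling.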
17. An extraction/transformation/load module comprising:
code to migrate data from a capture schema to an analysis schema, the code to migrate including code to map data and code to load data, wherein the capture schema comprises a set of tables to store data extracted from a plurality of source documents having unstructured data and a table to store information about the extracted data, wherein each of the plurality of documents are assigned a unique key that identifies the document throughout a software system allowing (i) cross-analysis, (ii) linking of results for further analysis, (iii) drill-down from analytical reports back to the source documents, or (iv) drill-down from analytical reports back to transformation information stored in the schema, wherein the analysis schema comprises a set of tables that provides structure to unstructured data, the table including master entities, the master entities comprising (i) a group of entities that appear in multiple documents that are the same actual entity, (ii) entities that are spelled differently that are the same actual entity, or (iii) entities that have multiple names that are the same actual entity and a table including relationships between entities.
Dependent claims: 18.
Specification