Fast ingest, archive and retrieval systems, method and computer programs
First Claim
1. An apparatus comprising:
a data generator for generating data to be archived;
a pre-archive filter for partitioning the data using predetermined queryable fields and for creating metadata relating to the data;
a short term storage device for storing the partitioned data in partitioned data files only until archiving of the data from the short term storage device to a long term storage is completed;
a database for storing the metadata in a metadata catalog, wherein the partitioned data files in conjunction with the metadata stored in the metadata catalog provides for persistent queryable files;
a query client for entering a query regarding the partitioned data files;
a query server for processing the query for querying against data stored in the short term storage device and the long term storage in the same way such that the data is always in a queryable state through the rest of the data's lifecycle from the partitioning, the long term storage configured to store data archived from the short term storage device;
an archiver for processing the partitioned data files to store them in the long term storage and for updating the metadata catalog to reference the archived data files; and
an inference engine for correlating a first data and a second data from the data to generate a first relationship pair, correlating the second data and a third data from the data to generate a second relationship pair, and correlating the first relationship pair and the second relationship pair to generate a third relationship pair correlating the first data and the third data, the inference engine operating in parallel with the pre-archive filter, wherein in the correlating, the inference engine performs n degrees of correlation to correlate multiple dimensions of one-to-one or one-to-many inference metadata to produce inference relationships, wherein the data is queryable by the query server during the correlating, and wherein queryable includes user based searching using keywords.
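The relationship-pair chaining recited for the inference engine can be illustrated with a minimal sketch. It assumes that two records are "correlated" when they share a field value, and that "n degrees of correlation" means repeating the chaining step up to n times; neither assumption comes from the claim text. The record contents and the names `correlate`, `chain` and `infer` are hypothetical.

```python
def correlate(a, b, shared_field):
    """Return a relationship pair (a_id, b_id) if both records share a field value."""
    value = a.get(shared_field)
    if value is not None and value == b.get(shared_field):
        return (a["id"], b["id"])
    return None


def chain(pair_ab, pair_bc):
    """Combine (a, b) and (b, c) into (a, c) when the middle element matches."""
    if pair_ab and pair_bc and pair_ab[1] == pair_bc[0]:
        return (pair_ab[0], pair_bc[1])
    return None


def infer(pairs, degrees):
    """Repeat the chaining step up to `degrees` times (assumed reading of 'n degrees')."""
    known = set(pairs)
    for _ in range(degrees):
        new = {chain(p, q) for p in known for q in known} - {None}
        new = {p for p in new if p[0] != p[1]} - known
        if not new:
            break
        known |= new
    return known


# Hypothetical records: first and second share a phone, second and third an email.
first = {"id": "call-1", "phone": "555-0100"}
second = {"id": "acct-7", "phone": "555-0100", "email": "x@example.com"}
third = {"id": "msg-3", "email": "x@example.com"}

pair_1 = correlate(first, second, "phone")   # first relationship pair
pair_2 = correlate(second, third, "email")   # second relationship pair
pair_3 = chain(pair_1, pair_2)               # third pair links first and third
all_pairs = infer({pair_1, pair_2}, degrees=1)
```

In this sketch the third pair is derived only from the first two pairs, without comparing the first and third records directly.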
Abstract
Systems, processing methods and computer programs that rapidly ingest, archive and dynamically query the data to retrieve it from short and long term storage devices are disclosed. Data is partitioned on queryable fields and metadata relating to the partitioned data is stored in a database. This allows for data to be stored in a persistent queryable state, providing query transparency irrespective of the location that the data is actually stored. Software code with differing functionality that shares consistent data structures and methods is used in components of the system to provide flexibility and speed.
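The query transparency described in the abstract can be sketched as follows. This is a minimal illustration that assumes in-memory dictionaries stand in for the storage devices and the metadata database; the names `Store`, `Catalog`, `ingest`, `archive` and `query` are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass, field


class Store:
    """Hypothetical stand-in for a storage tier: partition key -> records."""
    def __init__(self, name):
        self.name = name
        self.files = {}


@dataclass
class Catalog:
    """Hypothetical metadata catalog: partition key -> store currently holding it."""
    location: dict = field(default_factory=dict)


def ingest(records, fields, short_term, catalog):
    """Partition records on the queryable fields and register each partition."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[tuple(rec[f] for f in fields)].append(rec)
    for key, recs in partitions.items():
        short_term.files.setdefault(key, []).extend(recs)
        catalog.location[key] = short_term
    return list(partitions)


def archive(key, short_term, long_term, catalog):
    """Copy a partition to long term storage, repoint the catalog, drop the short term copy."""
    long_term.files[key] = short_term.files[key]
    catalog.location[key] = long_term
    del short_term.files[key]


def query(key, catalog):
    """Resolve the partition through the catalog; the caller never names a tier."""
    store = catalog.location.get(key)
    return [] if store is None else list(store.files[key])


catalog = Catalog()
short_term, long_term = Store("short"), Store("long")
records = [
    {"sensor": "A", "day": "2020-01-01", "value": 1},
    {"sensor": "A", "day": "2020-01-01", "value": 2},
    {"sensor": "B", "day": "2020-01-01", "value": 3},
]
ingest(records, ("sensor", "day"), short_term, catalog)
before = query(("A", "2020-01-01"), catalog)
archive(("A", "2020-01-01"), short_term, long_term, catalog)
after = query(("A", "2020-01-01"), catalog)
```

The same `query` call returns the same records before and after archiving, because only the catalog entry changes when the partition moves.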
33 Citations
20 Claims
1. An apparatus comprising:
a data generator for generating data to be archived;
a pre-archive filter for partitioning the data using predetermined queryable fields and for creating metadata relating to the data;
a short term storage device for storing the partitioned data in partitioned data files only until archiving of the data from the short term storage device to a long term storage is completed;
a database for storing the metadata in a metadata catalog, wherein the partitioned data files in conjunction with the metadata stored in the metadata catalog provides for persistent queryable files;
a query client for entering a query regarding the partitioned data files;
a query server for processing the query for querying against data stored in the short term storage device and the long term storage in the same way such that the data is always in a queryable state through the rest of the data's lifecycle from the partitioning, the long term storage configured to store data archived from the short term storage device;
an archiver for processing the partitioned data files to store them in the long term storage and for updating the metadata catalog to reference the archived data files; and
an inference engine for correlating a first data and a second data from the data to generate a first relationship pair, correlating the second data and a third data from the data to generate a second relationship pair, and correlating the first relationship pair and the second relationship pair to generate a third relationship pair correlating the first data and the third data, the inference engine operating in parallel with the pre-archive filter, wherein in the correlating, the inference engine performs n degrees of correlation to correlate multiple dimensions of one-to-one or one-to-many inference metadata to produce inference relationships, wherein the data is queryable by the query server during the correlating, and wherein queryable includes user based searching using keywords.
- View Dependent Claims (2, 3, 4, 5, 19, 20)
6. A data processing method comprising:
ingesting data for archiving;
partitioning by a pre-archive filter the ingested data using predetermined queryable fields;
storing the partitioned data in a short term storage device only until archiving of the data from the short term storage device to a long term storage is completed;
processing the ingested data to create metadata relating to the ingested data;
storing the metadata in a metadata catalog of a database;
processing a query for querying against data stored in the short term storage device and the long term storage in the same way such that the data is always in a queryable state through the rest of the data's lifecycle from the partitioning;
processing the partitioned data to store it in the long term storage;
updating the metadata catalog to reference the archived data; and
correlating by an inference engine a first data and a second data from the data to generate a first relationship pair, correlating the second data and a third data from the data to generate a second relationship pair, and correlating the first relationship pair and the second relationship pair to generate a third relationship pair correlating the first data and the third data, the inference engine operating in parallel with the pre-archive filter, wherein in the correlating, n degrees of correlation are used to correlate multiple dimensions of one-to-one or one-to-many inference metadata to produce inference relationships, wherein the data is queryable during the correlating, and wherein queryable includes user based searching using keywords.
- View Dependent Claims (7, 8, 9, 10, 11, 12, 13)
14. A non-transitory computer-readable medium comprising:
a first code segment for ingesting data for archiving;
a second code segment for partitioning the ingested data using predetermined queryable fields;
a third code segment for storing the partitioned data in a short term storage device only until archiving of the data from the short term storage device to a long term storage is completed;
a fourth code segment for processing the ingested data to create metadata relating to the ingested data;
a fifth code segment for processing a query for querying against data stored in the short term storage device and the long term storage in the same way such that the data is always in a queryable state through the rest of the data's lifecycle from the partitioning;
a sixth code segment for storing data archived from the short term storage device to the long term storage;
a seventh code segment for updating a metadata catalog to reference the archived data; and
an eighth code segment for correlating a first data and a second data from the data to generate a first relationship pair, correlating the second data and a third data from the data to generate a second relationship pair, and correlating the first relationship pair and the second relationship pair to generate a third relationship pair correlating the first data and the third data, the eighth code segment operating in parallel with the second code segment, wherein in the correlating, n degrees of correlation are used to correlate multiple dimensions of one-to-one or one-to-many inference metadata to produce inference relationships, wherein the data is queryable during the correlating, and wherein queryable includes user based searching using keywords.
- View Dependent Claims (15)
16. A non-transitory computer-readable medium for generating code, comprising:
a first code segment for processing data derived from an interface control document that defines formats for data types and code foundation data to automatically generate an extensible markup language (XML) file and generate source code having differing functionality that shares consistent data structures and methods using the XML file;
a second code segment for partitioning the processed data using predetermined queryable fields;
a third code segment for storing partitioned data in a short term storage device only until archiving of the data from the short term storage device to a long term storage is completed;
a fourth code segment for processing a query for querying against data stored in the short term storage device and the long term storage in the same way such that the data is always in a queryable state through the rest of the data's lifecycle from the partitioning;
a fifth code segment for storing data archived from the short term storage device to the long term storage; and
a sixth code segment for correlating a first data and a second data from the data to generate a first relationship pair, correlating the second data and a third data from the data to generate a second relationship pair, and correlating the first relationship pair and the second relationship pair to generate a third relationship pair correlating the first data and the third data, the sixth code segment operating in parallel with the second code segment, wherein in the correlating, n degrees of correlation are used to correlate multiple dimensions of one-to-one or one-to-many inference metadata to produce inference relationships, wherein the data is queryable during the correlating, and wherein queryable includes user based searching using keywords.
- View Dependent Claims (17, 18)
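The code-generation step in the first code segment of claim 16 can be illustrated with a rough sketch. It assumes the interface control document has already been reduced to a list of (field name, type) pairs, which is a simplification and not something the claim specifies. The XML layout and the names `spec_to_xml` and `xml_to_source` are hypothetical. The generated class exposes a reader and a writer that share one field list, as one possible reading of "differing functionality that shares consistent data structures and methods".

```python
import xml.etree.ElementTree as ET


def spec_to_xml(name, fields):
    """Write a hypothetical record definition as an XML string."""
    root = ET.Element("record", name=name)
    for field_name, field_type in fields:
        ET.SubElement(root, "field", name=field_name, type=field_type)
    return ET.tostring(root, encoding="unicode")


def xml_to_source(xml_text):
    """Generate Python source for a class whose reader and writer share FIELDS."""
    root = ET.fromstring(xml_text)
    name = root.get("name")
    fields = [(f.get("name"), f.get("type")) for f in root.findall("field")]
    names = [n for n, _ in fields]
    lines = [
        f"class {name}:",
        f"    FIELDS = {fields!r}",
        f"    def __init__(self, {', '.join(names)}):",
    ]
    lines += [f"        self.{n} = {t}({n})" for n, t in fields]
    lines += [
        "    def to_row(self):",
        f"        return [{', '.join('self.' + n for n in names)}]",
        "    @classmethod",
        "    def from_row(cls, row):",
        "        return cls(*row)",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical field list standing in for data derived from an interface control document.
xml_text = spec_to_xml("Track", [("track_id", "int"), ("speed", "float")])
source = xml_to_source(xml_text)

namespace = {}
exec(source, namespace)          # load the generated class for demonstration only
Track = namespace["Track"]
track = Track.from_row(["7", "1.5"])
row = track.to_row()
```

Because both generated methods are derived from the same XML field list, a change to the record definition propagates to the reader and the writer together.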
Specification