Searchable archive
Abstract
A searchable archiving system. A searchable archiving system includes one or more compacted files of archive data loosely coupled to a search process. To create a compacted file, an archiving process tokenizes the archive data, optimizes the tokenized archive data, and extracts archive metadata from the tokenized data. The tokenized data may then be compressed in a variety of ways into compressed segments that may be individually accessed and decompressed by the search agents. Before compression, segment metadata is extracted from the segments. The compressed segments and segment metadata are then combined to create a compacted file. The search process accesses the compacted files by consulting locally stored archive metadata extracted from the files during the compaction process. The search process then invokes one or more search agents that actively search the compacted files. The search agents do so by using the segment metadata to identify segments to decompress and search.
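The compaction steps described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`tokenize`, `build_compacted_file`), the fixed `segment_size`, the min/max token metadata, the JSON metadata header, and the use of zlib are all assumptions chosen for illustration, and the optimization step is omitted.

```python
import json
import struct
import zlib


def tokenize(values):
    """Map each data value to an integer token; return (tokens, dictionary)."""
    dictionary = {}
    tokens = []
    for value in values:
        # Token IDs are assigned in first-seen order (an illustrative choice).
        token = dictionary.setdefault(value, len(dictionary))
        tokens.append(token)
    return tokens, dictionary


def build_compacted_file(values, segment_size=4):
    """Return bytes laid out as [4-byte metadata length][metadata][segments]."""
    tokens, dictionary = tokenize(values)
    segment_meta = []
    blobs = []
    offset = 0
    for start in range(0, len(tokens), segment_size):
        segment = tokens[start:start + segment_size]
        # Segment metadata is taken from the uncompressed tokens,
        # then the segment is compressed on its own.
        blob = zlib.compress(struct.pack(f"<{len(segment)}I", *segment))
        segment_meta.append({
            "first_row": start,
            "count": len(segment),
            "min_token": min(segment),
            "max_token": max(segment),
            "offset": offset,      # byte offset within the segment area
            "length": len(blob),   # compressed length in bytes
        })
        blobs.append(blob)
        offset += len(blob)
    metadata = json.dumps(
        {"dictionary": dictionary, "segments": segment_meta}
    ).encode()
    # Combine segment metadata and compressed segments into one file image.
    return struct.pack("<I", len(metadata)) + metadata + b"".join(blobs)
```

Because each segment is compressed independently and its offset and length are recorded, a reader can locate and decompress one segment without touching the others.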
19 Citations
30 Claims
1. A method of retrieving, by a data processing system, one or more data values stored in a searchable archive, the method comprising:
selecting, by the data processing system, a compacted file from one or more compacted files associated with a data archive, the selected compacted file comprising a plurality of compressed segments of tokenized data and a metadata file including segment metadata for each of the plurality of compressed segments of tokenized data within the selected compacted file, wherein the tokenized data comprises one or more token values corresponding to one or more data values stored in the data archive;
accessing, by the data processing system, the metadata file within the selected compacted file;
selecting, by the data processing system, a compressed segment from the plurality of compressed segments in the selected compacted file based on the segment metadata stored in the metadata file;
generating, by the data processing system, a decompressed segment from the selected compressed segment without decompressing the entire compacted file;
searching, by the data processing system, the decompressed segment to determine if the decompressed segment includes one or more token values corresponding to the one or more data values being searched for;
compiling, by the data processing system, search results from each compacted file which match the token value being searched for; and
returning, by the data processing system, the search results.
Dependent claims: 2, 9, 10, 11, 12, 13, 14.
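The retrieval steps of claim 1 can be sketched as follows. This is a hedged illustration rather than the claimed implementation: the in-memory layout (a dict with `metadata` and `segments` lists), the min/max token range used as segment metadata, zlib compression, and the helper names `make_compacted`, `search_compacted_file`, and `search_archive` are all hypothetical.

```python
import struct
import zlib


def make_compacted(segments):
    """Hypothetical fixture: compress each token segment and record metadata."""
    metadata, blobs = [], []
    for tokens in segments:
        metadata.append({
            "min_token": min(tokens),
            "max_token": max(tokens),
            "count": len(tokens),
        })
        blobs.append(zlib.compress(struct.pack(f"<{len(tokens)}I", *tokens)))
    return {"metadata": metadata, "segments": blobs}


def search_compacted_file(compacted, token):
    """Return (matching row numbers, number of segments decompressed)."""
    hits = []
    decompressed = 0
    row = 0
    for meta, blob in zip(compacted["metadata"], compacted["segments"]):
        # Consult the segment metadata first; skip segments that cannot match.
        if meta["min_token"] <= token <= meta["max_token"]:
            raw = zlib.decompress(blob)  # only this segment is decompressed
            decompressed += 1
            segment = struct.unpack(f"<{meta['count']}I", raw)
            hits.extend(row + i for i, t in enumerate(segment) if t == token)
        row += meta["count"]
    return hits, decompressed


def search_archive(compacted_files, token):
    """Compile (file name, row) results across every compacted file."""
    results = []
    for name, compacted in compacted_files.items():
        hits, _ = search_compacted_file(compacted, token)
        results.extend((name, r) for r in hits)
    return results
```

A min/max range is only one possible kind of segment metadata; it can admit segments that turn out not to contain the token, which is why the decompressed segment is still searched.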
3. A method of retrieving, by a data processing system, one or more data values stored in a searchable archive, the method comprising:
selecting, by the data processing system, a compacted file from one or more compacted files associated with a data archive, the selected compacted file including one or more compressed segments of tokenized data represented as bit vectors and a metadata file containing bit vector segment metadata, wherein the tokenized data comprises one or more token values corresponding to one or more data values stored in the data archive;
accessing, by the data processing system, the metadata file within the selected compacted file;
selecting, by the data processing system, one or more of the bit vectors corresponding to one or more data values being searched for from the selected compacted file based on the bit vector segment metadata stored in the metadata file;
performing, by the data processing system, a Boolean operation on the bit vectors included in the selected compacted file to determine if the one or more token values corresponding to the one or more data values being searched for are contained within the selected compacted file;
compiling search results from each compacted file which match the token value being searched for; and
returning the search results.
Dependent claims: 4, 15, 16, 17, 18, 19.
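The bit vector variant in claim 3 can be illustrated with a small sketch. It rests on several assumptions not stated in the claim: one bit vector per (field, value) pair with bit i set when row i holds that value, Python integers standing in for the stored bit vectors, compression omitted, and a metadata dict mapping each pair to its slot. The names `build_bit_vectors`, `lookup`, and `rows_of` are hypothetical.

```python
from collections import defaultdict


def build_bit_vectors(rows):
    """rows: list of dicts mapping field -> value. Returns (metadata, vectors)."""
    bits = defaultdict(int)
    for i, row in enumerate(rows):
        for field, value in row.items():
            bits[(field, value)] |= 1 << i  # set bit i for this (field, value)
    keys = sorted(bits)
    # Metadata records which slot holds the vector for each (field, value).
    metadata = {key: slot for slot, key in enumerate(keys)}
    vectors = [bits[key] for key in keys]
    return metadata, vectors


def lookup(metadata, vectors, field, value):
    """Select a bit vector through the metadata; 0 means the value is absent."""
    slot = metadata.get((field, value))
    return vectors[slot] if slot is not None else 0


def rows_of(vector):
    """Expand a bit vector into an ascending list of row numbers."""
    out = []
    i = 0
    while vector:
        if vector & 1:
            out.append(i)
        vector >>= 1
        i += 1
    return out
```

With this layout, `lookup(...) & lookup(...)` finds rows matching both values and `|` finds rows matching either; a result of zero shows the combination does not occur in the file, with no row data read.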
5. A data processing system for retrieving one or more data values stored in a searchable archive, the system comprising:
a data store storing one or more compacted files;
a processor coupled to the data store;
a memory coupled to the processor, the memory having stored therein program instructions executable by the processor and which cause the processor to:
select a compacted file from one or more compacted files associated with a data archive, the selected compacted file comprising a plurality of compressed segments of tokenized data and a metadata file including segment metadata for each of the plurality of compressed segments of tokenized data within the selected compacted file, wherein the tokenized data comprises one or more token values corresponding to one or more data values stored in the data archive;
enable access of the metadata file within the selected compacted file;
select a compressed segment from the plurality of compressed segments in the selected compacted file based on the segment metadata stored in the metadata file;
generate a decompressed segment from the selected compressed segment without decompressing the entire compacted file;
search the decompressed segment to determine if the decompressed segment includes one or more token values corresponding to the one or more data values being searched for;
compile search results from each compacted file which match the token value being searched for; and
return the search results.
Dependent claims: 6, 20, 21, 22, 23, 24, 25.
7. A data processing system for retrieving one or more data values stored in a searchable archive, the system comprising:
a data store storing one or more compacted files;
a processor coupled to the data store; and
a memory coupled to the processor, the memory having stored therein program instructions executable by the processor and which cause the processor to:
select a compacted file from one or more compacted files associated with a data archive, the selected compacted file including one or more compressed segments of tokenized data represented as bit vectors and a metadata file containing bit vector segment metadata, wherein the tokenized data comprises one or more token values corresponding to one or more data values stored in the data archive;
enable access of the metadata file within the selected compacted file;
select one or more of the bit vectors corresponding to one or more data values being searched for from the selected compacted file based on the bit vector segment metadata stored in the metadata file;
perform a Boolean operation on the bit vectors included in the selected compacted file to determine if the one or more token values corresponding to the one or more data values being searched for are contained within the selected compacted file;
compile search results from each compacted file which match the token value being searched for; and
return the search results.
Dependent claims: 8, 26, 27, 28, 29, 30.
Specification