SYSTEM AND METHOD FOR INVESTIGATING LARGE AMOUNTS OF DATA
Abstract
A data analysis system is proposed for providing fine-grained low latency access to high volume input data from possibly multiple heterogeneous input data sources. The input data is parsed, optionally transformed, indexed, and stored in a horizontally-scalable key-value data repository where it may be accessed using low latency searches. The input data may be compressed into blocks before being stored to minimize storage requirements. The results of searches present input data in its original form. The input data may include access logs, call data records (CDRs), e-mail messages, etc. The system allows a data analyst to efficiently identify information of interest in a very large dynamic data set up to multiple petabytes in size. Once information of interest has been identified, that subset of the large data set can be imported into a dedicated or specialized data analysis system for an additional in-depth investigation and contextual analysis.
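The ingest path described in the abstract (parse, index, compress into blocks, store) can be illustrated with a minimal sketch. It assumes plain Python dictionaries stand in for the key-value data repository, `zlib` stands in for the unspecified compression scheme, and whitespace tokenization stands in for parsing. The names `ingest`, `index_family`, `block_family`, and `block_size` are hypothetical and do not come from the patent.

```python
import zlib

def ingest(records, block_size=3):
    """Index raw records and store them in compressed blocks.

    Returns (index_family, block_family), two dicts standing in for
    key-value families of the repository (an assumption of this sketch).
    """
    index_family = {}   # token -> list of (block_id, offset, length)
    block_family = {}   # block_id -> zlib-compressed block bytes
    for start in range(0, len(records), block_size):
        block_id = start // block_size
        buf = bytearray()
        for record in records[start:start + block_size]:
            raw = record.encode("utf-8")
            # Remember where this record sits inside the uncompressed block.
            location = (block_id, len(buf), len(raw))
            for token in set(record.lower().split()):
                index_family.setdefault(token, []).append(location)
            buf += raw
        # Compress the whole block once before storing it.
        block_family[block_id] = zlib.compress(bytes(buf))
    return index_family, block_family

logs = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /admin 403",
]
index_family, block_family = ingest(logs)
```

Because the index stores an offset and a length for each record, a search can later slice the uncompressed block and return the input data in its original form.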
31 Claims
1. A method comprising:
using a computer that is configured with an improved search mechanism, receiving a search parameter;
using the computer and the improved search mechanism, deriving a search criterion from the search parameter and using the search criterion to obtain one or more first values from a first key-value family of a key-value data repository stored in a data storage device that is coupled to the computer;
using the computer and the improved search mechanism, using the one or more first values to obtain one or more compressed values from a second key-value family of the key-value data repository;
using the computer and the improved search mechanism, uncompressing the one or more compressed values to produce one or more uncompressed values;
using the computer and the improved search mechanism, using the one or more first values to identify one or more portions of the one or more uncompressed values;
using the computer and the improved search mechanism, returning the one or more portions of the one or more uncompressed values as search results.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
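The steps of claim 1 can be sketched as a two-stage lookup. This is one possible reading, not the patented implementation. It assumes the first values are (block key, offset, length) triples, that dictionaries stand in for the two key-value families, and that `zlib` is the compression scheme. The names `first_family`, `second_family`, and `search` and the sample data are hypothetical.

```python
import zlib

# Uncompressed block holding three records separated by ';' (sample data).
block = b"alice called bob;carol called dave;alice called erin"

# Second key-value family (assumed layout): block key -> compressed block.
second_family = {"block-0": zlib.compress(block)}

# First key-value family (assumed layout): search criterion -> first values,
# each a (block_key, offset, length) triple into the uncompressed block.
first_family = {
    "alice": [("block-0", 0, 16), ("block-0", 35, 17)],
    "carol": [("block-0", 17, 17)],
}

def search(search_parameter):
    # Derive a search criterion from the search parameter.
    criterion = search_parameter.strip().lower()
    # Obtain the first values from the first family.
    first_values = first_family.get(criterion, [])
    uncompressed = {}
    results = []
    for block_key, offset, length in first_values:
        # Obtain and uncompress each referenced block only once.
        if block_key not in uncompressed:
            uncompressed[block_key] = zlib.decompress(second_family[block_key])
        # Use the first value to identify the portion to return.
        portion = uncompressed[block_key][offset:offset + length]
        results.append(portion.decode("utf-8"))
    return results

print(search(" Alice "))  # ['alice called bob', 'alice called erin']
```

In this sketch the first values serve two purposes, as in the claim: they select which compressed values to fetch, and they identify which portions of the uncompressed values to return.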
10. A method comprising:
using a computer that is configured with an improved search mechanism, receiving a search parameter and determining a first search criterion and one or more second search criteria based upon the search parameter;
using the computer and the improved search mechanism, using the first search criterion and the second search criteria to obtain one or more first values from a first key-value family of a key-value data repository of a data storage device that is coupled to the computer;
using the computer and the improved search mechanism, using the one or more first values to obtain one or more second values from a second key-value family of the key-value data repository;
using the computer and the improved search mechanism, using the one or more second values to obtain one or more compressed values from a third key-value family of the key-value data repository;
using the computer and the improved search mechanism, returning one or more uncompressed portions of the one or more compressed values as search results.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
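Claim 10 adds a third key-value family and a second set of search criteria. The sketch below is one hypothetical interpretation: the first search criterion is a term, the second search criteria are the bounds of a timestamp range, the first values are record identifiers, and the second values are block locations. None of these choices, nor the `"term low high"` parameter format, is stated in the claim. Dictionaries and `zlib` again stand in for the repository and the compression scheme.

```python
import zlib

block = b"alice called bob;carol called dave;alice called erin"

# Third family (assumed): block key -> compressed block.
third_family = {"block-0": zlib.compress(block)}

# Second family (assumed): record id -> (block_key, offset, length).
second_family = {
    "rec-1": ("block-0", 0, 16),
    "rec-2": ("block-0", 17, 17),
    "rec-3": ("block-0", 35, 17),
}

# First family (assumed): term -> {timestamp: record id}.
first_family = {
    "alice": {100: "rec-1", 250: "rec-3"},
    "carol": {180: "rec-2"},
}

def search(search_parameter):
    # Determine the first criterion (term) and second criteria (time bounds).
    term, low, high = search_parameter.split()
    first_criterion = term.lower()
    low, high = int(low), int(high)
    # First family: both sets of criteria select the first values (record ids).
    row = first_family.get(first_criterion, {})
    first_values = [rid for ts, rid in sorted(row.items()) if low <= ts <= high]
    # Second family: first values -> second values (block locations).
    second_values = [second_family[rid] for rid in first_values]
    results = []
    for block_key, offset, length in second_values:
        # Third family: fetch the compressed value, return its uncompressed portion.
        data = zlib.decompress(third_family[block_key])
        results.append(data[offset:offset + length].decode("utf-8"))
    return results

print(search("alice 0 200"))  # ['alice called bob']
```

The extra level of indirection keeps the first family small: it holds only identifiers, while the location of each record within a compressed block lives in the second family.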
19. A data processing system comprising:
a computer comprising one or more processors;
a key-value data repository in a data storage device that is coupled to the computer and comprising a first key-value family and a second key-value family;
an improved search mechanism in the computer and configured to:
receive a search parameter and determine a search criterion based upon the search parameter;
use the search criterion to obtain one or more first values from the first key-value family;
use the one or more first values to obtain one or more compressed values from the second key-value family;
uncompress the one or more compressed values to produce one or more uncompressed values;
use the one or more first values to identify one or more portions of the one or more uncompressed values;
return the one or more portions of the one or more uncompressed values as search results.
Dependent claims: 20, 21, 22, 23, 24
25. A data system comprising:
a computer comprising one or more processors;
a key-value data repository in a data storage device that is coupled to the computer and comprising a first key-value family, a second key-value family, and a third key-value family;
an improved search mechanism in the computer and configured to:
receive a search parameter and determine a first search criterion and one or more second search criteria based upon the search parameter;
use the first search criterion and the second search criteria to obtain one or more first values from the first key-value family;
use the one or more first values to obtain one or more second values from the second key-value family;
use the one or more second values to obtain one or more compressed values from the third key-value family;
return one or more uncompressed portions of the one or more compressed values as search results.
Dependent claims: 26, 27, 28, 29, 30, 31
Specification