SYSTEM AND METHOD FOR INVESTIGATING LARGE AMOUNTS OF DATA
Abstract
A data analysis system is proposed that provides fine-grained, low-latency access to high-volume input data from one or more, possibly heterogeneous, input data sources. The input data is parsed, optionally transformed, indexed, and stored in a horizontally scalable key-value data repository, where it may be accessed using low-latency searches. The input data may be compressed into blocks before being stored to minimize storage requirements. Search results present the input data in its original form. The input data may include access logs, call data records (CDRs), e-mail messages, etc. The system allows a data analyst to efficiently identify information of interest in a very large, dynamic data set up to multiple petabytes in size. Once information of interest has been identified, that subset of the large data set can be imported into a dedicated or specialized data analysis system for additional in-depth investigation and contextual analysis.
40 Claims
1. A method comprising:
obtaining a search criterion;
using the search criterion to obtain one or more first values from a first key-value family of a key-value data repository;
using the one or more first values to obtain one or more compressed values from a second key-value family of the key-value data repository;
uncompressing the one or more compressed values to produce one or more uncompressed values;
using the one or more first values to identify one or more portions of the one or more uncompressed values;
returning the one or more portions of the one or more uncompressed values as search results;
wherein the method is performed by one or more computing devices.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 20, 21, 22, 23, 24, 39
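The two-family lookup recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `TwoFamilyStore` class, the in-memory dictionaries standing in for key-value families, the whitespace tokenizer, the `(block_id, offset, length)` layout of the "first values", and the choice of zlib are all assumptions made for illustration.

```python
import zlib


class TwoFamilyStore:
    """Hypothetical in-memory stand-in for a key-value repository with two families."""

    def __init__(self):
        # First family (assumed layout): term -> [(block_id, offset, length), ...]
        self.index_family = {}
        # Second family (assumed layout): block_id -> zlib-compressed block bytes
        self.block_family = {}

    def put_block(self, block_id, records):
        """Pack records into one compressed block and index every token."""
        data = b""
        for text in records:
            raw = text.encode("utf-8")
            location = (block_id, len(data), len(raw))
            for term in set(text.split()):
                self.index_family.setdefault(term, []).append(location)
            data += raw
        self.block_family[block_id] = zlib.compress(data)

    def search(self, criterion):
        """Index lookup, block fetch, uncompress, then slice out the matching portions."""
        locations = self.index_family.get(criterion, [])  # the "first values"
        uncompressed = {}  # avoid uncompressing the same block twice
        results = []
        for block_id, offset, length in locations:
            if block_id not in uncompressed:
                uncompressed[block_id] = zlib.decompress(self.block_family[block_id])
            portion = uncompressed[block_id][offset:offset + length]
            results.append(portion.decode("utf-8"))
        return results


store = TwoFamilyStore()
store.put_block("b1", ["GET /index.html 200", "POST /login 401"])
store.put_block("b2", ["GET /about 200"])
hits = store.search("GET")
```

In this sketch the first values do double duty, as in the claim: they name the compressed block to fetch and they identify which portion of the uncompressed block to return, so each result is the record in its original form.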
13. A method comprising:
obtaining a first search criterion and one or more second search criteria;
using the first search criterion and the second search criteria to obtain one or more first values from a first key-value family of a key-value data repository;
using the one or more first values to obtain one or more second values from a second key-value family of the key-value data repository;
using the one or more second values to obtain one or more compressed values from a third key-value family of the key-value data repository;
returning one or more uncompressed portions of the one or more compressed values as search results;
wherein the method is performed by one or more computing devices.
Dependent claims: 14, 15, 16, 17, 18, 19, 25, 40
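Claim 13 adds a level of indirection through a third family. The sketch below shows one possible reading, under stated assumptions: the first criterion is a search term, the second criteria are hour buckets, the first family maps `(term, hour)` to record identifiers, the second family maps a record identifier to a block pointer, and the third family holds zlib-compressed blocks. The `ThreeFamilyStore` class and all of its field names are hypothetical and are not taken from the claims.

```python
import zlib


class ThreeFamilyStore:
    """Hypothetical in-memory stand-in for a repository with three key-value families."""

    def __init__(self):
        self.term_family = {}     # (term, hour) -> [record_id, ...]     (assumed layout)
        self.pointer_family = {}  # record_id -> (block_id, offset, length)
        self.block_family = {}    # block_id -> zlib-compressed block bytes

    def put_block(self, block_id, records):
        """records is a list of (record_id, hour, text) tuples packed into one block."""
        data = b""
        for record_id, hour, text in records:
            raw = text.encode("utf-8")
            self.pointer_family[record_id] = (block_id, len(data), len(raw))
            for term in set(text.split()):
                self.term_family.setdefault((term, hour), []).append(record_id)
            data += raw
        self.block_family[block_id] = zlib.compress(data)

    def search(self, term, hours):
        """Resolve term + hours to record ids, then to pointers, then to block portions."""
        results = []
        for hour in hours:
            for record_id in self.term_family.get((term, hour), []):   # first values
                block_id, offset, length = self.pointer_family[record_id]  # second values
                block = zlib.decompress(self.block_family[block_id])   # compressed values
                results.append(block[offset:offset + length].decode("utf-8"))
        return results


cdrs = ThreeFamilyStore()
cdrs.put_block("b1", [("r1", 9, "alice called bob"), ("r2", 10, "alice called carol")])
cdrs.put_block("b2", [("r3", 10, "dave called alice")])
matches = cdrs.search("alice", [10])
```

For brevity this version uncompresses a block once per matching record; a cache keyed by block identifier would avoid the repeated work.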
26. A system comprising:
a key-value data repository comprising a first key-value family and a second key-value family;
a search mechanism comprising one or more processors and configured to:
obtain a search criterion;
use the search criterion to obtain one or more first values from the first key-value family;
use the one or more first values to obtain one or more compressed values from the second key-value family;
uncompress the one or more compressed values to produce one or more uncompressed values;
use the one or more first values to identify one or more portions of the one or more uncompressed values;
return the one or more portions of the one or more uncompressed values as search results.
Dependent claims: 27, 28, 29, 30, 31
32. A system comprising:
a key-value data repository comprising a first key-value family, a second key-value family, and a third key-value family;
a search mechanism comprising one or more processors and configured to:
obtain a first search criterion and one or more second search criteria;
use the first search criterion and the second search criteria to obtain one or more first values from the first key-value family;
use the one or more first values to obtain one or more second values from the second key-value family;
use the one or more second values to obtain one or more compressed values from the third key-value family;
return one or more uncompressed portions of the one or more compressed values as search results.
Dependent claims: 33, 34, 35, 36, 37, 38
Specification