System and method for indexing streams containing unstructured text data
First Claim
1. A method for indexing data, comprising the steps of:
receiving data-streams, wherein the data-streams comprise data-elements;
storing the data-elements of the received data-streams, wherein the stored data-elements are stored via one or more processors in block-stores;
allocating the stored data-elements to data-blocks of the block-stores, wherein the stored data-elements are allocated via the one or more processors to the data-blocks;
further allocating the block-allocated data-elements to events of the data-blocks, wherein each of the data-blocks comprise one or more events, wherein each of the events comprise the block-allocated data-elements of the corresponding data-block, wherein the block-allocated data-elements are allocated via the one or more processors to the events;
splitting the event-allocated data-elements into terms, wherein the event-allocated data-elements are split via the one or more processors into the terms;
calculating term frequencies of each term in each of the events, wherein the term frequencies are calculated via the one or more processors;
calculating block-level term frequency data for the event-allocated data-elements stored in the corresponding data-block based on the term frequencies, wherein the block-level term frequency data is calculated via the one or more processors; and
generating tree index structures for the event-allocated data-elements based on the block-level term frequency data, wherein the tree index structures comprise Y-tree index structures, wherein the terms are used in the Y-tree index structures as keys, wherein the tree index structures are calculated via the one or more processors.
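The term-splitting and frequency steps recited above can be illustrated with a minimal Python sketch. It is not taken from the patent: the tokenization rule (lowercase runs of word characters), the function names, and the shape of the block-level data (a total count plus the number of events containing the term) are all assumptions made for illustration.

```python
import re
from collections import Counter


def split_into_terms(text):
    # Assumption: a term is a lowercase run of letters, digits or underscores.
    return re.findall(r"\w+", text.lower())


def event_term_frequencies(event_text):
    # Frequency of each term within a single event.
    return Counter(split_into_terms(event_text))


def block_term_frequency_data(events):
    # Aggregate per-event frequencies into block-level data.
    # Assumption: block-level data is (total occurrences, number of events
    # containing the term); the claim does not fix a particular layout.
    totals = Counter()
    event_counts = Counter()
    for event_text in events:
        tf = event_term_frequencies(event_text)
        totals.update(tf)                # add occurrence counts
        event_counts.update(tf.keys())   # count each term once per event
    return {
        term: {"total": totals[term], "events": event_counts[term]}
        for term in totals
    }


# A hypothetical data-block holding three events.
block = ["error disk full", "disk check ok", "error error timeout"]
stats = block_term_frequency_data(block)
print(stats["error"])  # {'total': 3, 'events': 2}
```

Keeping the per-event counts separate from the block-level aggregate mirrors the two calculating steps of the claim; only the aggregate would need to be carried into an index.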
Abstract
A system, method and computer readable medium for indexing streaming data. Data may be received from distributed devices connected via a network. Data elements may be stored and allocated to data blocks and events of the block stores. Non-text data may be converted into a text representation. The data may be split into terms, and term frequencies of each term within each of the events may be calculated. Block-level term frequency statistics may be calculated based on the term frequencies. Tree index structures, such as the Y-tree index, may be generated based on the block-level term frequency data. The Y-tree index structures may use the terms as keys and pointers to the corresponding data blocks and block-level term frequency data. A search query may be performed over the tree index structures.
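The idea of an index keyed by terms, with each key leading to data blocks and their block-level term frequency data, can be sketched as follows. This is a simplified stand-in and not a Y-tree: a sorted key list with binary search takes the place of the tree, and the class name, method names, and the `(block_id, count)` posting layout are assumptions for illustration only.

```python
import bisect
from collections import Counter


class TermIndex:
    """Sorted term keys mapped to (block_id, block-level count) postings.

    Assumption: a sorted list searched with bisect stands in for the
    tree index; it is NOT an implementation of a Y-tree.
    """

    def __init__(self):
        self._keys = []      # terms, kept in sorted order
        self._postings = {}  # term -> list of (block_id, count)

    def add_block(self, block_id, block_counts):
        # Register one data-block's block-level term counts.
        for term, count in block_counts.items():
            if term not in self._postings:
                bisect.insort(self._keys, term)
                self._postings[term] = []
            self._postings[term].append((block_id, count))

    def lookup(self, term):
        # Blocks containing the term, with their block-level counts.
        return list(self._postings.get(term, []))

    def prefix_lookup(self, prefix):
        # Ordered keys make prefix scans possible: jump to the first
        # candidate, then walk forward until the prefix stops matching.
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for term in self._keys[start:]:
            if not term.startswith(prefix):
                break
            matches.append(term)
        return matches


# Hypothetical block-level counts for two data-blocks.
index = TermIndex()
index.add_block(0, Counter({"error": 3, "disk": 2}))
index.add_block(1, Counter({"error": 1, "timeout": 4}))
print(index.lookup("error"))      # [(0, 3), (1, 1)]
print(index.prefix_lookup("di"))  # ['disk']
```

A query for a term returns only the blocks that contain it, so a search can skip every other block; the stored counts could additionally be used to rank or prune candidate blocks.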
29 Claims
26. A system for indexing data, comprising:
block-stores adapted to store data-elements of data-streams;
data-blocks of the block-stores, the stored data-elements being allocated via one or more processors to the data-blocks;
events of the data-blocks, the block-allocated data-elements being further allocated via the one or more processors to the events of the data-blocks, each of the data-blocks comprising one or more events, each of the events comprising the block-allocated data-elements of a corresponding data-block;
terms generated via the one or more processors by splitting the event-allocated data-elements;
term frequencies calculated via the one or more processors based on the frequency of each term in each of the events;
block-level term frequency data calculated via the one or more processors for the event-allocated data-elements that are stored in a corresponding data-block, the block-level term frequency data being based on the term frequencies; and
tree index structures generated via the one or more processors for the event-allocated data-elements based on the block-level term frequency data, the tree index structures comprising Y-tree index structures, the terms being used in the Y-tree index structures as keys.
29. A non-transitory computer readable medium having computer readable instructions stored thereon for execution by a processor, wherein the instructions on the non-transitory computer readable medium are adapted to enable a computing device to:
receive data-streams, wherein the data-streams comprise data-elements;
store the data-elements of the received data-streams, wherein the stored data-elements are stored in block-stores;
allocate the stored data-elements to data-blocks of the block-stores;
further allocate the block-allocated data-elements to events of the data-blocks, wherein each of the data-blocks comprise one or more events, wherein each of the events comprise the block-allocated data-elements of the corresponding data-block;
split the event-allocated data-elements into terms;
calculate term frequencies of each term in each of the events;
calculate block-level term frequency data for the event-allocated data-elements stored in the corresponding data-block based on the term frequencies; and
generate tree index structures for the event-allocated data-elements based on the block-level term frequency data, wherein the tree index structures comprise Y-tree index structures, wherein the terms are used in the Y-tree index structures as keys.
Specification