Method and system for compression indexing and efficient proximity search of text data
Abstract
A system and method of compression indexing and efficient proximity search of text data permits high-speed search in which the relevance of search results is ranked according to the closeness of the desired terms within each portion of text found. The system includes (a) preparing target text, (b) creating a “compression index ebook”, (c) browsing in a compression index ebook, and (d) searching in a compression index ebook. To create the compression index, the method includes the steps of selecting target text and identifying tokens, such as words and punctuation strings, wherein each of the tokens has a frequency. The frequency of each token is counted. Tokens are ranked from highest frequency to lowest frequency. The frequencies are compressed. A position is then assigned to each instance of each token, and the positions are compressed to form a compression index ebook, which is stored in random access memory to eliminate disk seeks during browsing and searching.
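The abstract says results are ranked by how close the desired terms lie within each portion of text, but it does not define the closeness measure. The Python sketch below is one plausible reading, not the patented method: it assumes closeness means the width of the smallest token window that contains at least one occurrence of every search term. The names `min_span` and `rank_portions`, and the per-portion dictionary layout, are hypothetical.

```python
import heapq

def min_span(position_lists):
    """Width of the smallest window holding one position from every list.

    Each list must be sorted ascending. Returns None if any term is absent.
    Assumption: a smaller window means a more relevant hit.
    """
    if any(not plist for plist in position_lists):
        return None
    # Heap entries: (position, which term, index inside that term's list).
    heap = [(plist[0], i, 0) for i, plist in enumerate(position_lists)]
    heapq.heapify(heap)
    current_max = max(plist[0] for plist in position_lists)
    best = current_max - heap[0][0]
    while True:
        pos, i, j = heapq.heappop(heap)
        best = min(best, current_max - pos)
        if j + 1 == len(position_lists[i]):
            # The leftmost term has no later occurrence; no tighter window exists.
            return best
        nxt = position_lists[i][j + 1]
        current_max = max(current_max, nxt)
        heapq.heappush(heap, (nxt, i, j + 1))

def rank_portions(portions, terms):
    """Order portion names from tightest to loosest term window.

    portions: hypothetical mapping of portion name -> {token: sorted positions}.
    Portions missing any term are left out of the result.
    """
    scored = []
    for name, index in portions.items():
        span = min_span([index.get(term, []) for term in terms])
        if span is not None:
            scored.append((span, name))
    return [name for _, name in sorted(scored)]
```

Because the position lists are already sorted, the window search advances one pointer at a time and never rescans the text, which fits the abstract's goal of searching an in-memory index without disk seeks.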
104 Citations
31 Claims
1. A method of compression indexing, comprising the steps of:
selecting at least one data file;
identifying tokens, each of the tokens having a frequency;
counting the frequency of each token;
calculating parameters;
ranking the tokens from highest frequency to lowest frequency;
compressing the frequencies;
assigning a position to each instance of each token;
compressing the positions; and
aggregating tokens, frequencies, parameters, and positions to form a compression index ebook.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A method for using a compression index ebook, comprising the steps of:
a. creating the compression index ebook having the steps of:
(i) providing target text, the target text being at least one data file, the target text having tokens, the tokens having frequencies;
(ii) accumulating parameters;
(iii) building a list of all tokens represented in the target text, together with their respective frequencies;
(iv) sorting the list in order of declining token frequencies;
(v) accumulating positions data of each instance of each token; and
(vi) combining steps i-v into the compression index ebook;
b. browsing and searching the compression index ebook.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
30. A computer readable medium containing instructions for controlling a computer system to perform a method, the method comprising the steps of:
selecting at least one file;
identifying tokens, each of the tokens having a frequency;
counting the frequency of each token;
calculating parameters;
ranking the tokens from highest frequency to lowest frequency;
compressing the frequencies;
assigning a position to each instance of each token;
compressing the positions;
aggregating tokens, frequencies, parameters, and positions to form a compression index ebook; and
browsing and searching the compression index ebook.
31. An apparatus, comprising:
means for selecting at least one file;
means for identifying tokens, each of the tokens having a frequency;
means for counting the frequency of each token;
means for calculating parameters;
means for ranking the tokens from highest frequency to lowest frequency;
means for compressing the frequencies;
means for assigning a position to each instance of each token;
means for compressing the positions;
means for aggregating tokens, frequencies, parameters, and positions to form a compression index ebook; and
browsing and searching the compression index ebook.
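The index-building steps shared by the independent claims (identify tokens, count frequencies, calculate parameters, rank by frequency, compress frequencies, assign and compress positions, aggregate) can be sketched in Python as follows. The claims do not name a tokenizer, a compression scheme, or the parameters, so everything specific here is an assumption: a regular-expression tokenizer for words and punctuation strings, gap (delta) encoding plus variable-byte coding as a stand-in compressor, and total token count and vocabulary size as the "parameters". All function names are hypothetical.

```python
import re
from collections import Counter
from itertools import accumulate

# Assumed tokenizer: runs of word characters, or runs of punctuation.
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")

def varbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte, high bit set on the last byte."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)

def varbyte_decode(data):
    """Inverse of varbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers

def encode_positions(sorted_positions):
    """Store gaps between successive positions, then variable-byte code them."""
    gaps, prev = [], 0
    for pos in sorted_positions:
        gaps.append(pos - prev)
        prev = pos
    return varbyte_encode(gaps)

def decode_positions(data):
    """Rebuild absolute positions from the coded gaps."""
    return list(accumulate(varbyte_decode(data)))

def build_index(text):
    tokens = TOKEN_RE.findall(text.lower())          # identify tokens
    freq = Counter(tokens)                           # count frequencies
    # Rank from highest to lowest frequency; ties broken alphabetically.
    ranked = sorted(freq, key=lambda tok: (-freq[tok], tok))
    positions = {}
    for pos, tok in enumerate(tokens):               # one position per instance
        positions.setdefault(tok, []).append(pos)
    return {                                         # aggregate into one index
        "parameters": {"total_tokens": len(tokens), "vocabulary": len(ranked)},
        "tokens": ranked,
        "frequencies": varbyte_encode([freq[tok] for tok in ranked]),
        "positions": {tok: encode_positions(positions[tok]) for tok in ranked},
    }
```

A short usage example: `build_index("The cat sat on the mat. The end.")` yields a token list that starts with `"the"`, and `decode_positions(index["positions"]["the"])` recovers the three positions where that token occurs. Gap encoding pays off most for the high-frequency tokens at the top of the ranking, since their gaps are small and fit in a single byte.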
Specification