Method and system for compression indexing and efficient proximity search of text data

  • US 7,433,893 B2
  • Filed: 03/08/2004
  • Issued: 10/07/2008
  • Est. Priority Date: 03/08/2004
  • Status: Active Grant
First Claim

1. A computer implemented method of compression indexing, comprising the steps of:

  • selecting at least one data file, said data file having target text;

    identifying each and every unique token, each of the unique tokens having a frequency;

    counting the frequency of each unique token;

    calculating parameters;

    ranking the tokens from highest frequency to lowest frequency;

    compressing the frequencies;

    assigning a position to each instance of each and every unique token;

    compressing the positions;

    aggregating tokens, frequencies, parameters, and positions to form a compression index, said compression index being exhaustive of all said tokens, such that said target text is compressed 100%;

    reconstituting a portion of the data file;

    displaying the portion of the data file on a screen, wherein compressed positions point to a compressed text random access memory file, wherein the step of reconstituting the data file further comprising the steps of;

    a) creating the compressed text RAM file;

    b) selecting a domain to display, the domain being a portion of the data file, the domain having a starting point and an ending point;

    c) decompressing successive integers;

    d) determining positions of the tokens in the token list;

    e) extracting the tokens from the token list; and

    f) writing the tokens to the screen;

    repeating steps c-f until the ending point of the domain is reached;

    wherein the selected domain is part of a domains list, the domains list having a plurality of domains, the domains list having the starting point and the ending point of each domain.
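
As an illustration only, the steps of the claim can be sketched in Python. The claim does not name a tokenizer, a coding scheme, or the "parameters" it calculates, so everything specific here is an assumption: a regex tokenizer, variable-byte coding for the integers, gap (delta) coding for positions, and domains expressed as (start, end) token offsets. The names `build_index`, `reconstitute`, `encode_varint`, and `decode_varints` are hypothetical and do not come from the patent.

```python
import re
from collections import Counter
from itertools import islice

def encode_varint(n):
    """Variable-byte encode a non-negative integer, 7 bits per byte (assumed scheme)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varints(data):
    """Yield successive integers from a variable-byte encoded byte string."""
    n = shift = 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            yield n
            n = shift = 0

def build_index(text):
    """Build a toy compression index: tokens, frequencies, positions, compressed text."""
    tokens = re.findall(r"\w+|[^\w\s]", text)  # assumed tokenizer
    freq = Counter(tokens)                     # frequency of each unique token
    # Rank tokens from highest to lowest frequency (ties broken alphabetically).
    token_list = sorted(freq, key=lambda t: (-freq[t], t))
    rank = {t: i for i, t in enumerate(token_list)}
    # Assign a position to every instance of every token.
    positions = {t: [] for t in token_list}
    for pos, tok in enumerate(tokens):
        positions[tok].append(pos)
    # Compress positions: first position, then gaps, each variable-byte coded.
    postings = {}
    for tok, plist in positions.items():
        gaps = [plist[0]] + [b - a for a, b in zip(plist, plist[1:])]
        postings[tok] = b"".join(encode_varint(g) for g in gaps)
    return {
        "token_list": token_list,
        "frequencies": b"".join(encode_varint(freq[t]) for t in token_list),
        "postings": postings,
        # Compressed text file: the rank of each token, in document order.
        "compressed_text": b"".join(encode_varint(rank[t]) for t in tokens),
    }

def reconstitute(index, domain):
    """Rebuild the tokens of one domain, given as (start, end) token offsets."""
    start, end = domain
    ranks = islice(decode_varints(index["compressed_text"]), start, end)
    return " ".join(index["token_list"][r] for r in ranks)
```

Under these assumptions, calling `build_index` on a text and then `reconstitute(index, domains[i])` for an entry of a domains list such as `[(0, 4), (4, 8)]` decompresses successive integers, looks each one up in the token list, and returns the tokens for display. This sketch decodes from the start of the compressed text and skips ahead to the domain; a real system would presumably store byte offsets per domain to avoid the scan, and the original whitespace is not preserved here.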
