Document compression system and method for use with tokenspace repository

  • US 20070220023A1
  • Filed: 08/13/2004
  • Published: 09/20/2007
  • Est. Priority Date: 08/13/2004
  • Status: Active Grant
First Claim

1. A document compression method, comprising:

  • identifying a set of unique tokens contained in a set of documents, the set of documents comprising a sequence of tokens;

  • assigning a unique first token identifier from a set of first token identifiers to each unique token based at least in part on the frequency of occurrence of the unique token in the set of documents, wherein high-frequency tokens are assigned smaller valued first token identifiers than low-frequency tokens;

  • assigning a second token identifier from a set of second token identifiers to each token within a selected range of token positions in the set of documents, wherein each second token identifier corresponds to a first token identifier; and

  • storing the second token identifiers in a repository for subsequent retrieval.
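
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not say how a second token identifier is derived from a first token identifier, so the sketch assumes a LEB128-style variable-length byte encoding, under which smaller first identifiers occupy fewer bytes. All names (`assign_first_ids`, `encode_varint`, `decode_varints`, `compress_range`, `repository`) are hypothetical, and the repository is stood in for by a plain dictionary keyed by token-position range.

```python
from collections import Counter


def assign_first_ids(tokens):
    """Give each unique token an integer id; more frequent tokens get smaller ids."""
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: i for i, tok in enumerate(ranked)}


def encode_varint(n):
    """Assumed encoding: 7 payload bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)


def decode_varints(data):
    """Inverse of encode_varint applied to a concatenated byte string."""
    ids, cur, shift = [], 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            ids.append(cur)
            cur, shift = 0, 0
    return ids


def compress_range(tokens, first_ids, start, stop):
    """Encode the tokens at positions [start, stop) as concatenated varints."""
    return b"".join(encode_varint(first_ids[t]) for t in tokens[start:stop])


# A toy "set of documents" flattened into one token sequence.
tokens = "the cat sat on the mat and the cat ran".split()
first_ids = assign_first_ids(tokens)

# Hypothetical repository: maps a position range to its encoded bytes.
repository = {(0, len(tokens)): compress_range(tokens, first_ids, 0, len(tokens))}

# Retrieval: decode the stored bytes and map ids back to tokens.
id_to_token = {i: t for t, i in first_ids.items()}
restored = [id_to_token[i] for i in decode_varints(repository[(0, len(tokens))])]
```

In this toy input every identifier fits in one byte, so the benefit of frequency ranking only shows once the vocabulary exceeds 128 unique tokens. At that point common tokens still cost one byte each while rare tokens cost two or more.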
