Efficient storage mechanism for representing term occurrence in unstructured text documents

  • US 20020165884A1
  • Filed: 05/04/2001
  • Published: 11/07/2002
  • Est. Priority Date: 05/04/2001
  • Status: Active Grant
First Claim

1. A method of converting a document corpus containing an ordered plurality of documents into a compact representation in memory of occurrence data, said representation to be based on a dictionary previously developed for said document corpus and wherein each term in said dictionary has associated therewith a corresponding unique integer, said method comprising:

  • developing a first vector for said entire document corpus, said first vector being a listing of said unique integers corresponding to dictionary terms such that each said document in said document corpus is sequentially represented in said listing; and

  • developing a second vector for said entire document corpus, said second vector indicating the location of each said document's representation in said first vector.
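
The two vectors in the claim can be illustrated with a minimal Python sketch. It is one possible reading, not the patent's own implementation: the function names `build_occurrence_vectors` and `document_terms` are hypothetical, documents are assumed to be already tokenized, terms missing from the dictionary are assumed to be skipped, and the "location" stored in the second vector is assumed to be each document's starting offset in the first vector.

```python
from typing import Dict, List, Sequence, Tuple


def build_occurrence_vectors(
    corpus: Sequence[Sequence[str]],
    dictionary: Dict[str, int],
) -> Tuple[List[int], List[int]]:
    """Build the two vectors for an ordered corpus of tokenized documents."""
    term_ids: List[int] = []     # first vector: term integers, document after document
    doc_offsets: List[int] = []  # second vector: where each document starts in term_ids
    for doc in corpus:
        doc_offsets.append(len(term_ids))
        for term in doc:
            term_id = dictionary.get(term)
            if term_id is not None:  # assumption: out-of-dictionary terms are dropped
                term_ids.append(term_id)
    return term_ids, doc_offsets


def document_terms(term_ids: List[int], doc_offsets: List[int], i: int) -> List[int]:
    """Recover the term integers of document i from the two vectors."""
    start = doc_offsets[i]
    # A document ends where the next one starts; the last one ends at the vector's end.
    end = doc_offsets[i + 1] if i + 1 < len(doc_offsets) else len(term_ids)
    return term_ids[start:end]


# Illustrative data only; not taken from the patent.
dictionary = {"cat": 0, "sat": 1, "dog": 2}
corpus = [["cat", "sat"], ["the", "dog"], ["dog", "cat", "dog"]]
term_ids, doc_offsets = build_occurrence_vectors(corpus, dictionary)
```

Under these assumptions the whole corpus is held in two flat integer sequences, and any document's terms are a single slice of the first vector. In practice the lists could be replaced with packed integer arrays to reduce memory further.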
