Method for storing bibliometric information on items from a finite source of text, and in particular document postings for use in a full-text document retrieval system
Abstract
A method to compress, store, and retrieve bibliometric information on multiple sources of text is presented. The compression consists of two parts and may use any one of the many ordering-based bibliometric laws for sources of text. The first compression part consists of storing bibliometric information on the items from a text source, using the rank of each item in the ordering relation defined by the bibliometric law as an indication of the bibliometric information. The second compression part uses pointers and tables efficiently to eliminate redundant information. As an application, a posting compression method is presented for use in term weighting retrieval systems. The first compression part uses a postulated rank-occurrence frequency relation for the document in question whose only variable is the document's length, for example Zipf's law, which states that the product of rank and frequency is approximately constant. The second compression part uses pointers and a few tables alongside the principal storage. The compression makes use of direct random addressability. All postings relating to a particular document may be stored together, allowing easy expandability and updating. Compared with conventional technology, storage requirements are roughly halved.
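The first compression part can be illustrated with a minimal sketch. It assumes the simplest form of Zipf's law, f(r) = C / r, with the constant C chosen so that the estimated frequencies add up to the document length. That normalization, the alphabetical tie-breaking, and the function names `rank_terms` and `zipf_estimate` are illustrative assumptions, not details taken from the patent.

```python
from collections import Counter


def rank_terms(tokens):
    """Return {term: rank}; rank 1 is the most frequent term.

    Ties are broken alphabetically (an assumption made for determinism).
    """
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {term: i + 1 for i, term in enumerate(ordered)}


def zipf_estimate(rank, total_tokens, distinct_terms):
    """Estimate a term's frequency from its rank alone.

    Assumes f(r) = C / r, with C set so that the estimates over all
    ranks 1..distinct_terms sum to total_tokens.
    """
    harmonic = sum(1.0 / r for r in range(1, distinct_terms + 1))
    return total_tokens / (harmonic * rank)


tokens = "the cat and the dog and the bird".split()
ranks = rank_terms(tokens)
estimates = {
    term: zipf_estimate(rank, len(tokens), len(ranks))
    for term, rank in ranks.items()
}
```

Under these assumptions only the rank needs to be stored per term; the frequency is recomputed on demand from the rank and two document-level numbers.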
9 Claims
1. In a computer-based information storage and retrieval system, a method for computer storing of bibliometric information on items in a finite source of text, using a postulated relationship between the bibliometric property of an item and its rank number in a linear ordering of all items in said source, said method comprising the steps of:
a) computing and storing a linear ordering of all items including, for any item in said source, positioning it properly in the said linear ordering, each stored item in the linear ordering having a rank number,
b) determining an item's rank number in the linear ordering,
c) for any item in said source, storing the item's rank number as determined in step b) above as an indication of that item's bibliometric property,
d) using the item's rank number for managing information about the source.
4. A method for storing document postings in a computer-based full-text document retrieval system, said method using a postulated relationship between a lexical term's occurrence frequency rank and its nominal weight, said postulated relationship depending exclusively on this rank and on the number of different lexical terms in a particular document as variables, said method comprising the steps of:
a) for any lexical term in a particular document, computing its occurrence frequency and therefrom said lexical term's rank in an occurrence frequency ordering of the terms,
b) storing a normalization factor as governed by said postulated relationship,
c) assigning to each lexical term in the particular document a sequence number based on standard address mapping transform that is uniform among said documents and preserving said rank order,
d) for each document, storing a sequence of postings each assigned to a unique lexical term in the document according to its assigned sequence number, each posting comprising:
(i) an identifier for a next successive document comprising the same unique term,
(ii) an offset value operating as pointer in said next successive document while indicating said unique term's posting sequence number in said next successive document as well as a direct indication of that unique term's weight in the document so identified.
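The posting layout recited in claim 4 might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the sequence number is simply taken to be rank minus one, ties in frequency are broken alphabetically, the `END` marker, the `base` table, and the name `build_postings` are hypothetical, and the normalization factor of step b) is omitted.

```python
from collections import Counter

END = (None, None)  # hypothetical end-of-chain marker


def build_postings(docs):
    """Build per-document posting sequences linked into per-term chains.

    docs maps a document id to its list of tokens. Returns:
      base     : term -> (first document id, offset in that document)
      postings : document id -> list of (next document id, next offset),
                 indexed by the term's sequence number in that document
    """
    base = {}
    postings = {}
    last = {}  # term -> (document id, offset) of its most recent posting
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        # Sequence numbers preserve the frequency rank order.
        ordered = sorted(counts, key=lambda t: (-counts[t], t))
        postings[doc_id] = [END] * len(ordered)
        for offset, term in enumerate(ordered):
            if term in last:
                # Patch the previous posting so it points at this one.
                prev_doc, prev_offset = last[term]
                postings[prev_doc][prev_offset] = (doc_id, offset)
            else:
                base[term] = (doc_id, offset)
            last[term] = (doc_id, offset)
    return base, postings


docs = {
    1: "zipf law zipf rank".split(),
    2: "rank law rank rank zipf law".split(),
}
base, postings = build_postings(docs)
```

In this sketch each posting stores no term identifier and no explicit weight: the term is implied by the chain it belongs to, and the weight in the next document is implied by the offset, since the offset is the term's rank-derived sequence number there.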
7. A method for computer-accessing document postings in a computer-based full text retrieval system, said method comprising the steps of:
a) entering for a document term a term identity representation,
b) searching for an initial posting in a posting base table that has a document identifier and an offset,
c) executing a program loop and while in each loop traversal, by means of a most recent document identifier and most recent offset, computing a next posting address, each posting so addressed containing a next document identifier and offset thereupon featuring as being most recent,
d) and in each loop traversal also using its offset value found as a pointer and also as a weight representation number associated to said term in respect of its occurrence in the document associated to each respective loop traversal.
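The access loop of claim 7 can be sketched in the same spirit. The flat `store` list, the `doc_start` table of per-document start addresses, the `(None, None)` end marker, and the weight function 1 / (offset + 1) are all assumptions chosen for illustration; the claim itself only requires that the next posting address be computed from the most recent document identifier and offset, and that the offset double as a weight representation.

```python
def traverse(term, base, doc_start, store, weight_of):
    """Follow one term's posting chain and collect (document id, weight).

    base      : term -> (document id, offset) of the initial posting
    doc_start : document id -> address of that document's first posting
    store     : flat list of (next document id, next offset) postings
    weight_of : maps an offset to a weight (assumed rank-based)
    """
    hits = []
    doc_id, offset = base[term]
    while doc_id is not None:
        # The offset serves as the weight indication for this document...
        hits.append((doc_id, weight_of(offset)))
        # ...and as the pointer used to compute the next posting address.
        address = doc_start[doc_id] + offset
        doc_id, offset = store[address]
    return hits


# Hand-built example: document 1 occupies addresses 0-2, document 2 addresses 3-5.
base = {"zipf": (1, 0), "law": (1, 1), "rank": (1, 2)}
doc_start = {1: 0, 2: 3}
store = [(2, 2), (2, 1), (2, 0), (None, None), (None, None), (None, None)]

hits = traverse("zipf", base, doc_start, store, lambda off: 1.0 / (off + 1))
```

Because every posting has the same shape, the address computation is a single addition, which is one way to read the abstract's remark about direct random addressability.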
Specification