Method for storing bibliometric information on items from a finite source of text, and in particular document postings for use in a full-text document retrieval system

US 5,293,552 A
Filed: 03/30/1992
Issued: 03/08/1994
Est. Priority Date: 04/08/1991
Status: Expired due to Term

Abstract

A method to compress, store, and retrieve bibliometric information on multiple sources of text is presented. The compression consists of two parts and may use any one of the many ordering-based bibliometric laws for sources of text. The first compression part stores bibliometric information on the items from a text source, using the rank of each item in the ordering relation defined by the bibliometric law as an indication of the bibliometric information. The second compression part uses pointers and tables to eliminate redundant information. As an application, a posting compression method is presented for use in term-weighting retrieval systems. The first compression part uses a postulated rank-occurrence frequency relation for the document in question whose only variable is the document's length, for example Zipf's law, which states that the product of rank and frequency is approximately constant. The second compression part uses pointers and a few tables alongside the principal storage. The compression makes use of direct random addressability. All postings relating to a particular document may be stored together, allowing easy expandability and updating. Compared with conventional technology, storage requirements are roughly halved.
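
As a rough illustration of the rank-frequency idea in the abstract, the sketch below reconstructs an approximate relative frequency from a stored rank under an idealized Zipf law. It is not taken from the patent: the function names are hypothetical, and it assumes that the document's length is measured as its number of distinct terms and that the relative frequencies of all terms sum to one.

```python
def harmonic(n: int) -> float:
    """Return the n-th harmonic number 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / r for r in range(1, n + 1))


def zipf_share(rank: int, n_terms: int) -> float:
    """Estimate the share of a document's tokens held by the term at `rank`.

    Assumes an idealized Zipf law: rank * share is the same constant for
    every term, and the shares of all n_terms distinct terms sum to 1.
    """
    if not 1 <= rank <= n_terms:
        raise ValueError("rank must lie between 1 and n_terms")
    return 1.0 / (rank * harmonic(n_terms))


# Only the rank is stored per term; the share is recomputed on demand.
shares = [zipf_share(rank, 4) for rank in range(1, 5)]
```

In this toy case the four shares sum to one and rank times share is identical for every term, which is the "approximately constant product" property in its exact form. Real documents only follow the law approximately, so the reconstructed value is an estimate.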

9 Claims

1. In a computer-based information storage and retrieval system, a method for computer storing of bibliometric information on items in a finite source of text, using a postulated relationship between the bibliometric property of an item and its rank number in a linear ordering of all items in said source, said method comprising the steps of:
- a) computing and storing a linear ordering of all items including, for any item in said source, positioning it properly in the said linear ordering, each stored item in the linear ordering having a rank number,
- b) determining an item's rank number in the linear ordering,
- c) for any item in said source, storing the item's rank number as determined in step b) above as an indication of that item's bibliometric property,
- d) using the item's rank number for managing information about the source.

2. A method as claimed in claim 1, further comprising:
- e) assigning to each item in a particular source a sequence number based on address mapping transform that is uniform among the said sources and preserves rank order,
- f) for each source, storing a sequence of tuples, each of which tuples is assigned to a unique item in its source, and each tuple comprising:
  - (i) an identifier for a next successive source comprising the same unique item,
  - (ii) an offset value indicating both that unique item's sequence number in the source so identified as well as that unique item's rank number in said linear ordering as an indication of that unique item's bibliometric property.

3. The method of claim 2, wherein the sources are documents each having lexical terms, and further comprising the step of assigning to each lexical term a weight indicating the relevance of that term in the document, and storing the thus assigned weight.
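
Steps a) to d) of claim 1 can be pictured with a small sketch. It is illustrative only: the claim does not fix the bibliometric property or a tie-breaking rule, so the code assumes occurrence frequency as the property and breaks ties alphabetically. `rank_items` is a hypothetical name.

```python
from collections import Counter


def rank_items(tokens):
    """Map each distinct item to its rank number (1 = most frequent).

    Assumptions: the bibliometric property is occurrence frequency, and
    ties are broken alphabetically so that the ordering is linear.
    """
    counts = Counter(tokens)
    # Step a): a linear ordering of all distinct items in the source.
    ordering = sorted(counts, key=lambda item: (-counts[item], item))
    # Steps b) and c): determine each rank number and keep it, rather than
    # the raw count, as the stored indication of the property.
    return {item: rank for rank, item in enumerate(ordering, start=1)}


source = "the cat saw the dog and the cat".split()
ranks = rank_items(source)
# Step d): the stored rank is what later processing consults.
most_frequent = min(ranks, key=ranks.get)
```

Because only the rank is kept, the original count is recovered approximately, through whatever postulated rank-property relationship the system adopts.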

4. A method for storing document postings in a computer-based full-text document retrieval system, said method using a postulated relationship between a lexical term's occurrence frequency rank and its nominal weight, said postulated relationship depending exclusively on this rank and on the number of different lexical terms in a particular document as variables, said method comprising the steps of:
- a) for any lexical term in a particular document, computing its occurrence frequency and therefrom said lexical term's rank in an occurrence frequency ordering of the terms,
- b) storing a normalization factor as governed by said postulated relationship,
- c) assigning to each lexical term in the particular document a sequence number based on standard address mapping transform that is uniform among said documents and preserving said rank order,
- d) for each document, storing a sequence of postings each assigned to a unique lexical term in the document according to its assigned sequence number, each posting comprising:
  - (i) an identifier for a next successive document comprising the same unique term,
  - (ii) an offset value operating as pointer in said next successive document while indicating said unique term's posting sequence number in said next successive document as well as a direct indication of that unique term's weight in the document so identified.

5. A method as claimed in claim 4, further comprising storing for each document a normalization factor (n) for thereby normalizing any offset received to a weight factor as dependent of the number of postings for the document in question.

6. A method as claimed in claim 5, wherein said weight is inversely proportional to the square root of the rank of occurrence frequency of the particular term in the sequence.
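
The storage layout of claims 4 to 6 might be sketched as follows. This is one possible reading, not the patent's implementation. All names are hypothetical. The sequence number is assumed to be simply the term's position in the document's frequency ordering (ties broken alphabetically), and the normalization factor is assumed to make the squared weights of a document sum to one.

```python
import math
from collections import Counter

END = -1  # hypothetical sentinel: no further document contains the term


def build_postings(docs):
    """Return (term_table, doc_base, norm, postings) for tokenized documents."""
    # Step a): order each document's distinct terms by descending frequency.
    ranked = []
    for tokens in docs:
        counts = Counter(tokens)
        ranked.append(sorted(counts, key=lambda t: (-counts[t], t)))

    # Step b): one normalization factor per document, depending only on its
    # number of postings. Assumed form: 1 / sqrt(H_n), H_n the harmonic number.
    doc_base, norm, total = [], [], 0
    for terms in ranked:
        doc_base.append(total)  # address of the document's first posting
        h = sum(1.0 / r for r in range(1, len(terms) + 1))
        norm.append(1.0 / math.sqrt(h) if terms else 0.0)
        total += len(terms)

    # Steps c) and d): the offset of a posting inside its document equals
    # rank - 1. Documents are visited last to first so that each posting can
    # point forward to the next document containing the same term.
    postings = [(END, 0)] * total
    term_table = {}
    for doc_id in range(len(docs) - 1, -1, -1):
        for offset, term in enumerate(ranked[doc_id]):
            postings[doc_base[doc_id] + offset] = term_table.get(term, (END, 0))
            term_table[term] = (doc_id, offset)
    return term_table, doc_base, norm, postings


def weight(offset, norm_factor):
    """Claim 6 style weight: inversely proportional to sqrt(rank)."""
    return norm_factor / math.sqrt(offset + 1)


docs = [["a", "b", "a"], ["b", "c"], ["a", "c", "c", "b"]]
term_table, doc_base, norm, postings = build_postings(docs)
```

Each posting holds only a document identifier and an offset. No explicit weight is stored, because the offset of the posting inside the identified document already encodes the rank from which the weight is derived.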

7. A method for computer-accessing document postings in a computer-based full text retrieval system, said method comprising the steps of:
- a) entering for a document term a term identity representation,
- b) searching for an initial posting in a posting base table that has a document identifier and an offset,
- c) executing a program loop and while in each loop traversal, by means of a most recent document identifier and most recent offset, computing a next posting address, each posting so addressed containing a next document identifier and offset thereupon featuring as being most recent,
- d) and in each loop traversal also using its offset value found as a pointer and also as a weight representation number associated to said term in respect of its occurrence in the document associated to each respective loop traversal.

8. A method as claimed in claim 7, wherein said document identifier addresses a document base table for by means of adding of a document base pointer to an actual offset generating a posting address.

9. A method as claimed in claim 8, further by means of said document identifier addressing a weight normalization factor that combines with an associated offset to a weight quantity.
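
The access loop of claims 7 to 9 can be sketched over a few hand-built tables. This is a minimal illustration under assumptions, not the patented implementation. The table names, the `END` sentinel, the use of offset + 1 as the rank, and the 1 / sqrt(H_n) normalization factor are all hypothetical choices made for the example.

```python
import math

END = -1  # hypothetical sentinel: the chain for this term stops here

# Hand-built tables for three small documents (all names are hypothetical).
# Posting base table: term -> (first document identifier, offset).
term_table = {"a": (0, 0), "b": (0, 1), "c": (1, 1)}
# Document base table: document identifier -> address of its first posting.
doc_base = [0, 2, 4]
# Assumed normalization factor per document: 1 / sqrt(H_n) for n postings.
norm = [1.0 / math.sqrt(sum(1.0 / r for r in range(1, n + 1))) for n in (2, 2, 3)]
# Principal storage: each posting is (next document identifier, offset).
postings = [(2, 1), (1, 0), (2, 2), (2, 0), (END, 0), (END, 0), (END, 0)]


def traverse(term):
    """Return [(document identifier, weight)] for every document with `term`."""
    doc_id, offset = term_table.get(term, (END, 0))  # steps a) and b)
    hits = []
    while doc_id != END:  # step c): one loop traversal per document
        # Step d) and claim 9: the offset doubles as a rank (offset + 1),
        # combined with the document's normalization factor into a weight.
        hits.append((doc_id, norm[doc_id] / math.sqrt(offset + 1)))
        # Claim 8: document base pointer plus offset gives the posting address.
        doc_id, offset = postings[doc_base[doc_id] + offset]
    return hits
```

Calling `traverse("b")` follows the chain through documents 0, 1, and 2 with one table lookup per document, which is the direct random addressability the abstract refers to.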

Current Assignee: US Philips Corporation (Koninklijke Philips N.V.)
Original Assignee: US Philips Corporation (Koninklijke Philips N.V.)
Inventors: Aalbersberg, Ijsbrand J.
Primary Examiner(s): McElheny, Jr., Donald E.
Application Number: US07/860,615
Time in Patent Office: 708 Days
Field of Search: 364/419, 364/419.13, 364/419.19, 364/419.07
US Class Current: 1/1
CPC Class Codes:
G06F 16/30 of unstructured textual dat...
Y10S 707/99935 Query augmenting and refini...
