×

Method of indexing and retrieval of electronically-stored documents

  • US RE36,727 E
  • Filed: 04/04/1997
  • Issued: 06/06/2000
  • Est. Priority Date: 12/26/1989
  • Status: Expired due to Fees
First Claim
Patent Images

1. A method of indexing .[.and retrieving.]. documents, said method using a digital computer system having a central processing unit, a memory, .[.a display screen, a keyboard,.]. and a large capacity file system, said method comprising the steps of:

  • (a) storing in said memory a vocabulary of terms, each term consisting of one or more words, and for each term an associated term-code;

    (b) storing on said file system a collection of documents each with an associated unique document-number;

    (c) creating index files which contain for each said term-code in (a)(i) the set of document-numbers in (b) such that the corresponding documents contain the corresponding term; and

    (ii) for each said document-identifying-number in (i) the frequency-in-document of the corresponding term which is the number of times that said term appears in the corresponding document;

    (d) creating a weight-in-document file which contains for each document-number in (c)(i) the weight-in-document of the corresponding term which is calculated using the frequency-in-document in (c) (ii), the number of document-numbers in (c) (i), and the total number of terms in (a) which are in the corresponding document (counted multiple times);

    (e) creating a frequent-companion file which contains for each occurring term-code in (a) a ranked set of pairs of numbers where each pair consists of a first element term-code and a second element companion-percentage, where the companion-percentage is calculated by summing the weight-in-document values of said first element term-code over documents that contain both the term corresponding to said first element term-code and the term corresponding to said occurring term-code and then dividing by the sum over all documents of the weight-in-document of said occurring term-code;

    (f) creating a relative file which contains for each occurring term-code in (a) a ranked set of pairs of numbers where each pair consists of a first element relative term-code and a second element relative-percentage, where the relative-percentage is calculated by taking a weighted average of the companion-percentage of said first element term-code calculated in step (e) and the companion-percentage of said occurring term-code that was calculated in step (e) when said first element term-code was the occurring term-code and said occurring term-code was the first element term-code.[.;

    (g) creating a polysemantic file which contains for each occurring term-code in (a), a polysemantic weight which is calculated using the number of sets of pairs in the relative file created in step (f) that said occurring term-code appears in, the number of documents-numbers for which the weight-in-document of said occurring term-code calculated in step (d) is greater than some threshold value, and the averages for several values of N of the first N relative-percentages of said occurring term-code calculated and ranked in step (f);

    (h) accepting a query consisting of a sequence of words entered by a user using said keyboard and creating a parsed-query table of term-codes which consist of the term-codes in said vocabulary that are associated with the terms that are contained in said query;

    (i) creating a temporary swap table of pairs of first element term-codes and corresponding second element summed-relative-percentages consisting of those relative term-codes created in step (f) where said corresponding second element summed-relative-percentages are the sum, over all said occurring term-codes that are in said parsed-query table, of the relative percentages of said first element term-codes;

    (j) creating a modified swap table by modifying said second element summed-relative-percentages created in step (i) by multiplying them by a function of the polysemantic weight of the corresponding first element term-codes;

    (k) sorting said modified swap table by said modified summed-relative-percentages in descending order;

    (l) displaying on said display the terms corresponding to the term-codes of said modified swap table;

    (m) accepting user keypresses or other actions which identify one or more of the terms displayed in step (l) and adding the corresponding term-codes to the parsed-query-table;

    (n) repeating steps (i) through (m) as many times as the user indicates by his input;

    (o) accepting an input from the user indicating a command to retrieve documents;

    (p) creating a temporary rank table of pairs of first element document-numbers and corresponding second element summed-document-weight×

    poly values which pairs comprise those document-numbers for which any of the term-codes that are in said parsed-query table have weight-in-document above a threshold value, and summed-document-weight×

    poly values which are the sums, over all term-codes in said parsed-query table, of a function of me polysemantic weight of the term-code and the weight-in-document of the term-code;

    (r) creating a sorted rank table by sorting said temporary rank table by the value of the second elements of the pairs in descending order;

    (s) displaying on the display screen some portion of the document corresponding to the first document number in the sorted rank table and some indication of the corresponding summed-document-weight×

    poly value;

    (t) displaying other documents corresponding to other document-numbers in the sorted rank table in response to inputs from the user.]..

View all claims
  • 0 Assignments
Timeline View
Assignment View
    ×
    ×