Method of retrieving documents that concern the same topic

  • US 5,418,951 A
  • Filed: 09/30/1994
  • Issued: 05/23/1995
  • Est. Priority Date: 08/20/1992
  • Status: Expired due to Term
First Claim

1. A method of retrieving at least one document that concerns the same topic as a sample of text by comparing the at least one document to the sample of text, comprising the steps of:

    a) constructing a first list of unique character groupings that occur in one of the at least one document for each of the at least one document;

    b) constructing a second list of unique character groupings that occur in the sample of text;

    c) assigning a first numerical value to each unique character grouping on each first list, where the first numerical value assigned to one of the unique character groupings is equal to the number of occurrences of the unique character grouping within the document divided by the total number of character groupings within the document;

    d) assigning a second numerical value to each unique character grouping on the second list, where the second numerical value assigned to one of the unique character groupings is equal to the number of occurrences of the unique character grouping within the sample of text divided by the total number of character groupings within the sample of text;

    e) constructing a third list of unique character groupings that occur in the at least one document and the sample of text;

    f) assigning a third numerical value to each unique character grouping on the third list, where the third numerical value assigned to one of the unique character groupings is equal to the sum of the first numerical values of the unique character grouping from all of the first lists divided by the total number of first lists;

    g) replacing each first numerical value on each first list with a corresponding fourth numerical value, where the fourth numerical value for one of the unique character groupings is equal to the first numerical value of the unique character grouping minus the corresponding third numerical value for the unique character grouping;

    h) replacing each second numerical value on the second list with a corresponding fifth numerical value, where the fifth numerical value for one of the unique character groupings is equal to the second numerical value of the unique character grouping minus the corresponding third numerical value for the unique character grouping;

    i) calculating a score for each at least one document with respect to the sample text, where said score is the summation of the products of the fifth numerical values times the corresponding fourth numerical values divided by the square root of the products of the summation of the squares of the fifth numerical values times the summation of the squares of the corresponding fourth numerical values; and

    j) retrieving the documents from the at least one document that obtained a calculated score in the previous step that is above a user-definable score, where each retrieved document is deemed to concern the same topic as the sample of text.
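Steps a) through j) can be sketched in Python as follows. This is only an illustrative reading of the claim, not the patentee's implementation. Several points are assumptions that the claim does not state: a "character grouping" is taken to be an overlapping run of `n` characters (with `n=3` chosen arbitrarily), the third list is read as the union of the groupings in the documents and the sample, and a grouping absent from a text is treated as having frequency 0 there. The function names `grouping_frequencies`, `score_documents`, and `retrieve` are hypothetical.

```python
import math
from collections import Counter


def grouping_frequencies(text, n=3):
    """Steps a-d: map each unique grouping to (occurrences / total groupings).

    Assumption: a grouping is an overlapping run of n characters.
    """
    groupings = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(groupings)
    if total == 0:
        return {}
    return {g: count / total for g, count in Counter(groupings).items()}


def score_documents(documents, sample, n=3):
    """Steps a-i: return one score per document, in input order."""
    if not documents:
        return []
    doc_freqs = [grouping_frequencies(d, n) for d in documents]  # first lists
    sample_freqs = grouping_frequencies(sample, n)               # second list

    # Step e (assumed reading): third list = union of all groupings seen.
    third_list = set(sample_freqs)
    for freqs in doc_freqs:
        third_list.update(freqs)

    # Step f: third value = mean of the first values over all first lists.
    centroid = {
        g: sum(freqs.get(g, 0.0) for freqs in doc_freqs) / len(doc_freqs)
        for g in third_list
    }

    # Step h: fifth values = sample frequency minus third value.
    fifth = {g: sample_freqs.get(g, 0.0) - centroid[g] for g in third_list}
    fifth_sq = sum(v * v for v in fifth.values())

    scores = []
    for freqs in doc_freqs:
        # Step g: fourth values = document frequency minus third value.
        fourth = {g: freqs.get(g, 0.0) - centroid[g] for g in third_list}
        # Step i: sum of products over sqrt(product of sums of squares).
        dot = sum(fifth[g] * fourth[g] for g in third_list)
        denom = math.sqrt(fifth_sq * sum(v * v for v in fourth.values()))
        scores.append(dot / denom if denom > 0 else 0.0)
    return scores


def retrieve(documents, sample, threshold, n=3):
    """Step j: keep documents whose score exceeds the user-defined threshold."""
    scores = score_documents(documents, sample, n)
    return [d for d, s in zip(documents, scores) if s > threshold]
```

Under these assumptions the score in step i) is the cosine of the angle between the two mean-centered frequency vectors, so it lies between -1 and 1. Calling `retrieve(documents, sample, threshold=0.5)` would return only the documents whose centered grouping profile points in broadly the same direction as that of the sample; the value 0.5 is an arbitrary example of the user-definable score.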
