
Language-independent method of generating index terms

  • US 5,752,051 A
  • Filed: 07/19/1994
  • Issued: 05/12/1998
  • Est. Priority Date: 07/19/1994
  • Status: Expired due to Term
First Claim

1. A method of extracting index terms from sample text relative to background text, comprising the steps of:

    (a) filtering the background text to remove undesired symbols, thereby to produce filtered background text;

    (b) counting the n-grams in said filtered background text to produce background n-gram counts;

    (c) filtering the sample text to remove undesired symbols, thereby to produce filtered sample text;

    (d) counting the n-grams in said filtered sample text to produce sample n-gram counts;

    (e) comparing said sample n-gram counts to said background n-gram counts to produce n-gram scores;

    (f) assigning to each symbol of said filtered sample text a symbol score derived from said n-gram scores, said symbol score being derived from the scores of the n-grams containing said symbol;

    (g) determining a symbol score threshold; and

    (h) extracting as index terms the words and phrases of said filtered sample text that contain symbols whose symbol scores exceed said symbol score threshold.
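
The claimed steps can be illustrated with a minimal Python sketch. The claim leaves most details open, so everything specific here is an assumption made for illustration, not the patent's stated method: the filter (lowercase, keep only letters), the n-gram length `n=3`, the score in step (e) (a log ratio of add-one-smoothed relative frequencies), the symbol score in step (f) (the mean of the covering n-gram scores), the threshold in step (g) (mean plus `k` standard deviations), and the merging of adjacent qualifying words into phrases in step (h). All function and parameter names are hypothetical.

```python
import math
import re
from collections import Counter


def filter_text(text):
    # Steps (a)/(c), assumed filter: lowercase, keep letters only,
    # and collapse every run of other symbols into a single space.
    return re.sub(r"[\W\d_]+", " ", text.lower()).strip()


def count_ngrams(text, n):
    # Steps (b)/(d): count every overlapping character n-gram.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def score_ngrams(sample_counts, background_counts):
    # Step (e), assumed score: log ratio of add-one-smoothed
    # relative frequencies (sample versus background).
    s_total = sum(sample_counts.values())
    b_total = sum(background_counts.values())
    vocab = len(set(sample_counts) | set(background_counts))
    scores = {}
    for gram, count in sample_counts.items():
        p_sample = (count + 1) / (s_total + vocab)
        p_background = (background_counts.get(gram, 0) + 1) / (b_total + vocab)
        scores[gram] = math.log(p_sample / p_background)
    return scores


def score_symbols(text, ngram_scores, n):
    # Step (f), assumed derivation: each symbol receives the mean
    # score of the n-grams that contain it.
    totals = [0.0] * len(text)
    covers = [0] * len(text)
    for i in range(len(text) - n + 1):
        score = ngram_scores[text[i:i + n]]
        for j in range(i, i + n):
            totals[j] += score
            covers[j] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, covers)]


def extract_index_terms(sample, background, n=3, k=1.0):
    filtered_background = filter_text(background)
    filtered_sample = filter_text(sample)
    if not filtered_sample:
        return []
    ngram_scores = score_ngrams(count_ngrams(filtered_sample, n),
                                count_ngrams(filtered_background, n))
    symbol_scores = score_symbols(filtered_sample, ngram_scores, n)

    # Step (g), assumed threshold: mean plus k standard deviations.
    mean = sum(symbol_scores) / len(symbol_scores)
    variance = sum((s - mean) ** 2 for s in symbol_scores) / len(symbol_scores)
    threshold = mean + k * math.sqrt(variance)

    # Step (h): keep words containing at least one symbol above the
    # threshold; adjacent qualifying words are joined into a phrase.
    terms, phrase, pos = [], [], 0
    for word in filtered_sample.split(" "):
        word_scores = symbol_scores[pos:pos + len(word)]
        pos += len(word) + 1  # skip the single separating space
        if any(s > threshold for s in word_scores):
            phrase.append(word)
        elif phrase:
            terms.append(" ".join(phrase))
            phrase = []
    if phrase:
        terms.append(" ".join(phrase))
    return terms
```

Calling `extract_index_terms(sample, background)` with a sample that contains vocabulary rare in the background should surface that vocabulary, since its n-grams score highly against the background counts. Because the counting operates on character n-grams rather than on a dictionary or stemmer, the only language-dependent piece in this sketch is the space-based word split in step (h).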
