Method for extracting multi-word technical terms from text

  • US 5,423,032 A
  • Filed: 01/03/1992
  • Issued: 06/06/1995
  • Est. Priority Date: 10/31/1991
  • Status: Expired due to Fees
First Claim

1. Programmed computer apparatus for extracting a list of candidate multi-word technical terms from an input text file, a multi-word technical term being a string of at least two words having a particular meaning in some technical field, said apparatus comprising:

  • means for storing a stoplist of tokens which are assumed to not occur in multi-word technical terms, a token being a word, character or string of characters delimited by blanks and/or punctuation;

  • means for storing a maximum length parameter specifying a maximum number of tokens in any candidate multi-word technical term to be extracted;

  • means responsive to the stored stoplist for extracting text fragments from an input text file by identifying delimiting tokens in the input text file, including means for identifying as a delimiting token each token in the input text file which is the same as a token in the stored stoplist, the identified delimiting tokens defining text fragments therebetween;

  • means for deriving from the extracted text fragments all possible subsequences of tokens having a length of at least two tokens and not more than a maximum number of tokens specified by the stored maximum length parameter;

  • means for testing each of the derived subsequences against at least one filtering condition; and

  • means for creating a sublist of the derived subsequences which pass the at least one filtering condition, the created sublist being the list of candidate multi-word technical terms.
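
The elements of claim 1 can be illustrated with a short Python sketch. This is only one possible reading under stated assumptions, not the patented implementation: the function name `extract_candidates` and the `min_count` parameter are hypothetical, punctuation is treated as a delimiting token in addition to stoplist entries, "subsequences" is read as contiguous runs of tokens, and a minimum occurrence count stands in for the unspecified "at least one filtering condition".

```python
import re
from collections import Counter


def extract_candidates(text, stoplist, max_len, min_count=1):
    """Sketch of claim 1: return candidate multi-word terms from text.

    Assumptions (not from the claim): case is folded, punctuation also
    delimits fragments, subsequences are contiguous, and the filtering
    condition is a minimum occurrence count.
    """
    stop = {s.lower() for s in stoplist}

    # Tokenize into words and single punctuation characters.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())

    # Split the token stream into fragments at delimiting tokens.
    fragments, current = [], []
    for tok in tokens:
        is_word = re.fullmatch(r"\w+", tok) is not None
        if tok in stop or not is_word:
            if current:
                fragments.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        fragments.append(current)

    # Derive every contiguous subsequence of 2..max_len tokens.
    counts = Counter()
    for frag in fragments:
        for n in range(2, max_len + 1):
            for i in range(len(frag) - n + 1):
                counts[" ".join(frag[i:i + n])] += 1

    # Keep only the subsequences that pass the filtering condition.
    return [term for term, c in counts.items() if c >= min_count]


if __name__ == "__main__":
    sample = "The hidden Markov model is trained. A hidden Markov model of speech."
    print(extract_candidates(sample, {"the", "is", "a", "of"}, max_len=3, min_count=2))
```

In this sketch the stoplist and `max_len` correspond to the stored stoplist and maximum length parameter of the claim. A frequency threshold is just one plausible filter; the claim itself leaves the filtering condition open.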
