Finite-state transduction of related word forms for text indexing and retrieval

  • US 5,625,554 A
  • Filed: 07/20/1992
  • Issued: 04/29/1997
  • Est. Priority Date: 07/20/1992
  • Status: Expired due to Term
First Claim

1. A computerized information retrieval or text indexing device, comprising:

  • (a) a database stored on a computer readable medium, said database comprising a data structure for representing stem-variant relations of a language, said data structure comprising a finite state transducer (FST) encoding along a plurality of branches sets of ordered-pairs of upper and lower strings wherein the upper string of each pair is a valid word stem and the lower string of each pair is a valid word variant, said data structure being constructed such that traversing a branch of the FST via the upper string of a pair will enable retrieval of the lower string of the pair, or traversing a branch of the FST via the lower string of a pair will enable retrieval of the upper string of the pair,
  • (b) processing means connected to the computer readable medium, in response to a user query inputting a word incorporating a stem or a variant, for traversing the data structure FST searching for a complete path through an FST branch having a lower string matching the query word, said processing means further comprising means in response to finding a complete path through a branch for outputting the upper string stem represented by that branch and corresponding to the query word or an identification of a document containing the same, or for outputting another word variant represented by that branch and having the same stem as the query word or an identification of a document containing the same.
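
The two-way lookup described in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `State` and `StemFST`, the methods `stems`, `variants`, and `related`, the sample word list `PAIRS`, and the choice to align each stem/variant pair symbol by symbol with trailing epsilon padding are all assumptions made for illustration. Each arc carries an (upper, lower) symbol pair, a branch is a path from the start state to a final state, and the same structure is traversed by the lower string to retrieve the stem or by the upper string to retrieve the variants.

```python
from itertools import zip_longest

EPS = ""  # epsilon: this side of the arc consumes or emits nothing


class State:
    def __init__(self):
        self.arcs = {}      # (upper_symbol, lower_symbol) -> next State
        self.final = False  # True if a complete branch ends here


class StemFST:
    """Toy transducer over (stem, variant) pairs (illustrative only)."""

    def __init__(self, pairs):
        self.start = State()
        for upper, lower in pairs:
            self._add(upper, lower)

    def _add(self, upper, lower):
        # Assumed alignment: pair symbols position by position and pad
        # the shorter string with epsilon at the end.
        state = self.start
        for label in zip_longest(upper, lower, fillvalue=EPS):
            state = state.arcs.setdefault(label, State())
        state.final = True

    def _walk(self, state, word, side, out):
        # side 0: match the upper string, emit the lower string.
        # side 1: match the lower string, emit the upper string.
        if not word and state.final:
            yield out  # complete path through a branch
        for label, nxt in state.arcs.items():
            sym, emit = label[side], label[1 - side]
            if sym == EPS:
                # Epsilon on the matched side: advance without consuming.
                yield from self._walk(nxt, word, side, out + emit)
            elif word and word[0] == sym:
                yield from self._walk(nxt, word[1:], side, out + emit)

    def stems(self, variant):
        """Traverse via the lower string, return the upper string(s)."""
        return sorted(set(self._walk(self.start, variant, 1, "")))

    def variants(self, stem):
        """Traverse via the upper string, return the lower string(s)."""
        return sorted(set(self._walk(self.start, stem, 0, "")))

    def related(self, word):
        """All variants that share a stem with the query word."""
        found = set()
        for stem in self.stems(word):
            found.update(self.variants(stem))
        return sorted(found)


# Hypothetical sample data, not taken from the patent.
PAIRS = [
    ("swim", "swim"), ("swim", "swims"), ("swim", "swam"),
    ("swim", "swimming"), ("go", "goes"), ("go", "went"),
]
fst = StemFST(PAIRS)
print(fst.stems("swam"))    # ['swim']
print(fst.related("went"))  # ['goes', 'went']
```

A query word that does not reach a final state, such as the prefix "swi", yields no stem, which mirrors the claim's requirement of a complete path through a branch. Mapping the returned stems or variants to document identifiers would need a separate index, which this sketch leaves out.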
