
Computerized cross-language document retrieval using latent semantic indexing

  • US 5,301,109 A
  • Filed: 07/17/1991
  • Issued: 04/05/1994
  • Est. Priority Date: 06/11/1990
  • Status: Expired due to Term
First Claim

1. A multi-language information retrieval method for operating a computer system, including an information file of stored data objects, to retrieve selected data objects based on a user query, the method comprising the steps of
    selecting a set of training data objects from the stored data objects, said set of training data objects selected to satisfy predetermined retrieval criteria,
    translating each of said data objects in said set of training data objects into multiple languages to produce multiple translations and to generate a set of multi-language training data objects corresponding to said set of training data objects, and storing said translations corresponding to each of said multi-language training data objects in the information file,
    for each of said multi-language training data objects, merging all of said translations into a single merged data object composed of terms contained in all of said translations, thereby generating a set of merged data objects corresponding to said set of multi-language training data objects,
    parsing each said merged data object to extract distinct ones of said terms and generating a lexicon database from said distinct terms,
    generating a joint term-by-data object matrix by processing said translations as stored in the information file, wherein said matrix has t rows in correspondence to said distinct terms in said lexicon database and d columns in correspondence to the number of said merged data objects in said set of merged data objects, and wherein each (i,j) cell of said matrix registers a tabulation of the occurrence of the ith distinct term in the jth merged data object,
    decomposing said matrix into a reduced singular value representation composed of a distinct term file and a data object file to create a semantic space,
    generating a pseudo-object, in response to the user query, by parsing the user query to obtain query terms and applying a given mathematical algorithm to said distinct terms and said query terms, and inserting said pseudo-object into said semantic space,
    examining the similarity between said pseudo-object and the stored data objects in said semantic space to generate the selected data objects corresponding to said pseudo-object, and
    generating a report of the selected data objects.
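
The steps recited in claim 1 can be illustrated with a minimal Python sketch using NumPy. It is an illustration under stated assumptions, not the patented implementation. The toy English/French texts, the names `merged_docs`, `stored_docs`, `count_vector`, `fold_in`, `cosine`, and `retrieve`, and the choice of k = 2 dimensions are all hypothetical. The claim leaves the "given mathematical algorithm" and the similarity measure unspecified; the sketch assumes the standard latent semantic indexing fold-in (query counts multiplied by the truncated term matrix and divided by the singular values) and cosine similarity.

```python
import numpy as np

# Toy training set (hypothetical): each string is one merged data object,
# an English text concatenated with an unaccented French translation.
merged_docs = [
    "dog barks loudly chien aboie fort",
    "cat sleeps quietly chat dort doucement",
    "dog chases cat chien poursuit chat",
    "stock market falls bourse marche chute",
    "market prices rise prix marche montent",
]

def build_lexicon(docs):
    """Parse the merged objects and collect their distinct terms."""
    return sorted({term for doc in docs for term in doc.split()})

lexicon = build_lexicon(merged_docs)
term_index = {term: i for i, term in enumerate(lexicon)}

def count_vector(text):
    """Tabulate lexicon-term occurrences in text; unknown terms are ignored."""
    vec = np.zeros(len(lexicon))
    for term in text.lower().split():
        if term in term_index:
            vec[term_index[term]] += 1
    return vec

# Joint t x d matrix: cell (i, j) counts term i in merged object j.
X = np.column_stack([count_vector(doc) for doc in merged_docs])

# Reduced singular value representation: keep the k largest factors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                      # assumed dimensionality for this toy example
U_k, s_k = U[:, :k], s[:k]
doc_vectors = Vt[:k, :].T  # one row per merged training object

def fold_in(text):
    """Place text in the semantic space as a pseudo-object (assumed fold-in)."""
    return (count_vector(text) @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Stored data objects, each in a single language, folded into the space.
stored_docs = ["dog barks", "bourse chute"]
stored_vectors = [fold_in(doc) for doc in stored_docs]

def retrieve(query):
    """Return indices of stored_docs ranked by similarity to the query."""
    q = fold_in(query)
    scores = [cosine(q, v) for v in stored_vectors]
    return sorted(range(len(stored_docs)), key=lambda j: scores[j], reverse=True)

if __name__ == "__main__":
    # Report step: print the best match for one query in each language.
    for query in ("chien aboie", "market falls"):
        best = retrieve(query)[0]
        print(query, "->", stored_docs[best])
```

In this toy setup the French query shares no words with the English stored text, and the English query shares none with the French one. Any match therefore comes from the joint semantic space built from the merged translations, not from literal term overlap.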

  • 4 Assignments