
Multi-language document search and retrieval system

  • US 7,174,290 B2
  • Filed: 07/07/2003
  • Issued: 02/06/2007
  • Est. Priority Date: 11/30/1998
  • Status: Expired due to Term
First Claim

1. A computer-readable medium containing a computer program for searching for documents that may contain text in any of a plurality of languages, wherein the computer program performs the steps of:

    separating text in each document to be searched into individual word tokens;

    reducing the word tokens to grammatical stems by removing word endings that are associated with any one or more of the languages, without regard to whether the remaining stem is a recognized word in any of the plurality of languages;

    storing the stems in an index that identifies the documents in which words containing the stems appeared;

    parsing a query containing a string of text into individual word tokens;

    reducing the word tokens from the query to grammatical stems by removing word endings that are associated with any one or more of the languages, without regard to whether the remaining stem is a recognized word in any of the plurality of languages;

    searching the index for entries that match the stems obtained from the query; and

    displaying an identification of the documents that contained matching entries.
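The claimed steps can be sketched as a small inverted-index pipeline. This is a minimal illustration, not the patented implementation. The following are all assumptions, since the claim does not specify them: the ending lists in the hypothetical `ENDINGS_BY_LANGUAGE` table, the rule of stripping the longest matching ending first, the `MIN_STEM_LEN` guard, and matching documents on any query stem (OR semantics).

```python
import re
from collections import defaultdict

# Hypothetical, illustrative endings; the claim does not list any specific suffixes.
ENDINGS_BY_LANGUAGE = {
    "english": ["ing", "ed", "es", "s"],
    "spanish": ["ción", "mente", "os", "as"],
    "german": ["ungen", "ung", "en", "er"],
}

# Pool the endings of every language and try the longest ones first.
ALL_ENDINGS = sorted(
    {e for endings in ENDINGS_BY_LANGUAGE.values() for e in endings},
    key=len,
    reverse=True,
)
MIN_STEM_LEN = 3  # assumed guard so short words are not stripped to nothing


def tokenize(text):
    """Separate text into lowercase word tokens (\\w is Unicode-aware for str patterns)."""
    return re.findall(r"\w+", text.lower())


def stem(token):
    """Strip the longest ending from any language; the stem need not be a real word."""
    for ending in ALL_ENDINGS:
        if token.endswith(ending) and len(token) - len(ending) >= MIN_STEM_LEN:
            return token[: -len(ending)]
    return token


def build_index(docs):
    """Map each stem to the set of document IDs whose words produced it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[stem(token)].add(doc_id)
    return index


def search(index, query):
    """Stem the query the same way and return the IDs of documents matching any stem."""
    hits = set()
    for token in tokenize(query):
        hits |= index.get(stem(token), set())
    return sorted(hits)


docs = {
    "doc1": "Searching indexed documents",
    "doc2": "Búsqueda de documentos",
    "doc3": "Die Suchmaschine",
}
index = build_index(docs)
print(search(index, "documented search"))  # ['doc1', 'doc2']
```

Because endings from every language are pooled, the English query word "documented" and the Spanish "documentos" both reduce to the stem "document" and match, so the query language does not need to be detected. The stemmer can also produce non-words such as "studi" from "studies," which the claim explicitly permits. Here the final `print` stands in for the "displaying" step.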
