Method and apparatus for information retrieval from a database by replacing domain specific stemmed phases in a natural language to create a search query

  • US 5,265,065 A
  • Filed: 10/08/1991
  • Issued: 11/23/1993
  • Est. Priority Date: 10/08/1991
  • Status: Expired due to Term
First Claim

1. A computer-implemented process for forming a search query for searching a document database by a computer-implemented search process, the search process identifying documents likely to match the search query by matching individual terms of the search query to individual terms and sequences of terms in the document database, the process for forming the search query comprising:

    a) providing a first database containing a plurality of phrases derived from domain specific natural-language phrases, each of said phrases consisting of a plurality of stemmed terms in original order;

    b) inputting to a computer an input query composed in natural language and comprising a plurality of unstemmed terms arranged in a user-selected order;

    c) parsing said input query into separate terms;

    d) stemming the terms of said input query to form an ordered sequence of stemmed terms, the order of the stemmed terms in the sequence being the same as the order of the unstemmed terms in the input query;

    e) selecting groups of stemmed terms, each group consisting of a plurality of successive stemmed terms of the sequence;

    f) comparing each group of stemmed terms to each phrase in said first database to identify each group of stemmed terms of the input query that matches a phrase in said first database;

    g) for each identified group of stemmed terms, identifying those stemmed terms which are shared by two successive identified groups of stemmed terms, identifying whether the number of stemmed terms in the two successive groups sharing a stemmed term is equal or unequal, assigning the shared stemmed term to only that group of the two successive groups containing the greatest number of stemmed terms if the number of terms is unequal, or assigning the shared stemmed term to only the first group of the two successive groups if the number of terms is equal; and

    h) replacing each identified group of stemmed terms of the input query by the matching phrase from said first database, the individual terms of the search query comprising each matching phrase substituted for groups of stemmed terms of the input query and each remaining stemmed term of the input query.
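
Steps a) through h) can be illustrated with a minimal Python sketch. It is not the patented implementation: the phrase database `PHRASES`, the toy suffix-stripping `stem` function, and the helper names `find_groups`, `resolve_overlaps`, and `build_query` are all hypothetical. The sketch also assumes that a group which loses its shared term is dropped as a phrase, so its unshared terms remain in the query as individual stemmed terms; the claim text above does not spell that out.

```python
import re

# a) Hypothetical phrase database: tuples of stemmed terms in original order.
PHRASES = {
    ("product", "liabil"),
    ("liabil", "insur", "polic"),
}

# Toy suffix list; a real system would use a proper stemmer (assumption).
SUFFIXES = ("ies", "ing", "ity", "ance", "ed", "es", "s", "y")


def stem(term):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def find_groups(stems, phrases):
    """e), f) Return (start, end) spans of successive stems that match a phrase."""
    max_len = max((len(p) for p in phrases), default=0)
    groups = []
    for start in range(len(stems)):
        for size in range(2, max_len + 1):
            end = start + size
            if end <= len(stems) and tuple(stems[start:end]) in phrases:
                groups.append((start, end))
    return groups  # ordered by start position, then by length


def resolve_overlaps(groups):
    """g) When two successive groups share terms, keep the longer one;
    on a tie, keep the first. The losing group is dropped (assumption)."""
    kept = []
    for group in groups:
        if kept and group[0] < kept[-1][1]:  # shares a term with the last kept group
            prev = kept[-1]
            if group[1] - group[0] > prev[1] - prev[0]:
                kept[-1] = group  # unequal lengths: longer group wins
            continue  # equal or shorter: the first group keeps the shared term
        kept.append(group)
    return kept


def build_query(text, phrases=PHRASES):
    """b) through h) Turn a natural-language query into a list of search terms."""
    terms = re.findall(r"[a-z0-9]+", text.lower())  # c) parse into terms
    stems = [stem(t) for t in terms]                # d) stem, preserving order
    spans = dict(resolve_overlaps(find_groups(stems, phrases)))
    query, i = [], 0
    while i < len(stems):
        if i in spans:  # h) replace the group with its matching phrase
            query.append(" ".join(stems[i:spans[i]]))
            i = spans[i]
        else:           # remaining stemmed terms are kept individually
            query.append(stems[i])
            i += 1
    return query


print(build_query("product liability insurance policies for manufacturers"))
```

With the hypothetical database above, the stem `liabil` is shared by the two-term group `product liabil` and the three-term group `liabil insur polic`. The longer group keeps it, so the sketch prints `['product', 'liabil insur polic', 'for', 'manufacturer']`.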
