×

Search system and method for retrieval of data, and the use thereof in a search engine

  • US 6,377,945 B1
  • Filed: 03/09/2000
  • Issued: 04/23/2002
  • Est. Priority Date: 07/10/1998
  • Status: Expired due to Term
First Claim
Patent Images

1. A search system for information retrieval, particularly information stored in form of text, wherein a text T comprises words and/or symbols s and sequences S thereof, wherein the information retrieval takes place with a given or varying degree of matching between a query Q, wherein the query Q comprises words and/or symbols q and sequences thereof P, and retrieved information R comprising words and/or symbols and sequences thereof from the text T, wherein the search system comprises a data structure for storing at least a part of the text T, and a metric M which measures the degree of matching between the query Q and retrieved information R, and wherein the search system implements search algorithms for executing a search, particularly a full text search on the basis of keywords kw, characterized in that the data structure comprises a tree structure in the form of a non-evenly spaced sparse suffix tree ST(T) for storing suffixes of words and/or symbols s and sequences S thereof in the text T, that the metric M comprises a combination of an edit distance metric D(s,q) for an approximate degree of matching between words and/or symbols s;

  • q in respectively the text T and a query Q and an edit distance metric Dws(S,P) for an approximate degree of matching between sequences S of words and/or symbols s in the text T and a query sequence P of words and/or symbols q in the query Q, the latter edit distance metric including weighting cost functions for edit operations which transform sequences of words and/or symbols s in the text T into the sequence P of words and/or symbols q in the query Q, the weighting taking place with a value proportional to a change in the length of the sequence S upon a transformation or dependent on the size of the words and/or symbols s;

    q in sequences S;

    P to be matched, that the implemented search algorithms comprise a first algorithm for determining the degree of matching between words and/or symbols s;

    q in the suffix tree representation of respectively the text T and a query Q, and a second algorithm for determining the degree of matching between sequences S;

    P of words and/or symbols s;

    q in the suffix tree representation of respectively the text T and the query Q, said first and/or second algorithms searching the data structure with queries Q in the form of either words, symbols, sequences of words or sequences of symbols or combinations thereof, such that information R is retrieved on the basis of query Q with a specified degree of matching between the former and the latter, and that the search algorithms optionally also comprise a third algorithm for determining exact matching between words and/or symbols s;

    q in the suffix tree representation of respectively the text T and the query Q and/or a fourth algorithm for determining exact matching between sequences S;

    P of words and/or symbols s;

    q in the suffix tree representation of respectively the text T and the query Q, said third and/or fourth algorithms searching the data structure with queries Q in the form of either words, symbols, sequences of words, or sequences of symbols or combinations thereof, such that information R is retrieved on the basis of the query Q with an exact matching between the former and the latter.

View all claims
  • 8 Assignments
Timeline View
Assignment View
    ×
    ×