PHRASE-BASED DETECTION OF DUPLICATE DOCUMENTS IN AN INFORMATION RETRIEVAL SYSTEM

  • US 20100161625A1
  • Filed: 03/04/2010
  • Published: 06/24/2010
  • Est. Priority Date: 07/26/2004
  • Status: Active Grant
First Claim

1. A method of detecting duplicate documents in search results, the method comprising:

    receiving a query comprising at least one phrase;

    retrieving a plurality of documents responsive to the query to form a search result;

    for each of the retrieved documents, generating a document description comprising selected sentences of the document, wherein the selected sentences are ordered in the document description as a function of a number of phrases in each sentence;

    responsive to the document descriptions of at least two documents matching, discarding at least one of the two documents from the search result.
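
The steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`document_description`, `drop_duplicates`), the regex-based sentence splitter, the case-insensitive substring count used as the "number of phrases", the `max_sentences` cutoff, and exact equality as the matching test are all assumptions made for illustration.

```python
import re


def document_description(text, phrases, max_sentences=3):
    """Build a description from the sentences that contain the most phrases.

    Assumption: sentences end at '.', '!' or '?', and a phrase "occurs" in a
    sentence when it appears as a case-insensitive substring.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def phrase_count(sentence):
        lowered = sentence.lower()
        return sum(lowered.count(p.lower()) for p in phrases)

    # Order sentences by phrase count, highest first. sorted() is stable,
    # so sentences with equal counts keep their original document order.
    ranked = sorted(sentences, key=phrase_count, reverse=True)
    return tuple(ranked[:max_sentences])


def drop_duplicates(results, phrases, max_sentences=3):
    """Keep the first document for each distinct description.

    `results` is a list of (doc_id, text) pairs in ranked order. Matching is
    exact equality of descriptions here, which is a simplifying assumption.
    """
    seen = set()
    kept = []
    for doc_id, text in results:
        description = document_description(text, phrases, max_sentences)
        if description in seen:
            continue  # description matches an earlier document: discard
        seen.add(description)
        kept.append((doc_id, text))
    return kept


# Hypothetical query phrases and search result.
phrases = ["australian shepherd", "herding dog"]
results = [
    ("a", "Dogs bark. The australian shepherd is a herding dog. Cats sleep."),
    ("b", "Dogs bark. The australian shepherd is a herding dog. Cats sleep."),
    ("c", "Border collies herd sheep. A herding dog needs exercise."),
]
print([doc_id for doc_id, _ in drop_duplicates(results, phrases)])  # ['a', 'c']
```

Because the first document in ranked order is the one retained, the higher-ranked copy of a duplicate pair survives in this sketch; the claim itself only requires that at least one of the two be discarded.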