
System and method for identifying useless documents

  • US 6,397,211 B1
  • Filed: 01/03/2000
  • Issued: 05/28/2002
  • Est. Priority Date: 01/03/2000
  • Status: Expired due to Fees
First Claim

1. A processing system for identifying useless documents, comprising:

    at least one document database storing a plurality of documents;

    at least one processor;

    a search engine, executed by the at least one processor, that accesses at least one document stored within the at least one document database satisfying a query; and

    a useless document identifier engine, executed by the at least one processor, for identifying useless documents from the at least one accessed document, the useless document identifier engine determining if the at least one accessed document is useless by determining if one of the following two conditions is true:

    (i) a length of the at least one accessed document is less than a first predetermined amount of bytes; or

    (ii) the length of the at least one accessed document is less than a second predetermined amount of bytes, the at least one accessed document has less than a predetermined number of terms with an Intelligent Quotient (IQ) greater than a first predetermined number, and the at least one accessed document has less than a predetermined number of appearances of terms having a tf*idf value of greater than a second predetermined number.
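
The two-condition test recited in claim 1 can be illustrated with a short Python sketch. This is not the patented implementation. The claim names no concrete thresholds, so every default in the hypothetical `Thresholds` class is a placeholder chosen for illustration. The function name `is_useless` and its parameters are likewise assumptions. The per-term IQ scores and per-appearance tf*idf values are taken as precomputed inputs, because the claim does not say how they are derived.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical stand-ins for the claim's 'predetermined' values."""
    min_bytes: int = 100            # "first predetermined amount of bytes"
    short_bytes: int = 1000         # "second predetermined amount of bytes"
    iq_cutoff: float = 50.0         # "first predetermined number"
    min_high_iq_terms: int = 5      # required count of high-IQ terms
    tfidf_cutoff: float = 0.2       # "second predetermined number"
    min_high_tfidf_hits: int = 10   # required count of high-tf*idf appearances


def is_useless(
    length_bytes: int,
    term_iqs: Sequence[float],
    appearance_tfidfs: Sequence[float],
    t: Thresholds = Thresholds(),
) -> bool:
    """Return True if either condition (i) or (ii) of claim 1 holds.

    term_iqs: one IQ score per term in the document (assumed precomputed).
    appearance_tfidfs: one tf*idf value per term appearance (assumed precomputed).
    """
    # Condition (i): the document is shorter than the first byte threshold.
    if length_bytes < t.min_bytes:
        return True

    # Condition (ii): all three sub-conditions must hold together.
    high_iq_terms = sum(1 for iq in term_iqs if iq > t.iq_cutoff)
    high_tfidf_hits = sum(1 for v in appearance_tfidfs if v > t.tfidf_cutoff)
    return (
        length_bytes < t.short_bytes
        and high_iq_terms < t.min_high_iq_terms
        and high_tfidf_hits < t.min_high_tfidf_hits
    )
```

Under these placeholder thresholds, a 50-byte document is flagged by condition (i) alone, while a 500-byte document is flagged only if it also lacks enough high-IQ terms and enough high-tf*idf term appearances.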
