
Analyzing the ability to find textual content

  • US 7,792,830 B2
  • Filed: 08/01/2006
  • Issued: 09/07/2010
  • Est. Priority Date: 08/01/2006
  • Status: Expired due to Fees
First Claim

1. A method for document analysis, comprising the steps of:

  • designating a subset of relevant documents from a document collection;

    using a greedy algorithm to establish a query coverage set of words or terms, wherein at each stage thereof a single word or term from the subset of relevant documents is included in the query coverage set, wherein the single word or term minimizes a distance measurement between the document collection and the query coverage set, wherein the distance measurement is determined by constructing a difficulty model for a topic by computing a plurality of distances comprising a first distance between the query coverage set and the document collection (d(Q,C)), a second distance among the query coverage set (d(Q,Q)); a third distance between the subset of relevant documents and the document collection (d(R,C)), a fourth distance among the subset of relevant documents (d(R,R)), and a fifth distance between the query coverage set and the subset of relevant documents (d(Q,R));

    storing the query coverage set in a database;

    constructing a set of queries from the query coverage set, each of the queries having a number of terms;

    executing the queries in a search engine to generate respective results;

    responsively to the respective results determining an average precision for each of the queries by considering the subset of relevant documents as representing the document collection;

    categorizing the queries by analyzing the average precision against the number of terms thereof; and

    reporting respective abilities of the categorized queries to find information in the subset of relevant documents.
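
As an informal illustration only, and not the patented implementation, the greedy selection step and the average-precision step of the claim can be sketched in Python. The claim does not name a distance function or say how the query coverage set is modeled, so the sketch makes two assumptions: the distance is the Jensen-Shannon divergence between unigram term distributions, and the query coverage set is modeled by the relative frequencies of its terms within the relevant subset. All function names (`distribution`, `js_divergence`, `greedy_coverage_set`, `average_precision`) are hypothetical.

```python
import math
from collections import Counter


def distribution(counts):
    """Normalize a term -> count mapping into a probability distribution."""
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two term distributions.

    Assumption: the claim leaves the distance measurement unspecified;
    JSD is used here purely for illustration.
    """
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # m[k] > 0 whenever a[k] > 0, so the ratio is always defined.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def greedy_coverage_set(relevant_docs, collection_docs, size):
    """Greedily pick terms from the relevant subset, one per stage.

    At each stage the term whose inclusion yields the smallest distance
    between the coverage set and the whole collection, d(Q, C), is added.
    Documents are lists of tokens.
    """
    coll = distribution(Counter(t for doc in collection_docs for t in doc))
    rel_counts = Counter(t for doc in relevant_docs for t in doc)
    chosen = []
    candidates = set(rel_counts)
    while candidates and len(chosen) < size:

        def dist_with(term):
            # Assumption: Q is modeled by relevant-subset term frequencies.
            q_counts = {t: rel_counts[t] for t in chosen + [term]}
            return js_divergence(distribution(q_counts), coll)

        # Sorting the candidates makes tie-breaking deterministic.
        best = min(sorted(candidates), key=dist_with)
        chosen.append(best)
        candidates.remove(best)
    return chosen


def average_precision(ranked_ids, relevant_ids):
    """Average precision of a ranked result list against a relevant set."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0
```

Under these assumptions, queries built from prefixes of the returned term list could be run against a search engine, and `average_precision` applied to each ranked result with the designated relevant subset as ground truth, so that average precision can be compared against the number of query terms. The difficulty model built from the five distances is not sketched here, because the claim does not define how a distance "among" a set is computed.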

  • 2 Assignments