
APPARATUS FOR SELECTING DOCUMENTS IN RESPONSE TO A PLURALITY OF INQUIRIES BY A PLURALITY OF CLIENTS BY ESTIMATING THE RELEVANCE OF DOCUMENTS

  • US 20080071778A1
  • Filed: 11/30/2007
  • Published: 03/20/2008
  • Est. Priority Date: 04/05/2004
  • Status: Active Grant
First Claim

1. An apparatus for selecting documents in response to a plurality of inquiries by a plurality of clients by estimating the relevance of documents when judgments of such relevance are generally subjective, comprising:

  • a server;

    said server comprising an input permitting a plurality of clients to search said server for electronic documents stored therein and said input adapted to receive inquiries from said plurality of clients;

    a network for coupling said plurality of clients to said server through said input;

    said server comprises a central processing unit and a memory;

    said memory comprises a document database;

    said document database stores a plurality of documents;

    said relevance of each of said plurality of documents is estimated by a probabilistic model, based on a Markov process, for ranking each of said plurality of documents in said document database based on each of said plurality of documents' probability of relevance to each of said plurality of inquiries;

    said Markov process comprises a plurality of time steps;

    said relevance of each of said plurality of documents to each of said plurality of inquiries is determined by said central processing unit which builds a trellis diagram for each of said plurality of inquiries;

    said trellis diagram corresponding to a two dimensional array of nodes, one dimension of said array corresponds to a plurality of columns of said array and a second dimension of said array corresponds to a plurality of rows of said array;

    each row of said plurality of rows corresponds to one of said plurality of documents;

    each column of said plurality of columns corresponds to one of said steps of said Markov process, each column of said plurality of columns comprises a plurality of elements there being an element of said plurality of elements corresponding to each of said plurality of documents;

    a transition matrix stored in said memory comprising a plurality of elements;

    said transition matrix comprising a number of rows and a number of columns;

    said number of rows and said number of columns being equal to the number of said plurality of documents;

    said plurality of elements of said transition matrix comprise a measure of the relevance of each document of said plurality of documents to each of the other of said plurality of documents;

    said central processing unit using said transition matrix to generate said trellis diagram;

    said central processing unit in response to one of said plurality of inquiries determining a time zero probability vector;

    said time zero probability vector comprising a number of vector elements, said number of vector elements is equal to said number of said plurality of documents;

    said number of vector elements of said time zero probability vector is a time zero determination of the relevance of said one of said plurality of inquiries to each of said plurality of documents;

    said time zero determination of said relevance of said one of said plurality of inquiries to each of said plurality of documents is improved upon by said probabilistic model by matrix multiplying said time zero probability vector by said transition matrix to determine a time one probability vector;

    said time one probability vector comprises a number of vector elements, said number of vector elements of said time one probability vector is equal to said number of said plurality of documents;

    said number of vector elements of said time one probability vector is a time one determination of the relevance of said one of said plurality of inquiries to each of said plurality of documents;

    said matrix multiplying of a probability vector corresponding to a particular time by said transition matrix to determine a probability vector for the next time unit after said particular time is repeated a sufficient number of times to generate said trellis diagram which is a matrix formed by a plurality of time ordered probability vectors, said sufficient number of said times is selected from the group consisting of:

    a specified number of times, and a number of times determined from an analysis of sample queries of sample documents, wherein said analysis of said sample queries of said sample documents identifies an optimal value of the number of said time ordered probability vectors needed to identify the most relevant document in said sample documents; and

    an analysis of the relevance of said plurality of documents corresponding to said particular inquiry is determinable from said two dimensional array of nodes of said trellis diagram.
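
The computation recited in the claim (a time zero probability vector repeatedly multiplied by a square transition matrix, with the resulting time ordered vectors collected into a trellis) can be illustrated with a minimal sketch. This is not the patented implementation. The function names, the example numbers, the row-stochastic form of the transition matrix, and the rule of ranking documents by the last column of the trellis are all assumptions made for illustration; the claim states only that relevance is determinable from the trellis.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def step(p: Vector, transition: Matrix) -> Vector:
    """One Markov time step: row vector p multiplied by the square transition matrix."""
    n = len(p)
    return [sum(p[i] * transition[i][j] for i in range(n)) for j in range(n)]


def build_trellis(p0: Vector, transition: Matrix, steps: int) -> List[Vector]:
    """Return the time ordered probability vectors [p0, p1, ..., p_steps].

    trellis[t][d] is the node for document d (row) at time step t (column).
    """
    trellis = [list(p0)]
    for _ in range(steps):
        trellis.append(step(trellis[-1], transition))
    return trellis


def rank_documents(trellis: List[Vector]) -> List[int]:
    """Order document indices by the final column, highest first.

    Assumption: the claim does not fix a scoring rule; using the last
    time step is only one possible reading of the trellis.
    """
    final = trellis[-1]
    return sorted(range(len(final)), key=lambda d: final[d], reverse=True)


# Hypothetical data: three documents, and an inquiry that initially
# matches only document 0. Each row of the matrix sums to 1 (assumed).
transition = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
p0 = [1.0, 0.0, 0.0]

trellis = build_trellis(p0, transition, steps=3)
for t, column in enumerate(trellis):
    print(f"time {t}: {column}")
print("ranking:", rank_documents(trellis))
```

Here `steps` plays the role of the "specified number of times" alternative in the claim. The other alternative, a number of times determined from an analysis of sample queries, would amount to running `build_trellis` on sample queries with known relevant documents and keeping the step count that ranks them best; that tuning loop is not shown.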
