SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR INFORMATION SORTING AND RETRIEVAL USING A LANGUAGE-MODELING KERNEL FUNCTION
Abstract
Various embodiments provide a system, method, and computer program product for sorting and/or selectively retrieving a plurality of documents in response to a user query. More particularly, embodiments are provided that convert each document into a corresponding document language model and convert the user query into a corresponding query language model. The language models are used to define a vector space having dimensions corresponding to terms in the documents and in the user query. The language models are mapped in the vector space. Each of the documents is then ranked, wherein the ranking is based at least in part on a position of the mapped language models in the vector space, so as to determine a relative relevance of each of the plurality of documents to the user query.
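The pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not name a smoothing scheme or a similarity measure, so the sketch assumes Jelinek-Mercer smoothing against a collection model and ranks by negative KL divergence between the query model and each document model. All function names (`tokenize`, `collection_model`, `language_model`, `rank`) and the parameter `lam` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization on lowercased text.
    return text.lower().split()


def collection_model(docs):
    # Distribution of terms over the whole plurality of documents.
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def language_model(text, background, lam=0.5):
    # Smoothed unigram model: a mixture of the text's own term distribution
    # and the collection distribution (Jelinek-Mercer, an assumption here).
    # Terms outside the collection vocabulary are ignored.
    tokens = [t for t in tokenize(text) if t in background]
    if not tokens:
        return dict(background)
    counts = Counter(tokens)
    total = len(tokens)
    return {w: lam * counts[w] / total + (1 - lam) * p_bg
            for w, p_bg in background.items()}


def rank(docs, query, lam=0.5):
    # Each vocabulary term is one dimension of the vector space; every model
    # is a point in it. Documents are ordered by negative KL divergence from
    # the query model (higher means closer to the query).
    bg = collection_model(docs)
    q = language_model(query, bg, lam)
    scored = []
    for i, doc in enumerate(docs):
        d = language_model(doc, bg, lam)
        score = sum(q[w] * math.log(d[w] / q[w]) for w in bg)
        scored.append((score, i))
    return [i for _, i in sorted(scored, reverse=True)]
```

Calling `rank(docs, "kernel retrieval")` returns document indices ordered from most to least relevant under these assumptions.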
40 Claims
1. A system for sorting a plurality of documents based at least in part on a relationship between each of the plurality of documents and a user query, relevance feedback, and relations among the plurality of documents, the system comprising:
a data source comprising the plurality of documents; and
a host computing element in communication with said data source and configured to receive an initial user input comprising the user query;
wherein said host computing element is further configured to convert each of the plurality of documents into a corresponding document language model, each document language model being associated with a distribution of a plurality of document terms present in the plurality of documents and a distribution of a plurality of document terms present in each of the plurality of documents;
wherein said host computing element is further configured to convert the user query into a corresponding query language model, the query language model being associated with a distribution of a plurality of query terms present in the user query and the distribution of the plurality of document terms present in the plurality of documents;
wherein said host computing element is further configured to define a kernel function configured to evaluate a similarity relationship between two document language models under the influence of the query language model;
wherein said host computing element is further configured to automatically obtain via the defined kernel function a first vector space having a plurality of dimensions associated with at least two of the distribution of the plurality of document terms present in the plurality of documents, the distribution of the plurality of document terms present in each of the plurality of documents, and the distribution of the plurality of query terms present in the user query;
wherein said host computing element is further configured to map via the defined kernel function each of the plurality of document language models and the query language model in the first vector space; and
wherein said host computing element is further configured to rank each of the plurality of documents based at least in part on a similarity relationship between each of the document language models and the query language model in the first vector space to determine a relative relevance of each of the plurality of documents to the user query.
Dependent claims: 2-18.
19. A method for sorting a plurality of documents based at least in part on a relationship between each of the plurality of documents and a user query, relevance feedback, and relationships among the plurality of documents, the method comprising:
converting each of the plurality of documents into a corresponding document language model, each document language model being associated with a distribution of a plurality of document terms present in the plurality of documents and a plurality of document terms present in each of the plurality of documents;
converting the user query into a corresponding query language model, the query language model being associated with a distribution of a plurality of query terms present in the user query and the distribution of the plurality of document terms present in the plurality of documents;
defining a kernel function configured to evaluate a similarity relationship between two document language models under the influence of the query language model;
obtaining automatically via the defined kernel function a first vector space having a plurality of dimensions associated with at least two of the distribution of the plurality of document terms present in the plurality of documents, the distribution of the plurality of document terms present in each of the plurality of documents, and the distribution of the plurality of query terms present in the user query;
mapping via the defined kernel function each of the document language models and the query language model in the first vector space; and
ranking each of the plurality of documents based at least in part on a similarity relationship between each of the document language models and the query language model in the first vector space to determine a relative relevance of each of the plurality of documents to the user query.
Dependent claims: 20-29.
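For illustration only, and not as part of the claims, the following sketch shows one way a kernel can compare two document language models "under the influence of" a query language model. The claim does not give a formula, so the kernel here is an assumption: a query-weighted Bhattacharyya-style kernel, k(d1, d2) = sum over w of q(w) * sqrt(d1(w) * d2(w)). It is a valid kernel because it equals a dot product under the explicit map phi(d)[w] = sqrt(q(w) * d(w)), which is also what fixes the dimensions of the vector space. The names `lm_kernel`, `feature_map`, and `rank_by_kernel` are hypothetical, and language models are assumed to be plain dicts from term to probability.

```python
import math


def lm_kernel(d1, d2, q):
    # Hypothetical query-weighted kernel: terms the query model favors
    # contribute more to the similarity of the two models.
    return sum(p * math.sqrt(d1.get(w, 0.0) * d2.get(w, 0.0))
               for w, p in q.items())


def feature_map(model, q):
    # Explicit coordinates in the vector space induced by the kernel:
    # lm_kernel(a, b, q) equals the dot product of the two feature maps.
    return {w: math.sqrt(p * model.get(w, 0.0)) for w, p in q.items()}


def rank_by_kernel(doc_models, q):
    # Map the query model into the same space and rank documents by the
    # cosine-normalized kernel value between each document and the query.
    k_qq = lm_kernel(q, q, q)
    scored = []
    for i, d in enumerate(doc_models):
        k_dd = lm_kernel(d, d, q)
        if k_dd > 0 and k_qq > 0:
            sim = lm_kernel(d, q, q) / math.sqrt(k_dd * k_qq)
        else:
            sim = 0.0
        scored.append((sim, i))
    return [i for _, i in sorted(scored, reverse=True)]
```

Normalizing by the self-similarities keeps long and short models comparable; an unnormalized kernel value would also satisfy the same sketch.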
30. A computer program product for sorting a plurality of documents based at least in part on a relationship between each of the plurality of documents and a user query, relevance feedback, interest, and relations among the plurality of documents, the computer program product comprising a computer-readable storage medium having computer-readable program code instructions stored therein comprising:
a first set of computer instructions for converting each of the plurality of documents into a corresponding document language model, each document language model being associated with a distribution of a plurality of document terms present in the plurality of documents and a plurality of document terms present in each of the plurality of documents;
a second set of computer instructions for converting the user query into a corresponding query language model, the query language model being associated with a distribution of a plurality of query terms present in the user query and the distribution of the plurality of document terms present in the plurality of documents;
a third set of computer instructions for defining a kernel function configured to evaluate a similarity relationship between two document language models under the influence of the query language model;
a fourth set of computer instructions for automatically obtaining via the defined kernel function of the third set of computer instructions a first vector space having a plurality of dimensions associated with at least two of the distribution of the plurality of document terms present in the plurality of documents, the distribution of the plurality of document terms present in each of the plurality of documents, and the distribution of the plurality of query terms present in the user query;
a fifth set of computer instructions for mapping via the defined kernel function each of the document language models and the query language model in the first vector space; and
a sixth set of computer instructions for ranking each of the plurality of documents based at least in part on a similarity relationship between each of the document language models and the query language model in the first vector space to determine a relative relevance of each of the plurality of documents to the user query.
Dependent claims: 31-38.
39. A system adapted to interface with a search engine for sorting a plurality of documents retrieved and ranked by the search engine based at least in part on a relationship between each of the plurality of documents and a user query received via the search engine, relevance feedback, and relations among the plurality of documents, the system comprising:
a host computing element configured to receive a user relevance feedback via the search engine, the user relevance feedback comprising a selection of at least a portion of the retrieved plurality of documents, the selection comprising one or more relevant document samples;
wherein said host computing element is further configured to generate a plurality of document language models corresponding to each of the plurality of documents, the document language models corresponding at least in part to a plurality of terms present in each of the retrieved plurality of documents;
wherein said host computing element is further configured to estimate a query language model based at least in part on the one or more selected relevant document samples, the query language model being associated with a distribution of a plurality of document terms present in the one or more selected relevant document samples in the user relevance feedback and a distribution of a plurality of query terms present in the user query;
wherein said host computing element is further configured to compute a language-modeling kernel based at least in part on the query language model, the language-modeling kernel configured to evaluate a similarity relationship between two document language models under the influence of the query language model;
wherein said host computing element is further configured to map the document language models to a high-dimensional vector space automatically determined by the computed language-modeling kernel;
wherein said host computing element is further configured to generate a decision boundary in the high-dimensional vector space between the document language models corresponding to the selected relevant document samples and the document language models corresponding to a plurality of non-relevant documents; and
wherein said host computing element is further configured to re-rank the plurality of documents retrieved from the search engine based at least in part on a location of the decision boundary in the high-dimensional vector space to refine a rank of the retrieved plurality of documents based at least in part on the query language model and the plurality of document language models.
Dependent claims: 40.
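For illustration only, and not as part of the claims, the relevance-feedback loop of claim 39 can be sketched as below. Several pieces are assumptions rather than anything the claim states: the query model is estimated as an even mixture of the query terms and the average of the selected relevant document models, the kernel is a hypothetical query-weighted Bhattacharyya-style kernel, the non-relevant examples are supplied by the caller, and the decision boundary is learned with a kernel perceptron as a simple stand-in for whatever classifier an implementation would actually use. All names (`estimate_query_model`, `train_kernel_perceptron`, `decision`, `rerank`) are hypothetical, and language models are plain dicts from term to probability.

```python
import math


def lm_kernel(d1, d2, q):
    # Hypothetical kernel: similarity of two document models weighted by
    # the query model q.
    return sum(p * math.sqrt(d1.get(w, 0.0) * d2.get(w, 0.0))
               for w, p in q.items())


def estimate_query_model(query_terms, relevant_models, mix=0.5):
    # Assumption: mix the query's own term distribution with the average
    # of the relevant-document models selected by the user.
    q = {}
    for t in query_terms:
        q[t] = q.get(t, 0.0) + mix / len(query_terms)
    for m in relevant_models:
        for w, p in m.items():
            q[w] = q.get(w, 0.0) + (1 - mix) * p / len(relevant_models)
    return q


def decision(x, models, labels, alpha, bias, q):
    # Signed score of model x relative to the learned boundary.
    return bias + sum(a * y * lm_kernel(m, x, q)
                      for a, y, m in zip(alpha, labels, models) if a)


def train_kernel_perceptron(models, labels, q, epochs=20):
    # Kernel perceptron: a swapped-in, minimal way to learn a boundary in
    # the kernel-induced space without computing coordinates explicitly.
    alpha = [0.0] * len(models)
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(models, labels)):
            if y * decision(x, models, labels, alpha, bias, q) <= 0:
                alpha[i] += 1.0
                bias += y
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, bias


def rerank(doc_models, relevant_idx, nonrelevant_idx, query_terms):
    # Re-rank all retrieved documents by their signed score relative to
    # the boundary (most confidently relevant first).
    q = estimate_query_model(query_terms,
                             [doc_models[i] for i in relevant_idx])
    train = [doc_models[i] for i in relevant_idx + nonrelevant_idx]
    labels = [1] * len(relevant_idx) + [-1] * len(nonrelevant_idx)
    alpha, bias = train_kernel_perceptron(train, labels, q)
    scores = [decision(d, train, labels, alpha, bias, q) for d in doc_models]
    return sorted(range(len(doc_models)),
                  key=lambda i: scores[i], reverse=True)
```

Here `relevant_idx` and `nonrelevant_idx` are lists of positions in `doc_models`; documents in neither list are scored by the same boundary and slotted into the new order.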
Specification