System, method and computer program product for information sorting and retrieval using a language-modeling kernel function
Abstract
Various embodiments provide a system, method, and computer program product for sorting and/or selectively retrieving a plurality of documents in response to a user query. More particularly, embodiments are provided that convert each document into a corresponding document language model and convert the user query into a corresponding query language model. The language models are used to define a vector space having dimensions corresponding to terms in the documents and in the user query. The language models are mapped in the vector space. Each of the documents is then ranked, wherein the ranking is based at least in part on a position of the mapped language models in the vector space, so as to determine a relative relevance of each of the plurality of documents to the user query.
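The conversion of documents and a query into language models can be illustrated with a minimal sketch. It assumes smoothed unigram models (linear interpolation with a collection-wide term distribution), which is one common choice and is not stated in the abstract. The whitespace tokenizer, the function names, and the `lam` mixing weight are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately naive assumption)."""
    return text.lower().split()


def collection_model(docs):
    """Distribution of terms over the whole collection: p(w | C)."""
    counts = Counter(w for d in docs for w in tokenize(d))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def language_model(text, background, lam=0.5):
    """Smoothed unigram model over the collection vocabulary.

    p(w) = lam * p_ml(w | text) + (1 - lam) * p(w | C)
    Terms outside the collection vocabulary are dropped (an assumption).
    """
    counts = Counter(w for w in tokenize(text) if w in background)
    total = sum(counts.values())
    if total == 0:
        # No known terms: fall back to the collection distribution.
        return dict(background)
    return {
        w: lam * counts[w] / total + (1 - lam) * p_bg
        for w, p_bg in background.items()
    }


docs = [
    "kernel methods for text retrieval",
    "language models for information retrieval",
    "cooking pasta at home",
]
background = collection_model(docs)
doc_models = [language_model(d, background) for d in docs]
query_model = language_model("kernel retrieval", background)
```

Because every model is defined over the same vocabulary, each one can be read as a point in a space with one dimension per term, which is the sense in which the abstract speaks of mapping language models in a vector space.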
23 Claims
1. A system for sorting a plurality of documents based at least in part on a relationship between each of the plurality of documents and a user query, relevance feedback, and relations among the plurality of documents, the system comprising:
a data source comprising the plurality of documents; and
a host computing element in communication with said data source and configured to receive an initial user input comprising the user query;
wherein said host computing element is further configured to convert each of the plurality of documents into a corresponding document language model, each document language model being associated with a distribution of a plurality of document terms present in the plurality of documents and a distribution of a plurality of document terms present in each of the plurality of documents;
wherein said host computing element is further configured to convert the user query into a corresponding query language model, the query language model being associated with a distribution of a plurality of query terms present in the user query and the distribution of the plurality of document terms present in the plurality of documents;
wherein said host computing element is further configured to define a kernel function configured to evaluate a similarity relationship between two document language models under the influence of the query language model;
wherein said host computing element is further configured to automatically obtain via the defined kernel function a first vector space having a plurality of dimensions associated with at least two of the distribution of the plurality of document terms present in the plurality of documents, the distribution of the plurality of document terms present in each of the plurality of documents, and the distribution of the plurality of query terms present in the user query;
wherein said host computing element is further configured to map via the defined kernel function each of the plurality of the document language models and the query language model in the first vector space; and
wherein said host computing element is further configured to rank each of the plurality of documents based at least in part on a similarity relationship between each of the document language models and the query language model in the first vector space to determine a relative relevance of each of the plurality of documents to the user query.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
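A kernel that compares two document language models "under the influence of" a query language model can be sketched as follows. The specific form used here, a query-weighted Bhattacharyya-style kernel, is an illustrative assumption and is not taken from the claim; the function names and the toy distributions are hypothetical. The sketch also shows that such a kernel comes with an explicit feature map, so evaluating the kernel is the same as taking a dot product in a vector space with one dimension per term.

```python
import math


def query_weighted_kernel(p_i, p_j, p_q):
    """Assumed kernel: k_q(d_i, d_j) = sum_w p(w|q) * sqrt(p(w|d_i) * p(w|d_j))."""
    return sum(
        p_q[w] * math.sqrt(p_i.get(w, 0.0) * p_j.get(w, 0.0)) for w in p_q
    )


def feature_map(p_d, p_q, vocab):
    """Explicit coordinates phi(d)_w = sqrt(p(w|q) * p(w|d)), one per term."""
    return [math.sqrt(p_q.get(w, 0.0) * p_d.get(w, 0.0)) for w in vocab]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Toy distributions over a four-term vocabulary (hypothetical values).
vocab = ["kernel", "language", "model", "pasta"]
p_q = {"kernel": 0.5, "language": 0.3, "model": 0.2, "pasta": 0.0}
d1 = {"kernel": 0.4, "language": 0.3, "model": 0.3, "pasta": 0.0}
d2 = {"kernel": 0.1, "language": 0.1, "model": 0.1, "pasta": 0.7}

k12 = query_weighted_kernel(d1, d2, p_q)
k12_via_map = dot(feature_map(d1, p_q, vocab), feature_map(d2, p_q, vocab))
```

In this sketch terms that the query model gives no weight (here "pasta") contribute nothing to the similarity, so two documents are compared only along the dimensions the query cares about.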
13. A method for sorting a plurality of documents based at least in part on a relationship between each of the plurality of documents and a user query, relevance feedback, and relationships among the plurality of documents, the method comprising:
converting each of the plurality of documents into a corresponding document language model, each document language model being associated with a distribution of a plurality of document terms present in the plurality of documents and a plurality of document terms present in each of the plurality of documents;
converting the user query into a corresponding query language model, the query language model being associated with a distribution of a plurality of query terms present in the user query and the distribution of the plurality of document terms present in the plurality of documents;
defining a kernel function configured to evaluate a similarity relationship between two document language models under the influence of the query language model;
obtaining automatically via the defined kernel function a first vector space having a plurality of dimensions associated with at least two of the distribution of the plurality of document terms present in the plurality of documents, the distribution of the plurality of document terms present in each of the plurality of documents, and the distribution of the plurality of query terms present in the user query;
mapping via the defined kernel function each of the document language models and the query language model in the first vector space; and
ranking each of the plurality of documents based at least in part on a similarity relationship between each of the document language models and the query language model in the first vector space to determine a relative relevance of each of the plurality of documents to the user query.
Dependent claims: 14, 15, 16, 17, 18
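The six steps of the method can be strung together in a short end-to-end sketch. Everything concrete in it is an assumption made for illustration: interpolation-smoothed unigram models, a query-weighted embedding phi(m)_w = sqrt(p(w|q) * p(w|m)), and cosine similarity as the ranking measure. The names `unigram_model` and `rank_documents` are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def unigram_model(tokens, background, lam=0.5):
    """Smoothed unigram model over the collection vocabulary (assumed form)."""
    counts = Counter(t for t in tokens if t in background)
    total = sum(counts.values())
    if total == 0:
        return dict(background)
    return {
        w: lam * counts[w] / total + (1 - lam) * bg
        for w, bg in background.items()
    }


def rank_documents(docs, query, lam=0.5):
    """Return document indices ordered from most to least relevant."""
    # Steps 1-2: convert the documents and the query into language models.
    tokenized = [d.lower().split() for d in docs]
    coll = Counter(t for toks in tokenized for t in toks)
    n = sum(coll.values())
    background = {w: c / n for w, c in coll.items()}
    doc_models = [unigram_model(t, background, lam) for t in tokenized]
    q_model = unigram_model(query.lower().split(), background, lam)

    # Steps 3-4: the assumed kernel sum_w p(w|q) * sqrt(a_w * b_w) induces
    # a vector space with one dimension per vocabulary term.
    vocab = sorted(background)

    def embed(model):
        return [math.sqrt(q_model[w] * model[w]) for w in vocab]

    # Step 5: map every language model, including the query's, into it.
    q_vec = embed(q_model)
    doc_vecs = [embed(m) for m in doc_models]

    # Step 6: rank by cosine similarity to the query vector.
    def cosine(u, v):
        num = sum(a * b for a, b in zip(u, v))
        den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return num / den if den else 0.0

    scores = [cosine(v, q_vec) for v in doc_vecs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


ranking = rank_documents(
    [
        "kernel methods for text retrieval",
        "language models for information retrieval",
        "cooking pasta at home",
    ],
    "kernel retrieval",
)
```

Cosine similarity is only one way to read "similarity relationship" in the final step; the unnormalized dot product in the same space would be an equally plausible choice and can order the documents differently.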
19. A computer program product for sorting a plurality of documents based at least in part on a relationship between each of the plurality of documents and a user query, relevance feedback, interest, and relations among the plurality of documents, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code instructions stored therein comprising:
a first set of computer instructions for converting each of the plurality of documents into a corresponding document language model, each document language model being associated with a distribution of a plurality of document terms present in the plurality of documents and a plurality of document terms present in each of the plurality of documents;
a second set of computer instructions for converting the user query into a corresponding query language model, the query language model being associated with a distribution of a plurality of query terms present in the user query and the distribution of the plurality of document terms present in the plurality of documents;
a third set of computer instructions for defining a kernel function configured to evaluate a similarity relationship between two document language models under the influence of the query language model;
a fourth set of computer instructions for automatically obtaining via the defined kernel function of the third set of computer instructions a first vector space having a plurality of dimensions associated with at least two of the distribution of the plurality of document terms present in the plurality of documents, the distribution of the plurality of document terms present in each of the plurality of documents, and the distribution of the plurality of query terms present in the user query;
a fifth set of computer instructions for mapping via the defined kernel function each of the document language models and the query language model in the first vector space; and
a sixth set of computer instructions for ranking each of the plurality of documents based at least in part on a similarity relationship between each of the document language models and the query language model in the first vector space to determine a relative relevance of each of the plurality of documents to the user query.
Dependent claims: 20, 21, 22, 23
Specification