Recommending content using discriminatively trained document similarity
Abstract
A generalized discriminative training framework for reconciling the training and evaluation objectives for document similarity is provided. Prior information about document relations and non-relations is used to discriminatively train an ensemble of document similarity classification models. The result is a model set that can be used to compute similarity between seen documents in the training sets and new documents. This measure of similarity forms the basis for recommending documents to a user and for obtaining metadata, such as keywords and tags, for new documents that lack it.
29 Citations
19 Claims
1. A method for training document similarity models, the method comprising:
- obtaining a set of training samples;
- obtaining prior information of document relations and non-relations for the set of training samples, wherein the prior information of document relations comprises information indicating that two or more documents in the set of training samples are considered related to each other, and wherein the prior information of document non-relations comprises information indicating that two or more documents in the set of training samples are not considered related to each other; and
- discriminatively training an ensemble of document similarity classification models using the set of training samples and using the prior information of document relations and non-relations using a processor of a computer, wherein the ensemble of document similarity classification models are discriminatively trained based at least in part on prior information of non-relation between a first document and a second document in the set of training samples such that a first classification model configured to determine document similarity with respect to the first document does not compete with a second classification model configured to determine document similarity with respect to the second document.
Dependent claims: 2, 3, 4, 5
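The training steps of claim 1 can be illustrated with a minimal sketch. It is not the patented method: it assumes one logistic-regression model per training document over bag-of-words features, treats related pairs as positive examples and non-related pairs as negative examples, and leaves pairs without prior information out of training. It does not attempt to reproduce the claim's non-competition condition. The names `bag_of_words`, `score`, and `train_ensemble` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Token counts for a whitespace-tokenised, lower-cased text."""
    return Counter(text.lower().split())


def score(weights, bias, features):
    """Logistic score in (0, 1): how similar `features` is to the model's document."""
    z = bias + sum(weights.get(tok, 0.0) * cnt for tok, cnt in features.items())
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well inside float range
    return 1.0 / (1.0 + math.exp(-z))


def train_ensemble(docs, related, unrelated, epochs=200, lr=0.5):
    """Train one classifier per document (illustrative assumption).

    docs:      dict mapping document id -> text
    related:   set of frozenset({id_a, id_b}) pairs known to be related
    unrelated: set of frozenset({id_a, id_b}) pairs known to be not related
    Returns a dict mapping document id -> (weights, bias).
    """
    feats = {doc_id: bag_of_words(text) for doc_id, text in docs.items()}
    models = {}
    for doc_id in docs:
        # The document itself and its related documents are positives;
        # documents known to be non-related are negatives.
        examples = [(feats[doc_id], 1.0)]
        for other in docs:
            if other == doc_id:
                continue
            pair = frozenset((doc_id, other))
            if pair in related:
                examples.append((feats[other], 1.0))
            elif pair in unrelated:
                examples.append((feats[other], 0.0))
            # Pairs with no prior information are not used.
        weights, bias = {}, 0.0
        for _ in range(epochs):
            for x, y in examples:
                err = score(weights, bias, x) - y  # gradient of the log loss
                for tok, cnt in x.items():
                    weights[tok] = weights.get(tok, 0.0) - lr * err * cnt
                bias -= lr * err
        models[doc_id] = (weights, bias)
    return models
```

Each trained model can then score an unseen text with `score(*models[doc_id], bag_of_words(text))`.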
6. A document recommendation system comprising:
- a set of positive documents determined to be of interest to a user;
- a set of negative documents determined to not be of interest to the user;
- a plurality of candidate documents; and
- a module configured to calculate similarity scores of each document in the set of positive documents relative to the plurality of candidate documents and to calculate similarity scores of each document in the set of negative documents relative to the plurality of candidate documents, and wherein the module receives a new document apart from the plurality of candidate documents, calculates a similarity score, using a processor, of the new document relative to each of the plurality of candidate documents using a measure of discriminatively trained similarity associated with each of the plurality of candidate documents, and outputs a reference to at least one of the plurality of candidate documents based on the calculated similarity scores.
Dependent claims: 7, 8, 9, 10, 11, 12, 13
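The scoring flow of claim 6 can be sketched as follows, under stated assumptions. The claim calls for a discriminatively trained similarity measure per candidate; the sketch substitutes plain Jaccard token overlap as a stand-in so that it is self-contained. Ranking candidates by mean similarity to the positive set minus mean similarity to the negative set is likewise an illustrative choice, not something the claim specifies. All function names are hypothetical.

```python
def jaccard(a, b):
    """Stand-in similarity: token-set overlap. NOT a trained measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def _mean_similarity(text, documents, similarity):
    """Average similarity of `text` to each document; 0.0 for an empty set."""
    if not documents:
        return 0.0
    return sum(similarity(text, doc) for doc in documents) / len(documents)


def rank_candidates(positives, negatives, candidates, similarity=jaccard):
    """Order candidate ids by (mean similarity to positives) minus
    (mean similarity to negatives), highest first.

    positives, negatives: lists of document texts
    candidates:           dict mapping candidate id -> text
    """
    scored = []
    for cand_id, text in candidates.items():
        net = (_mean_similarity(text, positives, similarity)
               - _mean_similarity(text, negatives, similarity))
        scored.append((net, cand_id))
    scored.sort(key=lambda item: (-item[0], item[1]))  # ties broken by id
    return [cand_id for _, cand_id in scored]


def recommend_for_new_document(new_doc, candidates, similarity=jaccard, top_k=1):
    """Score a new document against every candidate and return references
    (ids) to the `top_k` most similar candidates."""
    ordered = sorted(
        candidates,
        key=lambda cand_id: (-similarity(candidates[cand_id], new_doc), cand_id),
    )
    return ordered[:top_k]
```

Swapping `similarity` for a function backed by trained per-candidate models would bring the sketch closer to the claim language.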
14. A system for obtaining metadata related to a document, the system comprising:
- a plurality of documents, each document having metadata associated therewith, the metadata comprising at least one of a keyword and tag associated with the document; and
- a module configured to receive a new document apart from the plurality of documents, generate metadata for the new document, and associate the generated metadata with the new document using a processor, wherein the metadata is generated for the new document based on the metadata associated with one or more of the plurality of documents and based on a similarity score of the new document relative to each of the plurality of documents using a measure of similarity based on a weighting factor associated with each document of the plurality of documents.
Dependent claims: 15, 16, 17, 18, 19
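One way to picture the metadata generation of claim 14 is a similarity-weighted vote over the tags of existing documents. The sketch below is an assumption-laden illustration: the per-document weighting factor is modelled as a simple multiplier on a token-overlap similarity, and the names `token_overlap`, `suggest_tags`, and `doc_weights` are hypothetical. The claim does not say how the weighting factor enters the similarity measure.

```python
from collections import defaultdict


def token_overlap(a, b):
    """Stand-in similarity: Jaccard overlap of token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def suggest_tags(new_doc, corpus, doc_weights, top_k=3, similarity=token_overlap):
    """Propose tags for `new_doc` from the tags of similar documents.

    corpus:      dict mapping document id -> (text, list of tags)
    doc_weights: dict mapping document id -> weighting factor (default 1.0)
    Each tag accumulates weight * similarity from every document carrying it.
    """
    votes = defaultdict(float)
    for doc_id, (text, tags) in corpus.items():
        weighted = doc_weights.get(doc_id, 1.0) * similarity(new_doc, text)
        if weighted <= 0.0:
            continue  # documents with no overlap contribute nothing
        for tag in tags:
            votes[tag] += weighted
    # Highest vote first; ties broken alphabetically for determinism.
    return sorted(votes, key=lambda tag: (-votes[tag], tag))[:top_k]
```

The returned tags would then be associated with the new document as its generated metadata.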
Specification