Filtering system for providing personalized information in the absence of negative data
Abstract
A system for organizing a content site so that articles preferred by a user (viewer) of the site are brought to the fore for easy access. The system observes the user's actions during the normal course of browsing through a content site, and creates a model of the user's preferences for various types of articles. This model is created as an Internet user ‘clicks’ on articles which the user desires to read, without requiring any other feedback from the user. The user model is then employed to reorganize the content site so that the articles preferred by the user are presented in an order according to the user's interests. This model can also be used to present the user with advertising material based on the user's demonstrated interests. The system performs the above functions by using word vector-space representation of the documents combined with adaptive learning techniques. A word vector for a document is created by counting all the occurrences of each word in a document and creating a vector whose components comprise the word frequencies. A document is represented by a point in a high-dimensional space whose axes represent the words in a given dictionary. Thus, similar documents are close together in this vector-space. The word vector of an article forms the input to an adaptive ranking engine. The output of the ranking engine is a value which represents the strength of a particular user's preference for reading that article. In this manner, the contents of an online newspaper or an archive of any type can be rank ordered by the numerical value of the output of the ranking system.
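The word-vector construction described in the abstract can be illustrated with a minimal Python sketch. The dictionary, the sample sentences, and the use of cosine similarity as the measure of closeness are assumptions made for illustration; the abstract only says that word frequencies form the vector components and that similar documents lie close together.

```python
from collections import Counter
import math
import re

def word_vector(text, dictionary):
    """Count how often each dictionary word occurs in the text."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [counts[word] for word in dictionary]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical dictionary and documents, chosen only to illustrate the idea.
dictionary = ["election", "vote", "goal", "match", "team"]
a = word_vector("The team scored a late goal to win the match.", dictionary)
b = word_vector("A goal in the final match lifted the team.", dictionary)
c = word_vector("Early vote counts suggest a close election.", dictionary)

print(cosine(a, b), cosine(a, c))
```

The two sports sentences share all of their dictionary words and therefore point in the same direction, while the election sentence shares none with them.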
68 Claims
1. A computer implemented method for ranking documents in a database in accordance with preferences of a viewer of the documents, the method comprising:
presenting a document set from which a viewer can select one or more documents for viewing by the viewer;
generating at least one positive word vector using words contained in at least a segment of the documents in the document set that are selected by the viewer for viewing;
generating at least one negative word vector using words contained in at least a segment of the documents in the document set that are not selected by the viewer for viewing;
generating a group of word vectors for a group of documents to be ranked; and
ranking the group of documents using a word vector space representation of at least the document set operative with said positive word vector, said negative word vector, and the group of word vectors.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
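As a rough illustration of the steps recited in claim 1, the sketch below builds one positive and one negative word vector from selected and non-selected documents and then orders a group of candidate documents. The scoring rule (cosine similarity to the positive vector minus cosine similarity to the negative vector) is an assumption chosen for brevity; the claim does not prescribe a particular ranking function, and the dictionary and texts are hypothetical.

```python
from collections import Counter
import math
import re

# Hypothetical dictionary; a real system would use a much larger one.
DICTIONARY = ["budget", "election", "goal", "match", "team", "vote"]

def word_vector(text):
    """Word-frequency vector of one document over DICTIONARY."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[word] for word in DICTIONARY]

def accumulate(texts):
    """Sum the word vectors of several documents into one vector."""
    total = [0] * len(DICTIONARY)
    for text in texts:
        total = [x + y for x, y in zip(total, word_vector(text))]
    return total

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(candidates, positive, negative):
    """Assumed scoring rule: closeness to positive minus closeness to negative."""
    def score(text):
        vec = word_vector(text)
        return cosine(vec, positive) - cosine(vec, negative)
    return sorted(candidates, key=score, reverse=True)

selected = ["The team won the match with a late goal."]
not_selected = ["The election vote was close.",
                "Parliament passed the budget after the vote."]
positive = accumulate(selected)
negative = accumulate(not_selected)

candidates = ["Budget vote delayed again.", "Team prepares for the cup match."]
ranked = rank(candidates, positive, negative)
print(ranked)
```

Here the viewer clicked only the sports story, so the sports candidate is ranked ahead of the budget candidate.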
16. An article of manufacture comprising a computer-readable storage device storing computer-readable instructions which, when executed, cause one or more computers to perform the following:
present a document set from which a viewer can select one or more documents for viewing by the viewer;
generate at least one positive word vector using words contained in at least a segment of the documents in the document set that are selected by the viewer for viewing;
generate at least one negative word vector using words contained in at least a segment of the documents in the document set that are not selected by the viewer for viewing;
generate a group of word vectors for a group of documents to be ranked; and
rank the group of documents using a word vector space representation of at least the document set operative with said positive word vector, said negative word vector, and the group of word vectors.
17. A computer implemented method for ranking articles in a database in accordance with preferences of a viewer of the documents, the method comprising:
presenting, to the viewer, a document set from which articles can be selected;
generating at least one positive word vector using words contained in at least a segment of the articles in the documents set that are selected by the viewer for viewing;
applying an expectation maximization algorithm to the articles in the document set that are not selected by the viewer for viewing to generate negative labels for certain ones of said non-selected articles, and positive labels for the rest of said non-selected articles;
generating at least one negative word vector using words contained in at least a segment of the articles for which the expectation maximization algorithm generates negative labels; and
generating a group of word vectors for a group of articles to be ranked; and
ranking the group of articles using a word vector space representation of at least the articles in the document set operative with said positive word vector, said negative word vector, and the group of word vectors.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
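The expectation maximization step of claim 17 can be sketched as follows. This is only one possible reading: it assumes a two-class multinomial naive Bayes model in which selected articles stay positive and every non-selected article starts out as negative, then has its label re-estimated. The claim does not name the underlying model, and the function name `em_labels`, the vocabulary, and the toy articles are hypothetical.

```python
import math
from collections import Counter

def em_labels(selected, unselected, vocab, iterations=10):
    """Return P(positive) for each unselected article (articles are token lists)."""
    resp = [0.0] * len(unselected)  # start by treating every unselected article as negative
    for _ in range(iterations):
        # M-step: re-estimate word counts and the class prior from the soft labels.
        pos, neg = Counter(), Counter()
        for doc in selected:  # selected articles are always fully positive
            pos.update(w for w in doc if w in vocab)
        for doc, r in zip(unselected, resp):
            for w in doc:
                if w in vocab:
                    pos[w] += r
                    neg[w] += 1.0 - r
        pos_total = sum(pos.values()) + len(vocab)  # Laplace smoothing
        neg_total = sum(neg.values()) + len(vocab)
        prior = (len(selected) + sum(resp) + 1) / (len(selected) + len(unselected) + 2)
        # E-step: posterior probability that each unselected article is positive.
        new_resp = []
        for doc in unselected:
            log_pos, log_neg = math.log(prior), math.log(1.0 - prior)
            for w in doc:
                if w in vocab:
                    log_pos += math.log((pos[w] + 1) / pos_total)
                    log_neg += math.log((neg[w] + 1) / neg_total)
            diff = log_neg - log_pos
            new_resp.append(0.0 if diff > 700 else 1.0 / (1.0 + math.exp(diff)))
        resp = new_resp
    return resp

vocab = {"team", "goal", "match", "vote", "election", "budget"}
selected = [["team", "goal", "match"], ["team", "match", "goal", "goal"]]
unselected = [["goal", "team", "match"],
              ["vote", "election", "budget"],
              ["election", "vote", "vote"]]
probs = em_labels(selected, unselected, vocab)
labels = ["positive" if p >= 0.5 else "negative" for p in probs]
print(labels)
```

Under this reading, only the articles labeled negative would contribute to the negative word vector; the unselected sports article is relabeled positive because it resembles what the viewer clicked.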
31. An article of manufacture comprising a computer-readable storage device storing computer-readable instructions which, when executed, cause one or more computers to perform the following:
present, to the viewer, a document set from which articles can be selected;
generate at least one positive word vector using words contained in at least a segment of the articles in the documents set that are selected by the viewer for viewing;
apply an expectation maximization algorithm to the articles in the document set that are not selected by the viewer for viewing to generate negative labels for certain ones of said non-selected articles, and positive labels for the rest of said non-selected articles;
generate at least one negative word vector using words contained in at least a segment of the articles for which the expectation maximization algorithm generates negative labels; and
generate a group of word vectors for a group of articles to be ranked; and
rank the group of articles using a word vector space representation of at least the articles in the document set operative with said positive word vector, said negative word vector, and the group of word vectors.
32. A method for ranking documents in a database in accordance with preferences of a viewer of certain ones of the documents, the method comprising:
presenting, to the viewer, a document set from which documents can be selected for viewing by the viewer, wherein said document set comprises a set of synopses representative of the documents presented;
accessing a document dictionary that includes dictionary words;
generating at least one positive word vector, using the documents in the document set selected by the viewer for viewing, by storing, in a given location in computer memory, a cumulative count of the number of occurrences of each dictionary word found in at least a segment of the documents selected by the viewer for viewing;
generating at least one negative word vector, using the documents in the document set that are not selected by the viewer for viewing, by storing, in a given location in computer memory, a cumulative count of the number of occurrences of each dictionary word found in at least a segment of the documents not selected by the viewer for viewing;
ranking documents in the database by using a learning algorithm that operates directly on the positive word vector, the negative word vector, and word vectors of the documents to be ranked in a word vector space to rank the documents.
Dependent claims: 33, 34, 35, 36, 37, 38, 39
40. An article of manufacture comprising a computer-readable storage device storing computer-readable instructions which, when executed, cause one or more computers to perform the following:
present, to the viewer, a document set from which documents can be selected for viewing by the viewer, wherein said document set comprises a set of synopses representative of the documents presented;
access a document dictionary that includes dictionary words;
generate at least one positive word vector, using the documents in the document set selected by the viewer for viewing, by storing, in a given location in computer memory, a cumulative count of the number of occurrences of each dictionary word found in at least a segment of the documents selected by the viewer for viewing;
generate at least one negative word vector, using the documents in the document set that are not selected by the viewer for viewing, by storing, in a given location in computer memory, a cumulative count of the number of occurrences of each dictionary word found in at least a segment of the documents not selected by the viewer for viewing;
rank documents in the database by using a learning algorithm that operates directly on the positive word vector, the negative word vector, and word vectors of the documents to be ranked in a word vector space to rank the documents.
41. A computer implemented method for personalizing advertising in accordance with preferences of a viewer of documents presented to a viewer thereof, the method comprising:
presenting, to the viewer, a document set from which the viewer can select one or more documents for viewing;
generating at least one positive word vector using words contained in at least a segment of the documents in the document set that are selected by the viewer for viewing;
generating at least one negative word vector using words contained in at least a segment of at least one document in the document set that is not selected by the viewer for viewing;
generating word vectors for the documents in the document set selected by the viewer for viewing;
ranking the documents selected by the viewer using a vector space relationship analysis of the positive word vector, the negative word vector, and the word vectors for the documents selected by the viewer to establish a document rank order of the documents selected by the viewer, wherein the document rank order is indicative of preferences of the viewer;
classifying the documents selected by the viewer in predetermined categories;
classifying each of a plurality of advertisements in an ad database in one of said predetermined categories; and
presenting, to the viewer, said advertisements having an identical said category as the documents selected by the viewer, according to the document rank order determined by the ranking step.
Dependent claims: 42
43. An article of manufacture comprising a computer-readable storage device storing computer-readable instructions which, when executed, cause one or more computers to perform the following:
present, to the viewer, a document set from which the viewer can select one or more documents for viewing;
generate at least one positive word vector using words contained in at least a segment of the documents in the document set that are selected by the viewer for viewing;
generate at least one negative word vector using words contained in at least a segment of at least one document in the document set that is not selected by the viewer for viewing;
generate word vectors for the documents in the document set selected by the viewer for viewing;
rank the documents selected by the viewer using a vector space relationship analysis of the positive word vector, the negative word vector, and the word vectors for the documents selected by the viewer to establish a document rank order of the documents selected by the viewer, wherein the document rank order is indicative of preferences of the viewer;
classify the documents selected by the viewer in predetermined categories;
classify each of a plurality of advertisements in an ad database in one of said predetermined categories; and
present, to the viewer, said advertisements having an identical said category as the documents selected by the viewer, according to the document rank order determined by the ranking step.
44. A computer implemented method for personalizing advertising in accordance with preferences of a viewer of documents presented to a viewer thereof, the method comprising:
presenting, to the viewer, a document set from which the viewer can select one or more documents for viewing;
generating at least one positive word vector using words contained in at least a segment of the documents in the document set that are selected by the viewer for viewing;
generating at least one negative word vector using words contained in at least a segment of at least one document in the document set that is not selected by the viewer for viewing;
generating word vectors for the documents in the document set selected by the viewer for viewing;
ranking the documents selected by the viewer using a vector space relationship analysis of the positive word vector, the negative word vector, and the word vectors for the documents selected by the viewer to establish a document rank order of the documents selected by the viewer, wherein the document rank order is indicative of preferences of the viewer;
categorizing advertisements in an ad database in predetermined categories;
categorizing the documents selected by the viewer in said predetermined categories;
prioritizing the interests of the viewer with a numerical score of relevance based on the document rank order and said predetermined categories;
matching, with the viewer's prioritized interests, the advertisements which have been categorized; and
presenting, to the viewer, the advertisements which have been matched with the viewer's prioritized interests.
Dependent claims: 45, 46
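The prioritizing and matching steps of claim 44 can be pictured with a short sketch. The scoring rule (each category receives the sum of 1/rank over the viewer's ranked documents in that category) is an assumption; the claim requires only a numerical score of relevance based on the document rank order and the categories. All identifiers and data below are hypothetical.

```python
from collections import defaultdict

def prioritize_interests(ranked_docs, doc_category):
    """Score each category by summing 1/rank over the viewer's ranked documents."""
    scores = defaultdict(float)
    for position, doc_id in enumerate(ranked_docs, start=1):
        scores[doc_category[doc_id]] += 1.0 / position
    # Highest-scoring category first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def match_ads(interests, ad_category):
    """List ads in order of the viewer's category scores; ads in other categories are dropped."""
    matched = []
    for category, _score in interests:
        matched.extend(ad for ad, cat in ad_category.items() if cat == category)
    return matched

# Documents the viewer selected, already in rank order (best first).
ranked_docs = ["d3", "d1", "d2"]
doc_category = {"d1": "finance", "d2": "sports", "d3": "sports"}
ad_category = {"ad_bank": "finance", "ad_shoes": "sports", "ad_cruise": "travel"}

interests = prioritize_interests(ranked_docs, doc_category)
ads = match_ads(interests, ad_category)
print(interests, ads)
```

With two sports documents, one of them ranked first, the sports category outscores finance, so the sports ad is presented ahead of the finance ad and the travel ad is not presented at all.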
47. An article of manufacture comprising a computer-readable storage device storing computer-readable instructions which, when executed, cause one or more computers to perform the following:
present, to the viewer, a document set from which the viewer can select one or more documents for viewing;
generate at least one positive word vector using words contained in at least a segment of the documents in the document set that are selected by the viewer for viewing;
generate at least one negative word vector using words contained in at least a segment of at least one document in the document set that is not selected by the viewer for viewing;
generate word vectors for the documents in the document set selected by the viewer for viewing;
rank the documents selected by the viewer using a vector space relationship analysis of the positive word vector, the negative word vector, and the word vectors for the documents selected by the viewer to establish a document rank order of the documents selected by the viewer, wherein the document rank order is indicative of preferences of the viewer;
categorize advertisements in an ad database in predetermined categories;
categorize the documents selected by the viewer in said predetermined categories;
prioritize the interests of the viewer with a numerical score of relevance based on the document rank order and said predetermined categories;
match, with the viewer's prioritized interests, the advertisements which have been categorized; and
present, to the viewer, the advertisements which have been matched with the viewer's prioritized interests.
48. A method for generating an advertising profile for a viewer of articles on a website, the method comprising:
labeling a plurality of said articles with interest categories;
training a categorizer by inputting the articles which were labeled;
using the categorizer to label new articles with interest categories relevant to a particular advertising campaign;
ranking interests of the viewer by;
presenting, to the viewer, a document set from which the viewer can select one or more labeled articles for viewing;
creating a user profile for the viewer by;
generating at least one positive word vector using words contained in at least a segment of the labeled articles in the document set that are selected by the viewer for viewing;
generating at least one negative word vector using words contained in at least a segment of the labeled articles in the document set that are not selected by the viewer for viewing;
performing a vector space relationship analysis of the positive word vector and the negative word vector to establish a document rank order of a set of the labeled articles selected by the viewer for viewing;
ranking the interest categories associated with the set of labeled articles based on the document rank order; and
creating an advertising profile comprising the ranked interest categories.
Dependent claims: 49, 50
51. An article of manufacture comprising a computer-readable storage device storing computer-readable instructions which, when executed, cause one or more computers to perform the following:
label a plurality of said articles with interest categories;
train a categorizer by inputting the articles which were labeled;
use the categorizer to label new articles with interest categories relevant to a particular advertising campaign;
rank interests of the viewer by;
present, to the viewer, a document set from which the viewer can select one or more labeled articles for viewing;
create a user profile for the viewer by;
generate at least one positive word vector using words contained in at least a segment of the labeled articles in the document set that are selected by the viewer for viewing;
generate at least one negative word vector using words contained in at least a segment of the labeled articles in the document set that are not selected by the viewer for viewing;
perform a vector space relationship analysis of the positive word vector and the negative word vector to establish a document rank order of a set of the labeled articles selected by the viewer for viewing;
rank the interest categories associated with the set of labeled articles based on the document rank order; and
create an advertising profile comprising the ranked interest categories.
52. A system for ranking documents in a database in accordance with preferences of a viewer of the documents, the system comprising:
one or more processing devices;
one or more storages storing instructions which, when executed, cause the one or more processing devices to implement;
a content server for presenting a document set from which a viewer can select one or more documents for viewing;
at least one positive word vector formed using words contained in at least a segment of the documents in the document set that are selected by the viewer for viewing;
at least one negative word vector formed using words contained in at least a segment of at least one document in the document set that is not selected by the viewer for viewing;
a group of word vectors for a group of documents to be ranked; and
a ranking engine for ranking the group of documents using a word vector space representation of at least the document set operative with the positive word vector, the negative word vector, and the group of word vectors.
Dependent claims: 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64
65. A system for ranking documents in a document database in accordance with preferences of a viewer of the documents, the system comprising:
one or more processing devices;
one or more storages storing instructions which, when executed, cause the one or more processing devices to implement;
a content server for presenting said documents to the viewer;
a document database for storing said documents in a manner that allows a viewer to select documents for viewing;
a document dictionary;
at least one positive word vector comprising a plurality of word count descriptors, each indicative of a cumulative count of the number of occurrences of each word, found in at least a segment of each one of the documents selected by the viewer, that is also found in said document dictionary;
at least one negative word vector comprising a plurality of word count descriptors, each indicative of a cumulative count of the number of occurrences of each word, found in at least a segment of each one of the documents that are not selected by the viewer for viewing, that is also found in said document dictionary;
a group of word vectors for a group of documents to be ranked; and
a ranking engine for ranking the documents by using a learning algorithm that operates directly on the positive word vector, the negative word vector, and the group of word vectors in a word vector space to rank the group of documents.
Dependent claims: 66
67. A system for personalizing advertising material in accordance with preferences of a viewer of documents, the system comprising:
one or more processing devices;
one or more storages storing instructions which, when executed, cause the one or more processing devices to implement;
a content server for presenting documents to the viewer such that the viewer can select documents for viewing;
an ad database containing advertisements;
at least one positive word vector formed using words contained in at least a segment of the documents in the document set that are selected by the viewer for viewing;
at least one negative word vector formed using words contained in at least a segment of the documents in the document set that are not selected by the viewer for viewing;
a ranking engine to rank a group of the documents selected for viewing by the viewer using a word vector space representation of the documents operative with the positive word vector, the negative word vector, and word vectors for the group of documents selected for viewing by the viewer;
a support vector machine for classifying into categories the advertisements and the ranked group of documents; and
an ad server, operatively coupled with said ranking engine, for presenting, to the viewer, the advertisements having categories that correspond to the categories of the documents in the group of documents, wherein the advertisements are presented in accordance with the ranks of the documents in the group of documents.
Dependent claims: 68
Specification