Principles and methods for personalizing newsfeeds via an analysis of information novelty and dynamics

US 7,293,019 B2
Filed: 04/20/2004
Issued: 11/06/2007
Est. Priority Date: 03/02/2004
Status: Expired due to Fees


Abstract

A system and methodology is provided for filtering temporal streams of information such as news stories by statistical measures of information novelty. Various techniques can be applied to custom tailor news feeds or other types of information based on information that a user has already reviewed. Methods for analyzing information novelty are provided along with a system that personalizes and filters information for users by identifying the novelty of stories in the context of stories they have already reviewed. The system employs novelty-analysis algorithms that represent articles as a bag of words and named entities. The algorithms analyze inter- and intra-document dynamics by considering how information evolves over time from article to article, as well as within individual articles.
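
As a rough illustration of what a statistical novelty measure over a bag-of-words representation could look like, the sketch below builds additively smoothed word distributions and compares them with Jensen-Shannon divergence, one of the metrics named in the claims. The function names, the whitespace tokenizer, and the smoothing constant `alpha` are assumptions made for the example and are not taken from the patent; named-entity features are omitted.

```python
import math
from collections import Counter

def word_distribution(text, vocabulary, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(w for w in text.lower().split() if w in vocabulary)
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}

def kl_divergence(p, q):
    """KL(p || q); both distributions are smoothed, so no zero entries."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def js_divergence(p, q):
    """Symmetric Jensen-Shannon divergence, bounded above by log(2)."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def novelty(new_text, read_texts, alpha=1.0):
    """Score a new story against everything the reader has already seen."""
    vocabulary = set(new_text.lower().split())
    for text in read_texts:
        vocabulary |= set(text.lower().split())
    p = word_distribution(new_text, vocabulary, alpha)
    q = word_distribution(" ".join(read_texts), vocabulary, alpha)
    return js_divergence(p, q)
```

A story that repeats what the reader has seen scores near zero, while a story with entirely different vocabulary scores closer to the upper bound of log 2.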

222 Citations

39 Claims

1. A machine implemented system for distributing personalized information, comprising:
- a comparator that determines differences between two or more related information items, and
  
  an analyzer that automatically determines a subset of the related information items as personalized information based in part on the differences and as data relating to the information items evolves over time and at least one of:
  
  stores the personalized information in a computer storage medium; or
  
  displays the personalized information on an output device,
  
  wherein the personalized information adds maximum novel information to the subset of the related information, and
  
  wherein the subset of information items is at least one of stored in a computer storage medium or displayed on an output device and the analyzer employs the following algorithm:
  
  Algorithm RANKNEWSBYNOVELTY (dist, seed, D, n)
  R ← seed // initialization
  for i = 1 to min(n, |D|) do
    d ← argmax_{d_i ∈ D} {dist(d_i, R)}
    R ← R ∪ {d}; D ← D \ {d}
  
  where dist is a distance metric, seed is a seed story, D is a set of relevant updates, d is a document, n is the desired number of updates to select, and R is the list of articles ordered by novelty.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
- - 2. The system of claim 1, further comprising a filter to discard previously observed information.
  - 3. The system of claim 1, the information items relate to a news stream.
  - 4. The system of claim 1, further comprising at least one server to collect the information items for further processing by the analyzer.
  - 5. The system of claim 1, the comparator processes detailed statistics gathered on word occurrence across sets of documents in order to characterize differences and similarities among the sets.
  - 6. The system of claim 1, further comprising a word model that employs named entities that denote people, organizations, or geographical locations.
  - 7. The system of claim 1, further comprising a personalized news portal or news alerting service that seeks to minimize the time and disruptions to users.
  - 8. The system of claim 1, further comprising a framework for determining differences in a variety of applications, including automatic profiling and comparison of text collections, automatic identification of different views, scopes and interests reflected in the texts, or automatic identification of novel information.
  - 9. The system of claim 1, the comparator determines at least one of a difference in content, a difference in structural organization, and a difference in time.
  - 10. The system of claim 9, further comprising a component for characterizing novelty in news stories and allowing ordering of news articles so that each article adds maximum information to a union of previously read articles.
  - 11. The system of claim 9, further comprising a component for analyzing topic evolution over time to enable quantifying importance and relevance of news updates.
  - 12. The system of claim 11, further comprising providing user controls for topic parameters in order to provide a personalized news experience.
  - 13. A computer readable medium having computer readable instructions stored thereon for implementing the components of claim 1.
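
The greedy selection in RANKNEWSBYNOVELTY can be sketched in a few lines of Python. This is a minimal sketch, not the patented implementation: the function name `rank_news_by_novelty`, the toy metric `unseen_word_fraction`, and the choice to return only the selected updates (without the seed) are assumptions made for illustration. Any `dist(document, list_of_documents)` callable can be substituted.

```python
def rank_news_by_novelty(dist, seed, updates, n):
    """Greedily pick up to n updates, each maximally distant from what was read."""
    read = [seed]                 # R <- seed
    remaining = list(updates)     # working copy of D; the caller's list is untouched
    ranked = []
    for _ in range(min(n, len(remaining))):
        # d <- argmax over remaining updates of dist(d_i, R)
        best = max(remaining, key=lambda d: dist(d, read))
        read.append(best)         # R <- R U {d}
        remaining.remove(best)    # D <- D \ {d}
        ranked.append(best)
    return ranked

def unseen_word_fraction(doc, docs):
    """Toy metric (assumption): share of doc's distinct words absent from docs."""
    seen = set().union(*(d.lower().split() for d in docs))
    words = set(doc.lower().split())
    return len(words - seen) / len(words) if words else 0.0

seed = "storm hits coast"
updates = [
    "storm hits coast again",
    "rescue teams reach flooded town",
    "storm hits coast town",
]
top_two = rank_news_by_novelty(unseen_word_fraction, seed, updates, 2)
```

Because the reference set grows after every pick, the third update loses its only new word ("town") once the rescue story has been selected, which is the behavior the claim describes as adding maximum novel information to the subset.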

14. A method for creating personalized information, comprising:
- automatically analyzing documents from different information sources;
  
  automatically determining novelty of the documents;
  
  creating a personalized feed of information based on the novelty of the documents by implementing at least the following algorithm; and
  
  at least one of storing or displaying the personalized feed,
  
  wherein the personalized feed of information is at least one of stored in a computer storage medium or displayed on an output device and the analyzer employs the following algorithm:
  
  Algorithm RANKNEWSBYNOVELTY (dist, seed, D, n)
  R ← seed // initialization
  for i = 1 to min(n, |D|) do
    d ← argmax_{d_i ∈ D} {dist(d_i, R)}
    R ← R ∪ {d}; D ← D \ {d}
  
  where dist is a distance metric, seed is a seed story, D is a set of relevant updates, d is a document, n is the desired number of updates to select, and R is the list of articles ordered by novelty.
- View Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
- - 15. The method of claim 14, further comprising inferring differences between document groups by building a model for each group, and then comparing the models using a similarity metric.
  - 16. The method of claim 15, the models employ smoothed probability distributions over word features or as vectors of weights in the same feature space.
  - 17. The method of claim 15, the similarity metric further comprising at least one of a KL divergence, a JS divergence, a cosine of vectors computation, a cosine of vectors of feature weights, and a measure of density of previously unseen named entities.
  - 18. The method of claim 17, further comprising providing a novelty ranking algorithm that is applied iteratively to produce a small set of articles that a reader is potentially interested in.
  - 19. The method of claim 18, further comprising employing a greedy incremental analysis and comparing available updates to a seed story that a user has read, selecting the article least similar to the seed story.
  - 20. The method of claim 19, further comprising providing a general analysis of benefits versus the costs of alerting users to balance the informational value of particular articles or groups of articles with the cost of interrupting users, based on a consideration of the user's context.
  - 21. The method of claim 19, further comprising comparing articles received in one period with a union of articles received periodically.
  - 22. The method of claim 21, further comprising determining distance metrics that consider previous articles relevant to a topic but decay the metrics weight with age.
  - 23. The method of claim 19, further comprising the following algorithm:
    - Algorithm PICKDAILYUPDATE (dist, Bg, D, thresh)
      d ← argmax_{d_i ∈ D} {dist(d_i, Bg)}
      if dist(d, Bg) > thresh then display(d)
      Bg ← D
      
      where dist is a distance metric, Bg is a background reference set including a union of relevant articles received on a preceding day, D is the set of new articles received today, d is a document, and thresh is a user-defined sensitivity threshold.
  - 24. The method of claim 19, further comprising determining a burst of novelty.
  - 25. The method of claim 24, further comprising determining a median filter that sorts w data points within a window centered on a current point.
  - 26. The method of claim 25, further comprising the following algorithm:
    - Algorithm IDENTIFYBREAKINGNEWS (dist, D, l, fw, thresh)
      Window ← ⋃_{i=1}^{l} d_i, d_i ∈ D
      for i = l + 1 to |D| do
        Scores_i ← dist(d_i, Window)
        Window ← (Window \ d_{i-l}) ∪ d_i
      Scores^filt ← MedianFilter(Scores, fw)
      for j = 1 to |Scores^filt| do
        if Scores^filt_j > thresh then
          display(d_{j+l})
          skip to the beginning of the next burst
      
      where dist is a distance metric, D is a sequence of relevant articles, d is a document, l is the sliding window length, fw is the median filter width, and thresh is a user-defined sensitivity threshold.
  - 27. The method of claim 19, further comprising determining at least one of recap articles, elaboration articles, offshoot articles, and irrelevant articles.
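
The two alerting algorithms in claims 23 and 26 can be sketched as follows. This is a hedged illustration rather than the patented code: the function names, the edge handling in `median_filter` (the window is truncated at the sequence boundaries), and the reading of "skip to the beginning of the next burst" as "suppress alerts until the filtered score drops back to or below the threshold" are all assumptions. The caller supplies `dist(document, list_of_documents)`.

```python
from collections import deque
from statistics import median

def pick_daily_update(dist, background, todays_articles, thresh):
    """Return (article to display or None, new background set)."""
    if not todays_articles:
        return None, background   # assumption: keep the old background on an empty day
    best = max(todays_articles, key=lambda d: dist(d, background))
    shown = best if dist(best, background) > thresh else None
    return shown, list(todays_articles)   # Bg <- D

def median_filter(scores, width):
    """Median over a window centered on each point, truncated at the edges."""
    half = width // 2
    return [median(scores[max(0, i - half): i + half + 1])
            for i in range(len(scores))]

def identify_breaking_news(dist, articles, l, fw, thresh):
    """Alert on the first article of each burst of filtered novelty scores."""
    window = deque(articles[:l], maxlen=l)   # the l most recent articles
    scores = []
    for doc in articles[l:]:
        scores.append(dist(doc, list(window)))
        window.append(doc)                   # maxlen drops the oldest article
    filtered = median_filter(scores, fw)
    alerts, in_burst = [], False
    for j, score in enumerate(filtered):
        if score > thresh and not in_burst:
            alerts.append(articles[j + l])   # score j belongs to article j + l
        in_burst = score > thresh
    return alerts
```

The median filter is what keeps a single outlying article from triggering an alert: an isolated spike is replaced by the median of its neighbors, while a sustained run of high scores survives filtering.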

28. A method for performing a document analysis, comprising:
- constructing a language model for each document in a set of documents;
  
  analyzing the documents based at least upon determining a fixed distance metric;
  
  sliding at least one window over words in the documents, wherein for each document a distance score of the sliding window versus a seed story is calculated and the results are passed through a median filter, the median filter identifies novel information in each; and
  
  at least one of storing or displaying the results,
  
  wherein the results are at least one of stored in a computer storage medium or displayed on an output device and the median filter comprises the following algorithm:
  
  Algorithm RANKNEWSBYNOVELTY (dist, seed, D, n)
  R ← seed // initialization
  for i = 1 to min(n, |D|) do
    d ← argmax_{d_i ∈ D} {dist(d_i, R)}
    R ← R ∪ {d}; D ← D \ {d}
  
  where dist is a distance metric, seed is a seed story, D is a set of relevant updates, d is a document, n is the desired number of updates to select, and R is the list of articles ordered by novelty.
- View Dependent Claims (29, 30, 31, 32, 33, 34, 35, 36, 37, 38)
- - 29. The method of claim 28, further comprising plotting distance scores of the window versus a seed story.
  - 30. The method of claim 29, further comprising determining a sum of point-wise scores of each word vs. the seed story as stipulated by comparing the language model of a current document with that of the seed story using a selected metric.
  - 31. The method of claim 30, further comprising employing a window length parameter of 20.
  - 32. The method of claim 28, further comprising assisting a design of ideal reading sequences or paths through currently unread news stories on a topic, within different time-horizons of recency from present time.
  - 33. The method of claim 28, further comprising designing sequences for catching up on news, considering the most recent news as well as news bursts over time, to help people understand the evolution of a news story and navigate the history of stories by major events or updates.
  - 34. The method of claim 28, further comprising developing different types of display designs and metaphors.
  - 35. The method of claim 34, the types include use of a time-line view or clusters in time.
  - 36. The method of claim 28, further comprising providing ideal alerting in a desktop and/or mobile setting of breaking news stories within a topic.
  - 37. The method of claim 36, further comprising allowing users to specify topics, or key words and alerting the user when there is enough novelty given what the user has read.
  - 38. The method of claim 36, further comprising alerting a user when a news story appears with keywords if the information novelty is above a predetermined threshold of novelty.
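
The intra-document analysis in claims 28 to 31 (a window slid over the words of a document, scored against a seed story, then median filtered) might be sketched like this. The point-wise score used here, each word's contribution to the KL divergence between smoothed document and seed models, is one plausible reading of "a selected metric" and is an assumption, as are the function names, the whitespace tokenizer, and the filter width of 5. The default window length of 20 comes from claim 31.

```python
import math
from collections import Counter
from statistics import median

def smoothed_model(words, vocabulary, alpha=1.0):
    """Additively smoothed unigram language model over a shared vocabulary."""
    counts = Counter(words)
    total = len(words) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}

def window_scores(doc_text, seed_text, window_len=20):
    """Sum of point-wise word scores for each window position in the document."""
    doc = doc_text.lower().split()
    seed = seed_text.lower().split()
    if not doc:
        return []
    vocabulary = set(doc) | set(seed)
    p = smoothed_model(doc, vocabulary)
    q = smoothed_model(seed, vocabulary)
    # Point-wise score (assumption): the word's term in KL(doc || seed).
    pointwise = [p[w] * math.log(p[w] / q[w]) for w in doc]
    positions = max(1, len(doc) - window_len + 1)
    return [sum(pointwise[i:i + window_len]) for i in range(positions)]

def median_filter(scores, width):
    """Median over a window centered on each point, truncated at the edges."""
    half = width // 2
    return [median(scores[max(0, i - half): i + half + 1])
            for i in range(len(scores))]

def novelty_profile(doc_text, seed_text, window_len=20, filter_width=5):
    """Smoothed per-position novelty of a document relative to the seed story."""
    return median_filter(window_scores(doc_text, seed_text, window_len), filter_width)
```

Plotting the returned profile against word position gives the kind of curve claim 29 refers to: flat or negative where the document recaps the seed story, and rising where it introduces new material.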

39. A machine implemented system for creating personalized information, comprising:
- means for analyzing a plurality of documents from different information sources;
  
  means for determining a similarity of the documents;
  
  means for providing a personalized feed of novel information based on determined differences in similarity of the documents by implementing the following algorithm; and
  
  means for at least one of storing or displaying the personalized feed,
  
  wherein the personalized feed of information is at least one of stored in a computer storage medium or displayed on an output device and the algorithm implemented is:
  
  Algorithm RANKNEWSBYNOVELTY (dist, seed, D, n)
  R ← seed // initialization
  for i = 1 to min(n, |D|) do
    d ← argmax_{d_i ∈ D} {dist(d_i, R)}
    R ← R ∪ {d}; D ← D \ {d}
  
  where dist is a distance metric, seed is a seed story, D is a set of relevant updates, d is a document, n is the desired number of updates to select, and R is the list of articles ordered by novelty.


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Dumais, Susan T.; Horvitz, Eric J.; Gabrilovich, Evgeniy
Primary Examiner(s): Lu, Kuen S.
Application Number: US10/827,729
Publication Number: US 20050198056A1
Time in Patent Office: 1,295 Days
Field of Search: 707/4, 707/5, 707/6, 707/101, 704/1, 704/4
US Class Current: 707/754
CPC Class Codes:
G06F 16/9535   Search customisation based ...
G06F 16/9536   Search customisation based ...
G06F 16/9538   Presentation of query results
Y10S 707/99935   Query augmenting and refini...
Y10S 707/99936   Pattern matching access
Y10S 707/99942   Manipulating data structure...
