Labeling events in historic news

US 8,838,604 B1
Filed: 09/14/2012
Issued: 09/16/2014
Est. Priority Date: 09/30/2005
Status: Active Grant

Abstract

A system identifies a set of documents from a corpus of documents that are relevant to a word, phrase or sentence and that were published in approximately the same time period, where each document of the set of documents includes news content and has an associated headline. The system extracts headlines from the set of documents and derives a score for each headline of the extracted headlines based on how many times selected words in each headline occur among all of the extracted headlines.
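
The headline scoring described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the abstract does not say which words are "selected" or how per-word counts are combined, so the stopword list (`STOPWORDS`), the tokenizer, and the choice to sum the counts are all hypothetical.

```python
import re
from collections import Counter
from typing import List

# Hypothetical stopword list: the abstract only says "selected words",
# so which words are kept is an assumption of this sketch.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "in", "on", "to", "for"})


def selected_words(headline: str) -> List[str]:
    """Lowercase the headline, split it into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9']+", headline.lower())
            if w not in STOPWORDS]


def score_headlines(headlines: List[str]) -> List[int]:
    """Return one score per headline, in input order.

    A headline's score is the sum, over its distinct selected words, of how
    many times each word occurs among all headlines (summing is an assumption).
    """
    counts = Counter(w for h in headlines for w in selected_words(h))
    return [sum(counts[w] for w in set(selected_words(h))) for h in headlines]


headlines = [
    "Senate passes budget bill",
    "Budget bill clears Senate vote",
    "Local team wins final",
]
scores = score_headlines(headlines)
best_headline = headlines[scores.index(max(scores))]
```

Under this scheme a headline that shares many words with the other headlines from the same period scores highest, which is one plausible way to pick a representative headline for an event.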

20 Claims

1. A method comprising:
- determining, by one or more processors, a plurality of documents relating to a query, the plurality of documents being associated with a plurality of timestamps;
  
  determining, by the one or more processors, a first group of documents, of the plurality of documents, associated with a first timestamp of the plurality of timestamps;
  
  determining, by the one or more processors, a particular quantity of times that one or more forms of the query are included in the first group of documents;
  
  identifying, by the one or more processors, a first document of the first group of documents;
  
  labeling, by the one or more processors, a first point on a graph by using a first headline associated with the first document, the first point corresponding to the first timestamp and a value based on the particular quantity of times, and the value based on the particular quantity of times satisfying a particular threshold;
  
  determining, by the one or more processors, a second group of documents, of the plurality of documents, associated with a second timestamp of the plurality of timestamps;
  
  identifying, by the one or more processors, a second document of the second group of documents;
  
  labeling, by the one or more processors, a second point on the graph by using a second headline associated with the second document, the second point corresponding to the second timestamp, the graph including a plurality of points, the plurality of points including the first point, the second point, and two or more other points, and the two or more other points being below the first point and the second point on the graph; and
  
  providing, by the one or more processors, the graph as a response to the query.
- - 2. The method of claim 1, where identifying the first document includes:
    - extracting headlines from the first group of documents to obtain a group of headlines, the group of headlines including the first headline,
      
      determining scores for the first group of documents based on the group of headlines, and
      
      identifying the first document based on the scores determined for the first group of documents.
  - 3. The method of claim 2, where determining the scores includes:
    - determining, for the first headline, a quantity of times that a word included in the first headline occurs in the group of headlines, and
      
      determining a score, of the scores, for the first document based on the quantity of times that the word included in the first headline occurs in the group of headlines.
  - 4. The method of claim 1, where determining the particular quantity of times that the one or more forms of the query are included in the first group of documents includes:
    - determining the particular quantity of times that a word included in the query is included in the first group of documents, and
      
      where identifying the first document includes:
      
      determining that a quantity of times that the word is included in the first document is greater than a quantity of times that the word is included in other documents of the first group of documents, and
      
      identifying the first document based on the quantity of times that the word is included in the first document being greater than the quantity of times that the word is included in the other documents.
  - 5. The method of claim 1, where identifying the first document includes:
    - determining that a quantity of times that the one or more forms of the query are included in the first headline associated with the first document is greater than a quantity of times that the one or more forms of the query are included in other headlines associated with other documents of the first group of documents, and
      
      identifying the first document based on the quantity of times that the one or more forms of the query are included in the first headline associated with the first document being greater than the quantity of times that the one or more forms of the query are included in the other headlines associated with the other documents.
  - 6. The method of claim 1, where the one or more forms of the query comprise one or more of:
    - a word included in the query, a phrase included in the query, or a sentence included in the query.
  - 7. The method of claim 1, where the method further comprises:
    - determining another quantity of times that the one or more forms of the query are included in second headlines of the second group of documents; and
      
      identifying the second headline that is associated with the second document based on a quantity of times that the one or more forms of the query are included in the second headline being greater than a quantity of times that the one or more forms of the query are included in any other headline of the second group of documents, where the second point further corresponds to another value based on the other quantity of times, and where the other value is based on the other quantity of times satisfying the particular threshold.
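
The steps of claim 1 can be sketched end to end as below. This is a minimal illustration, not the claimed implementation. The `Document` type, the `build_graph` function, case-insensitive substring counting, reading "satisfying a particular threshold" as "greater than or equal to", and choosing the document whose headline contains the query forms most often (the rule of claims 5 and 7) are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Document:
    timestamp: str  # hypothetical granularity, e.g. a year such as "1969"
    headline: str
    text: str


def count_forms(text: str, forms: Sequence[str]) -> int:
    """Count case-insensitive substring occurrences of every query form."""
    lowered = text.lower()
    return sum(lowered.count(form.lower()) for form in forms)


def build_graph(docs: Sequence[Document], forms: Sequence[str],
                threshold: int) -> List[Tuple[str, int, Optional[str]]]:
    """Return (timestamp, value, label) points sorted by timestamp."""
    groups: Dict[str, List[Document]] = defaultdict(list)
    for doc in docs:
        groups[doc.timestamp].append(doc)  # one group per timestamp

    points = []
    for timestamp in sorted(groups):
        group = groups[timestamp]
        value = sum(count_forms(d.text, forms) for d in group)
        label = None
        if value >= threshold:  # assumed meaning of "satisfying" the threshold
            best = max(group, key=lambda d: count_forms(d.headline, forms))
            label = best.headline
        points.append((timestamp, value, label))
    return points


docs = [
    Document("1968", "Budget debate drags on", "A quiet year for moon research."),
    Document("1969", "Men walk on the moon", "The moon landing: moon dust, moon rocks."),
    Document("1969", "Stocks rally", "Markets cheer the moon landing."),
    Document("1970", "Elections ahead", "No related coverage."),
    Document("1972", "Last moon mission returns", "The final moon crew left the moon."),
]
points = build_graph(docs, forms=["moon"], threshold=2)
```

In this toy data the 1969 and 1972 points carry headline labels, while the 1968 and 1970 points fall below the threshold and stay unlabeled, mirroring the "two or more other points being below" the labeled points.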

8. A system comprising:
- one or more processors to:
  
  determine a plurality of documents relating to a query, the plurality of documents being associated with a plurality of timestamps;
  
  determine a first group of documents, of the plurality of documents, associated with a first timestamp of the plurality of timestamps;
  
  determine a particular quantity of times that one or more forms of the query are included in the first group of documents;
  
  identify a first document of the first group of documents;
  
  label a first point on a graph by using a first headline associated with the first document, the first point corresponding to the first timestamp and a value based on the particular quantity of times, and the value based on the particular quantity of times satisfying a particular threshold;
  
  determine a second group of documents, of the plurality of documents, associated with a second timestamp of the plurality of timestamps;
  
  identify a second document of the second group of documents;
  
  label a second point on the graph by using a second headline associated with the second document, the second point corresponding to the second timestamp, the graph including a plurality of points, the plurality of points including the first point, the second point, and two or more other points, and the two or more other points being below the first point and the second point on the graph; and
  
  provide the graph as a response to the query.
- - 9. The system of claim 8, where, when identifying the first document, the one or more processors are to:
    - read headlines from the first group of documents to obtain a group of headlines, the group of headlines including the first headline,
      
      determine scores for the group of headlines, and
      
      identify the first document based on the scores determined for the group of headlines.
  - 10. The system of claim 9, where, when determining the scores, the one or more processors are to:
    - determine, for the first headline, a quantity of times that a word included in the first headline occurs in the group of headlines, and
      
      determine a score, of the scores, for the first headline based on the quantity of times that the word included in the first headline occurs in the group of headlines.
  - 11. The system of claim 8, where, when determining the particular quantity of times that the one or more forms of the query are included in the first group of documents, the one or more processors are to:
    - determine the particular quantity of times that a word included in the query is included in the first group of documents; and
      
      where, when identifying the first document, the one or more processors are to:
      
      determine that a quantity of times that the word is included in the first document is greater than a quantity of times that the word is included in other documents of the first group of documents, and
      
      identify the first document based on the quantity of times that the word is included in the first document being greater than the quantity of times that the word is included in the other documents.
  - 12. The system of claim 8, where, when identifying the first document, the one or more processors are to:
    - determine that a quantity of times that the one or more forms of the query are included in the first headline associated with the document is greater than a quantity of times that the one or more forms of the query are included in other headlines associated with other documents of the first group of documents, and
      
      identify the first document based on the quantity of times that the one or more forms of the query are included in the first headline associated with the document being greater than the quantity of times that the one or more forms of the query are included in the other headlines associated with the other documents.
  - 13. The system of claim 8, where the one or more forms of the query comprise one or more of:
    - a word included in the query, a phrase included in the query, or a sentence included in the query.
  - 14. The system of claim 8, where the one or more processors are to:
    - determine another quantity of times that the one or more forms of the query are included in the second group of documents, and
      
      where the second point on the graph further corresponds to another value based on the other quantity of times.
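
Claims 6 and 13 say the "forms of the query" may be a word, a phrase, or a sentence included in the query. One possible way to derive and count such forms is sketched below. The claims do not define how forms are produced, so the helper names `query_forms` and `count_forms`, the choice of "whole query plus each individual word", and whole-word, case-insensitive matching are assumptions of this example.

```python
import re
from typing import List


def query_forms(query: str) -> List[str]:
    """Return the whole query followed by its individual words, deduplicated."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    forms: List[str] = []
    for form in [" ".join(words)] + words:
        if form and form not in forms:
            forms.append(form)
    return forms


def count_forms(text: str, forms: List[str]) -> int:
    """Count whole-word, case-insensitive matches of every form in the text."""
    return sum(
        len(re.findall(rf"\b{re.escape(form)}\b", text, flags=re.IGNORECASE))
        for form in forms
    )


forms = query_forms("Moon landing")
total = count_forms("The moon landing put men on the moon", forms)
```

Note that a phrase match also counts toward its component words here (one match for "moon landing", two for "moon", one for "landing"), so exact phrase hits weigh more heavily. That weighting is a design choice of the sketch rather than something the claims require.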

15. A non-transitory computer-readable medium storing instructions, the instructions comprising:
- one or more instructions that, when executed by one or more processors, cause the one or more processors to:
  
  determine a plurality of documents relating to a query, the plurality of documents being associated with a plurality of timestamps;
  
  determine a first group of documents, of the plurality of documents, associated with a first timestamp of the plurality of timestamps;
  
  determine a particular quantity of times that one or more forms of the query are included in the first group of documents;
  
  identify a first document of the first group of documents;
  
  label a first point on a graph by using a first headline associated with the first document, the first point corresponding to the first timestamp and a value based on the particular quantity of times, and the value based on the particular quantity of times satisfying a particular threshold;
  
  determine a second group of documents, of the plurality of documents, associated with a second timestamp of the plurality of timestamps;
  
  identify a second document of the second group of documents;
  
  label a second point on the graph by using a second headline associated with the second document, the second point corresponding to the second timestamp, the graph including a plurality of points, the plurality of points including the first point, the second point, and two or more other points, and the two or more other points being below the first point and the second point on the graph; and
  
  provide the graph as a response to the query.
- - 16. The non-transitory computer-readable medium of claim 15, where the one or more instructions to identify the first document include:
    - one or more instructions that, when executed by the one or more processors, cause the one or more processors to:
      
      identify headlines from the first group of documents to obtain a group of headlines, the group of headlines including the first headline,
      
      determine scores for the group of headlines, and
      
      identify the first document based on the scores determined for the group of headlines.
  - 17. The non-transitory computer-readable medium of claim 16, where the one or more instructions to determine the scores include:
    - one or more instructions that, when executed by the one or more processors, cause the one or more processors to:
      
      determine, for the first headline, a quantity of times that a word included in the first headline occurs in the group of headlines, and
      
      determine a score, of the scores, for the first headline based on the quantity of times that the word included in the first headline occurs in the group of headlines.
  - 18. The non-transitory computer-readable medium of claim 15, where the one or more instructions to determine the particular quantity of times that the one or more forms of the query are included in the first group of documents include:
    - one or more instructions that, when executed by the one or more processors, cause the one or more processors to:
      
      determine the particular quantity of times that a word included in the query is included in the first group of documents, and
      
      where the one or more instructions to identify the first document include:
      
      one or more instructions that, when executed by the one or more processors, cause the one or more processors to:
      
      determine that a quantity of times that the word is included in the first document is greater than a quantity of times that the word is included in other documents of the first group of documents, and
      
      identify the first document based on the quantity of times that the word is included in the first document being greater than the quantity of times that the word is included in the other documents.
  - 19. The non-transitory computer-readable medium of claim 15, where the one or more instructions to identify the first document include:
    - one or more instructions that, when executed by the one or more processors, cause the one or more processors to:
      
      determine that a quantity of times that the one or more forms of the query are included in the first headline associated with the first document is greater than a quantity of times that the one or more forms of the query are included in other headlines associated with other documents of the first group of documents, and
      
      identify the first document based on the quantity of times that the one or more forms of the query are included in the first headline associated with the first document being greater than the quantity of times that the one or more forms of the query are included in the other headlines associated with the other documents.
  - 20. The non-transitory computer-readable medium of claim 15, where the one or more instructions to label the second point on the graph comprise:
    - one or more instructions that, when executed by the one or more processors, cause the one or more processors to:
      
      determine another quantity of times that the one or more forms of the query are included in the second group of documents; and
      
      label, based on the other quantity of times, the second point on the graph by using the second headline associated with the second document.
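
Claims 4, 11, and 18 identify the first document as the one in which a query word occurs more often than in the other documents of the group. A minimal sketch of that selection rule follows. The function names, whole-word case-insensitive matching, and returning `None` when no document is strictly ahead of the rest are assumptions of this example, since the claims do not say how ties are handled.

```python
import re
from typing import List, Optional


def word_count(text: str, word: str) -> int:
    """Count whole-word, case-insensitive occurrences of word in text."""
    return len(re.findall(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE))


def pick_document(texts: List[str], word: str) -> Optional[int]:
    """Return the index of the text with strictly the most occurrences.

    Returns None for an empty group or when the top count is shared, because
    no document is then "greater than" the others (assumed tie handling).
    """
    counts = [word_count(t, word) for t in texts]
    if not counts:
        return None
    best = max(counts)
    if counts.count(best) > 1:
        return None
    return counts.index(best)


group = [
    "Moon landing: the moon is reached",
    "Moonlight sonata and the moon",
    "No news",
]
chosen = pick_document(group, "moon")
```

The headline of the chosen document would then be used to label the point for that timestamp.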

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Osinga, Douwe
Primary Examiner(s): LEWIS, ALICIA M
Application Number: US13/615,922
Time in Patent Office: 732 Days
Field of Search: 707/737, 707/750, 707/758
US Class Current: 707/737
CPC Class Codes: G06F 16/958 Organisation or management ...
