Automated closed captioning using temporal data

US 9,922,095 B2
Filed: 06/02/2015
Issued: 03/20/2018
Est. Priority Date: 06/02/2015
Status: Active Grant

1 Assignment

0 Petitions

Abstract

One or more systems and/or techniques are provided for automatic closed captioning for media content. In an example, real-time content, occurring within a threshold timespan of a broadcast of media content (e.g., social network posts occurring during and an hour before a live broadcast of an interview), may be accessed. A list of named entities, occurring within the social network data, may be generated (e.g., Interviewer Jon, Interviewee Kathy, Husband Dave, Son Jack, etc.). A ranked list of named entities may be created based upon trending named entities within the list of named entities (e.g., a named entity may be ranked higher based upon a more frequent occurrence within the social network posts). A dynamic grammar (e.g., library, etc.) may be built based upon the ranked list of named entities. Speech recognition may be performed upon the broadcast of media content utilizing the dynamic grammar to create closed caption text.
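
The pipeline in the abstract (time-windowed social posts, trending named entities, a ranked list, then a dynamic grammar) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: `Post`, `ENTITY_PATTERN`, and `build_dynamic_grammar` are hypothetical names, the capitalized-word regex is a crude stand-in for real named-entity recognition, and "trending" is approximated by raw mention frequency, following the abstract's example of ranking a more frequently mentioned entity higher.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Post:
    """A hypothetical social network post: when it was made and its text."""
    timestamp: datetime
    text: str


# Crude stand-in for named-entity recognition (assumption): a run of
# capitalized words such as "Kathy" or "Jon Smith" counts as one entity.
ENTITY_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def build_dynamic_grammar(posts, broadcast_start, now, lead_time, top_k):
    """Return the top_k most-mentioned entities in posts made between
    (broadcast_start - lead_time) and now, most frequent first."""
    window_start = broadcast_start - lead_time
    counts = Counter()
    for post in posts:
        # Only posts inside the threshold timespan contribute.
        if window_start <= post.timestamp <= now:
            counts.update(ENTITY_PATTERN.findall(post.text))
    # most_common() orders by count; ties keep first-seen order.
    return [entity for entity, _ in counts.most_common(top_k)]
```

For example, if the posts inside the window mention Kathy three times, Dave twice, and Jack and Jon once each (Jack first), `top_k=3` yields `["Kathy", "Dave", "Jack"]`; a post from three hours before the broadcast is ignored when `lead_time` is one hour.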

20 Claims

1. A system for increasing accuracy of computer speech recognition comprising:
- a dynamic grammar builder computing device comprising one or more processing units and one or more computer-readable media comprising computer-executable instructions which, when executed by the one or more processing units, cause the dynamic grammar builder computing device to;
  - obtain social network data occurring within a threshold timespan of a broadcast of media content;
  - identify named entities from the obtained social network data that are trending within the obtained social network data;
  - rank the identified named entities based upon the trending; and
  - build a dynamic grammar comprising at least some of the named entities based upon the ranking; and
- a speech recognition computing device comprising one or more processing units and one or more computer-readable media comprising computer-executable instructions which, when executed by the one or more processing units, cause the speech recognition computing device to;
  - perform speech recognition of spoken words, spoken by the broadcast of the media content, utilizing the dynamic grammar to create closed caption text for the broadcast of the media content.

2. The system of claim 1, wherein the computer-readable media of the dynamic grammar builder computing device comprise additional computer-executable instructions which, when executed by the one or more processing units of the dynamic grammar builder computing device cause the dynamic grammar builder computing device to:
- access supplemental content associated with at least one of the named entities;
- identify a context associated with the at least one of the named entities or with an event occurring within the media content; and
- build the dynamic grammar based upon the identified context.

3. The system of claim 1, wherein the trending comprises occurrence metrics identifying a number of times a particular named entity is referenced in the obtained social network data within trending timespan thresholds.

4. The system of claim 1, wherein the identifying the named entities comprises executing entity recognition functionality upon the social network data.

5. The system of claim 1, wherein the ranking comprises responsive to a refresh timer expiring, re-ranking the named entities based upon the trending within updated social network data.

6. The system of claim 1, wherein the computer-readable media of the dynamic grammar builder computing device comprise additional computer-executable instructions which, when executed by the one or more processing units of the dynamic grammar builder computing device cause the dynamic grammar builder computing device to:
- define the threshold timespan based upon a type of event occurring within the media content.
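
Claims 3, 5 and 6 add three refinements: occurrence counts within one or more trending timespans, re-ranking when a refresh timer expires, and a threshold timespan chosen by event type. The sketch below is one hypothetical way to combine them. The window lengths, the per-event lead times in `THRESHOLD_BY_EVENT`, and the rule that the shortest window dominates the ranking (longer windows and then the name only break ties) are illustrative assumptions, not values or rules taken from the patent. Timestamps are plain seconds, and the caller passes `now` so the timer logic is deterministic.

```python
from collections import Counter

# Hypothetical lead times in seconds per event type (claim 6); illustrative only.
THRESHOLD_BY_EVENT = {"live_interview": 3600, "live_telecast": 7200, "online_videogame": 1800}


def threshold_for(event_type, default=3600):
    """Pick the threshold timespan for an event type, falling back to a default."""
    return THRESHOLD_BY_EVENT.get(event_type, default)


def occurrence_metrics(mentions, now, windows):
    """Count references to each entity inside each trending window (claim 3).

    mentions: list of (timestamp_seconds, entity) pairs.
    windows: window lengths in seconds, e.g. (300, 3600).
    """
    return {w: Counter(e for t, e in mentions if now - w <= t <= now) for w in windows}


def rank_entities(mentions, now, windows):
    """Rank by count in the shortest window; longer windows, then name, break ties."""
    metrics = occurrence_metrics(mentions, now, windows)
    ordered = sorted(windows)
    entities = set().union(*metrics.values())
    # Negated counts give descending order; the name makes ties deterministic.
    return sorted(entities, key=lambda e: (tuple(-metrics[w][e] for w in ordered), e))


class RefreshingRanker:
    """Caches a ranking and recomputes it only once a refresh timer expires (claim 5)."""

    def __init__(self, refresh_seconds, windows):
        self.refresh_seconds = refresh_seconds
        self.windows = windows
        self._last_refresh = None
        self._ranking = []

    def ranking(self, mentions, now):
        expired = self._last_refresh is None or now - self._last_refresh >= self.refresh_seconds
        if expired:
            self._ranking = rank_entities(mentions, now, self.windows)
            self._last_refresh = now
        return self._ranking
```

Between refreshes the cached list is returned unchanged even if new mentions arrive, which keeps the grammar stable for the recognizer at the cost of reacting to a newly trending name only after the timer expires.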

7. A computing device comprising:
- one or more processing units; and
- one or more computer-readable media comprising computer-executable instructions which, when executed by the one or more processing units, cause the computing device to;
  - obtain real-time content occurring within a threshold timespan of a broadcast of media content;
  - identify named entities from the obtained real-time content that are trending within the obtained real-time content;
  - ranking the identified named entities based upon the trending;
  - building a dynamic grammar comprising at least some of the named entities based upon the ranking; and
  - utilizing the dynamic grammar for correcting user generated closed captioning for the broadcast of the media content.
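
Claims 7 and 18 use the dynamic grammar to correct user-generated captions rather than to drive the recognizer. One simple way to illustrate this, assumed here rather than described by the patent, is fuzzy string matching with Python's standard `difflib.get_close_matches`: a caption word that closely resembles a grammar entry is replaced by the entry's canonical spelling. `correct_captions` and its `cutoff` default are hypothetical, and this sketch handles only single-word grammar entries.

```python
import difflib
import re

WORD = re.compile(r"[A-Za-z']+")


def correct_captions(caption, grammar, cutoff=0.8):
    """Replace caption words that closely match a single-word grammar entry.

    Matching is case-insensitive; a matched word is replaced by the entry's
    canonical spelling from the grammar (e.g. "Cathy" becomes "Kathy").
    Multi-word entries are skipped in this sketch.
    """
    canonical = {entry.lower(): entry for entry in grammar if " " not in entry}

    def fix(match):
        word = match.group(0)
        hits = difflib.get_close_matches(word.lower(), list(canonical), n=1, cutoff=cutoff)
        return canonical[hits[0]] if hits else word

    return WORD.sub(fix, caption)
```

With the grammar `["Jon", "Kathy", "Dave", "Jack"]`, the caption "interviewer John asked Cathy about her son" becomes "interviewer Jon asked Kathy about her son". The cutoff is a trade-off: lowering it to 0.75 would already rewrite the ordinary word "have" as "Dave".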

8. A method for increasing accuracy of computer speech recognition comprising:
- obtaining, by a computing device, social network data occurring within a threshold timespan of a broadcast of media content;
- identifying, on the computing device, named entities from the obtained social network data that are trending within the obtained social network data;
- ranking, on the computing device, the identified named entities based upon the trending;
- building, on the computing device, a dynamic grammar comprising at least some of the named entities based upon the ranking; and
- performing computer speech recognition of spoken words, spoken by the broadcast of the media content, utilizing the dynamic grammar to create closed caption text for the broadcast of the media content.

9. The method of claim 8, further comprising:
- accessing, by the computing device, supplemental content associated with at least one of the named entities;
- identifying, with the computing device, a context associated with either the at least one of the named entities or with an event occurring within the media content; and
- building the dynamic grammar based upon the identified context.

10. The method of claim 8, wherein the trending comprises occurrence metrics identifying a number of times a particular named entity is referenced in the obtained social network data within trending timespan thresholds.

11. The method of claim 8, wherein the identifying the named entities comprises:
- executing entity recognition functionality upon the social network data.

12. The method of claim 8, wherein the ranking comprises:
- responsive to a refresh timer expiring, re-ranking the named entities based upon the trending within updated social network data.

13. The method of claim 8, further comprising:
- defining the threshold timespan based upon a type of event occurring within the media content.

14. The method of claim 8, wherein the media content is one of a live interview, a live telecast, or an online videogame.

15. The method of claim 8, wherein the dynamic grammar is:
- a speech recognition grammar specification file.

16. The method of claim 8, the social network data comprises at least one of a query log, a message, a microblog, a forum post, or user created data that is updated in real-time.

17. The method of claim 8, wherein the performing the speech recognition comprises:
- loading the dynamic grammar into memory of a speech recognition server; and
- invoking speech recognition functionality, of the speech recognition server, to utilize the dynamic grammar as a library against which an audio fragment, of the broadcast of the media content, is compared for determining a probability that the audio fragment is a named entity.

18. The method of claim 8, further comprising:
- correcting user generated closed captioning for the broadcast of the media content based upon the dynamic grammar.

19. The method of claim 8, wherein the performing speech recognition comprises:
- utilizing the dynamic grammar to either increase, or decrease a probability that an audio fragment, of the broadcast of the media content, is a named entity.

20. The method of claim 8, further comprising:
- filtering the social network data based upon a context of an event occurring within the media content.
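
Claim 15 names a speech recognition grammar specification file as one form of the dynamic grammar, and claim 19 has the grammar increase or decrease the probability that an audio fragment is a named entity. One way to connect the two, assumed here rather than specified by the patent, is to emit a W3C SRGS 1.0 XML grammar whose `<one-of>` alternatives carry `weight` values derived from the ranking. The function name `build_srgs_grammar`, the rule id `namedEntities`, and the linear rank-to-weight mapping are hypothetical, and how a particular recognizer interprets the weights depends on that engine.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def build_srgs_grammar(ranked_entities, lang="en-US"):
    """Return an SRGS 1.0 XML grammar listing ranked entities as weighted alternatives.

    The top-ranked entity gets weight 1.000 and weights fall linearly with rank
    (an illustrative choice, not one prescribed by SRGS or by the patent).
    """
    if not ranked_entities:
        raise ValueError("an SRGS <one-of> needs at least one <item>")
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "mode": "voice",
        "root": "namedEntities",
        XML_LANG: lang,
    })
    rule = ET.SubElement(grammar, "rule", {"id": "namedEntities", "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    n = len(ranked_entities)
    for rank, entity in enumerate(ranked_entities):
        item = ET.SubElement(one_of, "item", {"weight": f"{(n - rank) / n:.3f}"})
        item.text = entity  # ElementTree escapes any XML special characters
    return ET.tostring(grammar, encoding="unicode")
```

For `["Kathy", "Dave", "Jack"]` the items receive weights 1.000, 0.667 and 0.333. The resulting string could be saved as a `.grxml` file for an SRGS-capable recognizer, in the spirit of the loading step in claim 17; whether a given engine honors `weight` on `<item>` should be checked in its own documentation.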

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors
Koul, Anirudh, Kulkarni, Ranjitha Gurunath, Tremblay, Serge-Eric
Primary Examiner(s)
Le, Uyen

Application Number
US14/728,201
Publication Number
US 20160357746A1
Time in Patent Office
1,022 Days
Field of Search
None
CPC Class Codes

G06F 16/24578   using ranking

G06F 16/9535   Search customisation based ...

G06F 40/295   Named entity recognition

G10L 15/183   using context dependencies,...

G10L 15/26   Speech to text systems G10L...

H04N 21/4888   for displaying teletext cha...
