METHOD AND APPARATUS FOR ANNOTATING VIDEO CONTENT WITH METADATA GENERATED USING SPEECH RECOGNITION TECHNOLOGY

US 20140331137A1
Filed: 07/21/2014
Published: 11/06/2014
Est. Priority Date: 05/11/2007
Status: Abandoned Application

Abstract

A method and apparatus are provided for annotating video content with metadata generated using speech recognition technology. The method begins by rendering video content on a display device. A segment of speech is received from a user such that the speech segment annotates a portion of the video content currently being rendered. The speech segment is converted to a text segment, and the text segment is associated with the rendered portion of the video content. The text segment is stored in a selectively retrievable manner so that it remains associated with the rendered portion of the video content.
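
The storage step in the abstract can be illustrated with a minimal sketch. It assumes that "selectively retrievable" means lookup by playback position or by keyword; the `AnnotationStore` class and its method names are hypothetical and do not come from the application.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class AnnotationStore:
    """Text segments keyed by the playback position they annotate (hypothetical)."""

    _times: list = field(default_factory=list)  # sorted playback positions, in seconds
    _texts: list = field(default_factory=list)  # text segment stored at the same index

    def annotate(self, position_s: float, text: str) -> None:
        # Insert while keeping both lists ordered by playback position.
        i = bisect_right(self._times, position_s)
        self._times.insert(i, position_s)
        self._texts.insert(i, text)

    def between(self, start_s: float, end_s: float) -> list:
        # Return every text segment whose position falls in [start_s, end_s).
        lo = bisect_left(self._times, start_s)
        hi = bisect_left(self._times, end_s)
        return self._texts[lo:hi]

    def search(self, keyword: str) -> list:
        # Return (position, text) pairs whose text mentions the keyword.
        needle = keyword.lower()
        return [(t, s) for t, s in zip(self._times, self._texts) if needle in s.lower()]


store = AnnotationStore()
store.annotate(12.0, "Birthday cake")
store.annotate(3.5, "Grandma arrives")
print(store.between(0.0, 10.0))
print(store.search("cake"))
```

Keeping the positions sorted lets a player fetch the annotations for the portion being rendered with a binary search, while the keyword search supports retrieval by content.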

21 Claims

1. (canceled)
  - 3. The method of claim 1, wherein updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata comprises:
    - updating the header of the frame to include a sequence header, a GOP header, user data, and an I-frame header.
  - 4. The method of claim 1, wherein:
    - providing a particular portion of a video for output comprises providing, by a set-top box, the particular portion of the video for output,
    - receiving an utterance while the particular portion of the video is being provided for output comprises receiving, by the set-top box, the utterance while the particular portion of the video is being provided for output,
    - obtaining, from an automated speech recognizer, a transcription of the utterance comprises obtaining, from the automated speech recognizer on the set-top box, the transcription of the utterance,
    - generating video metadata based on the transcription comprises generating, by the set-top box, the video metadata based on the transcription, and
    - updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata comprises updating, by the set-top box, the header of the frame of data that corresponds to the particular portion of the video, to include the video metadata.
  - 5. The method of claim 1, wherein generating video metadata based on the transcription comprises:
    - generating the video metadata based on a particular video standard.
  - 6. The method of claim 1, wherein updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata comprises:
    - updating the header of the frame of data to include the video metadata and an associated time-stamp as user data bits.
  - 7. The method of claim 1, comprising:
    - generating a caption or a subtitle based on the video metadata; and
      
      storing the caption or the subtitle for display with the particular portion of the video.
  - 8. The method of claim 1, comprising:
    - providing a predetermined user prompt, wherein the utterance is received in response to the predetermined user prompt.
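
The header elements named in claim 3 (sequence header, GOP header, user data, I-frame header) match MPEG-2 video terminology, and claim 6 places the metadata and a time-stamp in the user data bits. The sketch below assumes MPEG-2, where a user data unit begins with the start code `0x000001B2`. The `ANNO` tag, the `timestamp|text` payload layout, and both function names are hypothetical illustrations, not the encoding described in the application.

```python
USER_DATA_START_CODE = b"\x00\x00\x01\xb2"  # MPEG-2 user_data_start_code
TAG = b"ANNO"  # hypothetical marker identifying our annotation payload


def pack_user_data(text: str, timestamp_ms: int) -> bytes:
    """Build a user data unit carrying a time-stamp and an annotation."""
    if timestamp_ms < 0:
        raise ValueError("timestamp must be non-negative")
    if "\x00" in text:
        # A payload without zero bytes cannot contain the 0x000001 start-code prefix.
        raise ValueError("annotation text must not contain NUL characters")
    payload = TAG + f"{timestamp_ms}|{text}".encode("utf-8")
    return USER_DATA_START_CODE + payload


def unpack_user_data(unit: bytes) -> tuple:
    """Recover (timestamp_ms, text) from a unit produced by pack_user_data."""
    prefix = USER_DATA_START_CODE + TAG
    if not unit.startswith(prefix):
        raise ValueError("not an annotation user data unit")
    body = unit[len(prefix):].decode("utf-8")
    stamp, _, text = body.partition("|")  # split at the first separator only
    return int(stamp), text


unit = pack_user_data("first steps", 90500)
print(unit[:4].hex())
print(unpack_user_data(unit))
```

In a real stream the unit would be written after the header it annotates, and a decoder that does not recognize the tag would be expected to skip the user data.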

2. A computer-implemented method comprising:
- providing a particular portion of a video for output;
  
  receiving an utterance while the particular portion of the video is being provided for output;
  
  obtaining, from an automated speech recognizer, a transcription of the utterance;
  
  generating video metadata based on the transcription; and
  
  updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata.
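
The five steps of claim 2 can be sketched as a single function. This is an illustration under stated assumptions, not the applicants' implementation: the `Frame` class, the dictionary-based header, the `user_data` key, and the stand-in recognizer are all hypothetical, and a real system would call an actual speech recognizer.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Frame:
    """A frame of video data with a mutable header (hypothetical structure)."""

    index: int
    header: dict = field(default_factory=dict)


def annotate_frame(frame: Frame, utterance: bytes,
                   recognizer: Callable[[bytes], str]) -> Frame:
    # The caller has provided `frame` for output and captured `utterance`
    # while that frame was being shown (steps 1 and 2 of the claim).
    transcription = recognizer(utterance)              # step 3: obtain transcription
    metadata = {"annotation": transcription.strip()}   # step 4: generate metadata
    frame.header.setdefault("user_data", []).append(metadata)  # step 5: update header
    return frame


def fake_recognizer(audio: bytes) -> str:
    # Stand-in for a speech recognizer: treats the bytes as already-transcribed text.
    return audio.decode("utf-8")


frame = annotate_frame(Frame(index=42), b" first steps ", fake_recognizer)
print(frame.header)
```

Passing the recognizer as a callable keeps the sketch independent of any particular speech recognition engine.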

9. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  providing a particular portion of a video for output;
  
  receiving an utterance while the particular portion of the video is being provided for output;
  
  obtaining, from an automated speech recognizer, a transcription of the utterance;
  
  generating video metadata based on the transcription; and
  
  updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata.
  - 10. The system of claim 9, wherein updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata comprises:
    - updating the header of the frame to include a sequence header, a GOP header, user data, and an I-frame header.
  - 11. The system of claim 9, wherein:
    - providing a particular portion of a video for output comprises providing, by a set-top box, the particular portion of the video for output,
    - receiving an utterance while the particular portion of the video is being provided for output comprises receiving, by the set-top box, the utterance while the particular portion of the video is being provided for output,
    - obtaining, from an automated speech recognizer, a transcription of the utterance comprises obtaining, from the automated speech recognizer on the set-top box, the transcription of the utterance,
    - generating video metadata based on the transcription comprises generating, by the set-top box, the video metadata based on the transcription, and
    - updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata comprises updating, by the set-top box, the header of the frame of data that corresponds to the particular portion of the video, to include the video metadata.
  - 12. The system of claim 9, wherein generating video metadata based on the transcription comprises:
    - generating the video metadata based on a particular video standard.
  - 13. The system of claim 9, wherein updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata comprises:
    - updating the header of the frame of data to include the video metadata and an associated time-stamp as user data bits.
  - 14. The system of claim 9, wherein the operations further comprise:
    - generating a caption or a subtitle based on the video metadata; and
      
      storing the caption or the subtitle for display with the particular portion of the video.
  - 15. The system of claim 9, wherein the operations further comprise:
    - providing a predetermined user prompt, wherein the utterance is received in response to the predetermined user prompt.
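
The caption and subtitle claims (7, 14 and 20) could be realized in many formats. As one hedged illustration, the sketch below renders time-stamped annotations as SubRip (SRT) cues. The choice of SRT, the fixed three-second display duration, and the function names are assumptions made for this example only.

```python
def srt_timestamp(ms: int) -> str:
    """Format milliseconds as the HH:MM:SS,mmm form used by SRT."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(annotations: list, duration_ms: int = 3000) -> str:
    """Turn (start_ms, text) annotations into numbered SRT cues."""
    cues = []
    # Sort by start time so cue numbers follow playback order.
    for number, (start_ms, text) in enumerate(sorted(annotations), start=1):
        end_ms = start_ms + duration_ms  # assumed fixed on-screen duration
        cues.append(
            f"{number}\n{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n{text}\n"
        )
    return "\n".join(cues)


print(to_srt([(90500, "first steps"), (3500, "Grandma arrives")]))
```

Storing the result alongside the video gives a player a caption track that appears with the annotated portion.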

16. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- providing a particular portion of a video for output;
  
  receiving an utterance while the particular portion of the video is being provided for output;
  
  obtaining, from an automated speech recognizer, a transcription of the utterance;
  
  generating video metadata based on the transcription; and
  
  updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata.
  - 17. The medium of claim 16, wherein updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata comprises:
    - updating the header of the frame to include a sequence header, a GOP header, user data, and an I-frame header.
  - 18. The medium of claim 16, wherein:
    - providing a particular portion of a video for output comprises providing, by a set-top box, the particular portion of the video for output,
    - receiving an utterance while the particular portion of the video is being provided for output comprises receiving, by the set-top box, the utterance while the particular portion of the video is being provided for output,
    - obtaining, from an automated speech recognizer, a transcription of the utterance comprises obtaining, from the automated speech recognizer on the set-top box, the transcription of the utterance,
    - generating video metadata based on the transcription comprises generating, by the set-top box, the video metadata based on the transcription, and
    - updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata comprises updating, by the set-top box, the header of the frame of data that corresponds to the particular portion of the video, to include the video metadata.
  - 19. The medium of claim 16, wherein generating video metadata based on the transcription comprises:
    - generating the video metadata based on a particular video standard.
  - 20. The medium of claim 16, wherein the operations further comprise:
    - generating a caption or a subtitle based on the video metadata; and
      
      storing the caption or the subtitle for display with the particular portion of the video.
  - 21. The medium of claim 16, wherein the operations further comprise:
    - providing a predetermined user prompt, wherein the utterance is received in response to the predetermined user prompt.

Current Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee: Motorola Mobility LLC (Lenovo Group Ltd.)
Inventors: McKoen, Kevin M.; Grossman, Michael A.

Application Number: US14/336,063
Publication Number: US 20140331137A1

Field of Search
US Class Current: 715/719
CPC Class Codes:
G06F 16/70   of video data
G06F 16/78   Retrieval characterised by ...
G06F 16/7844   using original textual cont...
G06F 40/169   Annotation, e.g. comment da...
G10L 15/26   Speech to text systems G10L...
