Method and apparatus for annotating video content with metadata generated using speech recognition technology

US 8,793,583 B2
Filed: 10/17/2012
Issued: 07/29/2014
Est. Priority Date: 05/11/2007
Status: Active Grant

4 Assignments

0 Petitions

Abstract

A method and apparatus are provided for annotating video content with metadata generated using speech recognition technology. The method begins by rendering video content on a display device. A segment of speech is received from a user such that the speech segment annotates a portion of the video content currently being rendered. The speech segment is converted to a text-segment, and the text-segment is associated with the rendered portion of the video content. The text-segment is stored in a selectively retrievable manner so that it is associated with the rendered portion of the video content.
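
The flow summarized in the abstract (receive speech, convert it to a text-segment, associate it with a portion of the video, store it retrievably) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: `Portion`, `AnnotationStore`, `annotate`, and `fake_recognizer` are illustrative names, a portion is assumed to be a span measured in seconds, and the recognizer is a stand-in that merely decodes bytes rather than performing real speech recognition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Portion:
    """Hypothetical model of a video portion: a span in seconds."""
    video_id: str
    start_s: float
    end_s: float


class AnnotationStore:
    """Keeps text-segments retrievable by the portion they annotate."""

    def __init__(self) -> None:
        self._by_portion: Dict[Portion, List[str]] = {}

    def add(self, portion: Portion, text_segment: str) -> None:
        self._by_portion.setdefault(portion, []).append(text_segment)

    def lookup(self, video_id: str, position_s: float) -> List[str]:
        """Return every text-segment whose portion covers position_s."""
        hits: List[str] = []
        for portion, texts in self._by_portion.items():
            same_video = portion.video_id == video_id
            if same_video and portion.start_s <= position_s <= portion.end_s:
                hits.extend(texts)
        return hits


def annotate(
    store: AnnotationStore,
    portion: Portion,
    speech_segment: bytes,
    recognize: Callable[[bytes], str],
) -> str:
    """Convert speech to text, associate it with the portion, store it."""
    text_segment = recognize(speech_segment)
    store.add(portion, text_segment)
    return text_segment


def fake_recognizer(speech_segment: bytes) -> str:
    # Assumption: a placeholder for a real speech recognizer. It treats
    # the captured bytes as UTF-8 text so the sketch stays runnable.
    return speech_segment.decode("utf-8")
```

Passing the recognizer in as a callable keeps the sketch independent of any particular speech recognition engine; a real system would substitute its own.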

38 Citations

22 Claims

1. A method for annotation of video content in a device communicatively coupled to a network, the method comprising:
- receiving, in the device, a captured speech segment comprising speech from a user of a second device, wherein the captured speech segment annotates a portion of the video content streamed to the second device for being played to the user contemporaneously with the speech from the user;
  
  converting the captured speech segment to a text-segment;
  
  associating the text-segment with the portion of the video content contemporaneously played to the user; and
  
  storing in a selectively retrievable manner the text-segment so that the text-segment is associated with the portion of the video content.
- Dependent claims (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14):
  - 3. The method of claim 1 further comprising:
    - streaming the video content via the network to the second device.
  - 4. The method of claim 1 further comprising:
    - receiving, in the device, a timestamp associated with the captured speech segment;
      
      wherein associating the text-segment with the portion of the video content further comprises using the timestamp.
  - 5. The method of claim 1 further comprising:
    - generating metadata based on the text-segment.
  - 6. The method of claim 1 further comprising:
    - generating metadata based on an identified speaker associated with the speech segment.
  - 7. The method of claim 1 further comprising:
    - generating metadata based on specific words of the text-segment.
  - 8. The method of claim 1 further comprising:
    - receiving, in the device, before receiving the captured speech segment, a message comprising a user input selecting an operational state.
  - 9. The method of claim 1 wherein the operational state is selected from the group consisting of an annotate state, a narrate state, a commentary state, an analyze state, and a review/edit state.
  - 10. The method of claim 1 wherein storing the text-segment further comprises:
    - storing the text-segment and metadata in a database of a storage device communicatively coupled to the network.
  - 11. The method of claim 1 wherein storing the text-segment further comprises:
    - storing the text-segment in a database of a storage device communicatively coupled to the network.
  - 12. The method of claim 1 further comprising:
    - storing, in a database of a storage device communicatively coupled to the network, metadata comprising a timestamp for associating the text-segment with the portion of the video content.
  - 13. The method of claim 1 further comprising:
    - storing, in a storage device communicatively coupled to the network, a modified version of the video content comprising metadata for associating the text-segment with the portion of the video content.
  - 14. The method of claim 1 wherein storing the text-segment further comprises:
    - storing, in a storage device communicatively coupled to the network, a modified version of the video content comprising the text-segment and metadata for associating the text-segment with the portion of the video content.
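
Several of the dependent claims above combine naturally: a timestamp used for association (claims 4 and 12), metadata generated from specific words of the text-segment (claim 7), an identified speaker (claim 6), and storage in a database (claims 10 and 11). The following Python sketch shows one way these could fit together, using an in-memory SQLite database. The schema, the stop-word list, and the function names (`keywords`, `open_db`, `store_annotation`, `find_by_keyword`) are assumptions made for illustration; the claims do not prescribe any of them.

```python
import sqlite3
from typing import List, Optional, Set, Tuple

# Assumption: a tiny stop-word list, only to make keyword metadata meaningful.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "by"}


def keywords(text_segment: str) -> List[str]:
    """Derive keyword metadata from specific words of the text-segment."""
    seen: Set[str] = set()
    out: List[str] = []
    for raw in text_segment.lower().split():
        word = raw.strip(".,!?;:\"'")
        if word and word not in STOPWORDS and word not in seen:
            seen.add(word)
            out.append(word)
    return out


def open_db() -> sqlite3.Connection:
    """Create a hypothetical two-table schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotation ("
        " id INTEGER PRIMARY KEY,"
        " video_id TEXT NOT NULL,"
        " timestamp_s REAL NOT NULL,"
        " speaker TEXT,"
        " text_segment TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE keyword ("
        " annotation_id INTEGER NOT NULL REFERENCES annotation(id),"
        " word TEXT NOT NULL)"
    )
    return conn


def store_annotation(
    conn: sqlite3.Connection,
    video_id: str,
    timestamp_s: float,
    text_segment: str,
    speaker: Optional[str] = None,
) -> int:
    """Store the text-segment with its timestamp and keyword metadata."""
    cur = conn.execute(
        "INSERT INTO annotation (video_id, timestamp_s, speaker, text_segment)"
        " VALUES (?, ?, ?, ?)",
        (video_id, timestamp_s, speaker, text_segment),
    )
    annotation_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO keyword (annotation_id, word) VALUES (?, ?)",
        [(annotation_id, w) for w in keywords(text_segment)],
    )
    conn.commit()
    return annotation_id


def find_by_keyword(
    conn: sqlite3.Connection, word: str
) -> List[Tuple[str, float, str]]:
    """Selective retrieval: annotations containing word, in playback order."""
    rows = conn.execute(
        "SELECT a.video_id, a.timestamp_s, a.text_segment"
        " FROM annotation a JOIN keyword k ON k.annotation_id = a.id"
        " WHERE k.word = ? ORDER BY a.timestamp_s",
        (word.lower(),),
    )
    return rows.fetchall()
```

Keeping keywords in their own table is one design choice among many; it makes the retrieval query a simple join and leaves the stored text-segment unmodified.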

2. An apparatus for annotation of a video content, the apparatus comprising:
- a memory; and
  
  a processor communicatively coupled to the memory and to a network interface, the processor configured to be communicatively coupled via the network interface to a network;
  
  the processor further configured to receive, via the network interface, a captured speech segment comprising speech from a user of a second device coupled to the network, wherein the captured speech segment annotates a portion of the video content streamed to the second device for being played to the user contemporaneously with the speech from the user;
  
  the processor further configured to convert the captured speech segment to a text-segment, to associate the text-segment with the portion of the video content contemporaneously played to the user; and
  
  to store in a selectively retrievable manner the text-segment so that the text-segment is associated with the portion of the video content.

15. A method for annotation of video content in a device communicatively coupled to a network, the method comprising:
- receiving, in the device, a text-segment of recognized speech comprising recognized speech from a user of a second device coupled to the network, wherein the text-segment annotates a portion of the video content streamed to the second device for being played to the user contemporaneously with the speech from the user;
  
  associating the text-segment with the portion of the video content; and
  
  storing in a selectively retrievable manner the text-segment so that it is associated with the portion of the video content.
- Dependent claims (16, 17, 18):
  - 16. The method of claim 15 further comprising:
    - streaming the video content via the network to the second device.
  - 17. The method of claim 15 further comprising:
    - receiving, in the device, a timestamp associated with the captured speech segment;
      
      wherein associating the text-segment with the portion of the video content further comprises using the timestamp.
  - 18. A non-transitory computer-readable medium having computer-executable instructions embodied thereon for annotation of video content in a device communicatively coupled to a network, wherein the instructions, when executed by at least one processor of the device, cause the at least one processor to perform the method of claim 15.

19. A method for annotation of video content in a device communicatively coupled to a network, the method comprising:
- receiving, in the device, a text-segment of recognized speech comprising recognized speech from a user of a second device, wherein the text-segment annotates a portion of the video content streamed to the second device for being played to the user contemporaneously with the speech from the user;
  
  receiving, in the device, metadata comprising a timestamp for associating the text-segment with the portion of the video content; and
  
  storing in a selectively retrievable manner the text-segment so that it is associated with the portion of the video content.
- Dependent claims (20, 21, 22):
  - 20. The method of claim 19 further comprising:
    - streaming the video content via the network to the second device.
  - 21. The method of claim 19 further comprising:
    - receiving, in the device, a timestamp associated with the captured speech segment;
      
      wherein associating the text-segment with the portion of the video content further comprises using the timestamp.
  - 22. A non-transitory computer-readable medium having computer-executable instructions embodied thereon for annotation of video content in a device communicatively coupled to a network, wherein the instructions, when executed by at least one processor of the device, cause the at least one processor to perform the method of claim 19.
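
In claims 15 and 19 the device receives a text-segment that has already been recognized, and in claim 19 it also receives metadata comprising a timestamp. A minimal Python sketch of that receiving side follows. It rests on assumptions that are not in the claims: the message is JSON with hypothetical fields `video_id`, `text_segment`, and `timestamp_s`; a "portion" is modeled as a scene identified by its index in a sorted list of scene start times; and the store is a plain dictionary.

```python
import json
from bisect import bisect_right
from typing import Dict, List, Tuple


def portion_index(scene_starts_s: List[float], timestamp_s: float) -> int:
    """Index of the scene with the latest start time <= timestamp_s.

    scene_starts_s must be sorted in ascending order.
    """
    if not scene_starts_s or timestamp_s < scene_starts_s[0]:
        raise ValueError("timestamp precedes the first scene")
    return bisect_right(scene_starts_s, timestamp_s) - 1


def handle_message(
    raw: str,
    scene_starts_s: List[float],
    store: Dict[Tuple[str, int], List[str]],
) -> Tuple[str, int]:
    """Parse a received message and file its text-segment under a portion."""
    msg = json.loads(raw)
    video_id = msg["video_id"]          # hypothetical field name
    text_segment = msg["text_segment"]  # hypothetical field name
    timestamp_s = float(msg["timestamp_s"])  # hypothetical field name

    # Use the timestamp to associate the text-segment with a portion.
    key = (video_id, portion_index(scene_starts_s, timestamp_s))
    store.setdefault(key, []).append(text_segment)
    return key
```

Because the scene start times are sorted, the binary search locates the matching portion without scanning every scene.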

Current Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee: Motorola Mobility LLC (Lenovo Group Ltd.)
Inventors: McKoen, Kevin M.; Grossman, Michael A.
Primary Examiner(s): Chawan, Vijay B
Application Number: US13/654,327
Publication Number: US 20130041664A1
Time in Patent Office: 650 Days
Field of Search: 715/266, 715/728, 715/727, 715/202, 715/201, 715/723, 369/28.01, 369/4, 369/27.01, 369/47.16, 709/209, 707/104.1, 707/14.61, 707/E17.019, 702/189, 386/38, 386/39, 386/E9.036, 386/E9.013, 348/E7.071, 704/235, 704/251
US Class Current: 715/728

CPC Class Codes
G06F 16/70   of video data
G06F 16/78   Retrieval characterised by ...
G06F 16/7844   using original textual cont...
G06F 40/169   Annotation, e.g. comment da...
G10L 15/26   Speech to text systems G10L...
