MEDIA MESSAGE CREATION WITH AUTOMATIC TITLING

US 20200137349A1
Filed: 12/26/2019
Published: 04/30/2020
Est. Priority Date: 03/19/2017
Status: Active Grant


Abstract

In some implementations, a user device can be configured to create media messages with automatic titling. For example, a user can create a media messaging project that includes multiple video clips. The video clips can be generated based on video data and/or audio data captured by the user device and/or based on pre-recorded video data and/or audio data obtained from various storage locations. When the user device captures the audio data for a clip, the user device can obtain a speech-to-text transcription of the audio data in near real time and present the transcription data (e.g., text) overlaid on the video data while the video data is being captured or presented by the user device.
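The abstract describes a media message as an ordered sequence of clips, each pairing video with timed transcription text. The Python sketch below shows one possible data model. It assumes that each clip records its own duration in seconds and that token timing is measured from the start of its clip. The names `Token`, `Clip`, `MediaMessage`, `video_uri` and `clip_start_times` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    """One transcribed word; timing is relative to the start of its clip (assumed)."""
    text: str
    offset: float    # seconds from clip start when the word was spoken
    duration: float  # seconds the spoken word lasted


@dataclass
class Clip:
    """A video clip plus its transcription; the video is a placeholder reference."""
    video_uri: str
    duration: float
    tokens: List[Token] = field(default_factory=list)


@dataclass
class MediaMessage:
    """An ordered sequence of clips played back to back."""
    clips: List[Clip] = field(default_factory=list)

    def clip_start_times(self) -> List[float]:
        """Return the time at which each clip begins within the whole message."""
        starts, elapsed = [], 0.0
        for clip in self.clips:
            starts.append(elapsed)
            elapsed += clip.duration
        return starts

    def total_duration(self) -> float:
        return sum(clip.duration for clip in self.clips)


message = MediaMessage(clips=[
    Clip("clip1.mov", 2.5, [Token("Hello", 0.2, 0.4), Token("there", 0.7, 0.3)]),
    Clip("clip2.mov", 4.0, [Token("Welcome", 0.1, 0.5)]),
])
print(message.clip_start_times())  # [0.0, 2.5]
print(message.total_duration())    # 6.5
```

The sketch keeps token timing clip-relative. A player can then add a clip's start time to map any token onto the message-wide timeline.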


20 Claims

1. A method comprising:
- receiving a user input initiating playback of a media message, the media message including a sequence of clips that include video data and transcription data;
  
  in response to the user input, obtaining a first video data and a first transcription data associated with a first clip in the sequence of clips, wherein the first transcription data includes a plurality of first tokens, each first token having respective timing data that indicates when, during a recording of the first video data, a spoken word corresponding to the first token was obtained; and
  
  presenting, on a display, the first video data and the first transcription data according to timing data for each respective first token in the first transcription data such that presentation of the first transcription data imitates a cadence of corresponding spoken words when they were captured during recording of the first video data, wherein the method is performed by a computing device comprising one or more processors.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein the first video data is obtained from a video stream captured by a camera, the video stream including the video data captured by the camera to generate the media message.
  - 3. The method of claim 1, wherein the transcription data is derived from an audio stream that includes speech captured by a microphone.
  - 4. The method of claim 3, further comprising:
    - persistently storing the audio stream in association with the media message in response to voiceovers being enabled; and
      
      deleting the audio stream after the transcription data is derived therefrom in response to voiceovers being disabled.
  - 5. The method of claim 1, wherein the first transcription data is presented to the display according to a titling style designated for the first clip.
  - 6. The method of claim 1, wherein the timing data for each respective token includes a time offset and a duration, and further comprising:
    - presenting a particular token in the first clip according to the time offset and the duration associated with the particular token.
  - 7. The method of claim 6, further comprising:
    - presenting first text of the first transcription data corresponding to the particular token for a period of time corresponding to the duration associated with the particular token.
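Claims 1, 6 and 7 present each token according to its time offset and duration. Each word therefore appears only while it was being spoken, which reproduces the speaker's cadence. The following minimal sketch makes several assumptions: offsets are seconds from clip start, and a token is visible over the half-open window [offset, offset + duration). The names `visible_text` and `schedule` are hypothetical and do not describe the patented implementation.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    offset: float    # seconds from clip start (assumed unit)
    duration: float  # seconds the word stays on screen


def visible_text(tokens: List[Token], t: float) -> str:
    """Return the words whose display window [offset, offset + duration) contains t."""
    return " ".join(tok.text for tok in tokens if tok.offset <= t < tok.offset + tok.duration)


def schedule(tokens: List[Token]) -> List[Tuple[float, str, str]]:
    """Expand tokens into time-ordered (time, action, text) events for a renderer."""
    events = []
    for tok in tokens:
        events.append((tok.offset, "show", tok.text))
        events.append((tok.offset + tok.duration, "hide", tok.text))
    # Stable sort: at equal times, events keep insertion order.
    return sorted(events, key=lambda e: e[0])


tokens = [Token("Hello", 0.0, 0.5), Token("world", 0.5, 0.75)]
print(visible_text(tokens, 0.25))        # Hello
print(visible_text(tokens, 0.6))         # world
print(repr(visible_text(tokens, 1.5)))   # ''
print(schedule(tokens)[:2])              # [(0.0, 'show', 'Hello'), (0.5, 'hide', 'Hello')]
```

The half-open window keeps adjacent words from overlapping at their shared boundary. A closed window would briefly show both "Hello" and "world" at t = 0.5.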

8. A computing device comprising:
- one or more processors; and
  
  a non-transitory computer-readable medium including instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
  
  receiving a user input initiating playback of a media message, the media message including a sequence of clips that include video data and transcription data;
  
  in response to the user input, obtaining a first video data and a first transcription data associated with a first clip in the sequence of clips, wherein the first transcription data includes a plurality of first tokens, each first token having respective timing data that indicates when, during a recording of the first video data, a spoken word corresponding to the first token was obtained; and
  
  presenting, on a display, the first video data and the first transcription data according to timing data for each respective first token in the first transcription data such that presentation of the first transcription data imitates a cadence of corresponding spoken words when they were captured during recording of the first video data.
- Dependent Claims (9, 10, 11, 12, 13, 14)
  - 9. The computing device of claim 8, wherein the first video data is obtained from a video stream captured by a camera, the video stream including the video data captured by the camera to generate the media message.
  - 10. The computing device of claim 8, wherein the transcription data is derived from an audio stream that includes speech captured by a microphone.
  - 11. The computing device of claim 10, wherein the operations further comprise:
    - persistently storing the audio stream in association with the media message in response to voiceovers being enabled; and
      
      deleting the audio stream after the transcription data is derived therefrom in response to voiceovers being disabled.
  - 12. The computing device of claim 8, wherein the first transcription data is presented to the display according to a titling style designated for the first clip.
  - 13. The computing device of claim 8, wherein the timing data for each respective token includes a time offset and a duration, and wherein the operations further comprise:
    - presenting a particular token in the first clip according to the time offset and the duration associated with the particular token.
  - 14. The computing device of claim 13, wherein the operations further comprise:
    - presenting first text of the first transcription data corresponding to the particular token for a period of time corresponding to the duration associated with the particular token.
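Claims 4, 11 and 17 make audio retention depend on a voiceover setting. When voiceovers are enabled, the audio stream is stored with the media message. When they are disabled, the audio is deleted once the transcription has been derived from it. The minimal sketch below treats the audio stream as a local file. It also assumes a pluggable transcription callable. `finalize_audio`, `fake_transcribe` and `make_temp_audio` are hypothetical helpers, not an actual API.

```python
import os
import tempfile
from typing import Callable, Optional, Tuple


def finalize_audio(audio_path: str, voiceovers_enabled: bool,
                   transcribe: Callable[[str], str]) -> Tuple[str, Optional[str]]:
    """Derive a transcript, then keep or delete the audio per the voiceover setting.

    Returns (transcript, retained_path); retained_path is None when the audio
    file was deleted.
    """
    transcript = transcribe(audio_path)  # always transcribe before any deletion
    if voiceovers_enabled:
        # Voiceovers on: keep the audio so it can be stored with the media message.
        return transcript, audio_path
    # Voiceovers off: the text has been derived, so the audio file is removed.
    os.remove(audio_path)
    return transcript, None


def fake_transcribe(path: str) -> str:
    """Stand-in for a real speech-to-text engine (hypothetical)."""
    return "hello world"


def make_temp_audio() -> str:
    """Write a small placeholder file standing in for a captured audio stream."""
    with tempfile.NamedTemporaryFile(suffix=".m4a", delete=False) as f:
        f.write(b"\x00" * 16)
        return f.name


off_path = make_temp_audio()
off_text, off_kept = finalize_audio(off_path, False, fake_transcribe)
print(off_text, off_kept, os.path.exists(off_path))  # hello world None False

on_path = make_temp_audio()
on_text, on_kept = finalize_audio(on_path, True, fake_transcribe)
print(on_kept == on_path, os.path.exists(on_path))   # True True
os.remove(on_path)  # clean up the example file
```

Transcription runs unconditionally and happens before any deletion. Because of that ordering, the disabled case never discards audio that has not yet been turned into text.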

15. A non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
- receiving a user input initiating playback of a media message, the media message including a sequence of clips that include video data and transcription data;
  
  in response to the user input, obtaining a first video data and a first transcription data associated with a first clip in the sequence of clips, wherein the first transcription data includes a plurality of first tokens, each first token having respective timing data that indicates when, during a recording of the first video data, a spoken word corresponding to the first token was obtained; and
  
  presenting, on a display, the first video data and the first transcription data according to timing data for each respective first token in the first transcription data such that presentation of the first transcription data imitates a cadence of corresponding spoken words when they were captured during recording of the first video data.
- Dependent Claims (16, 17, 18, 19, 20)
  - 16. The non-transitory computer-readable medium of claim 15, wherein the first video data is obtained from a video stream captured by a camera, the video stream including the video data captured by the camera to generate the media message, and wherein the transcription data is derived from an audio stream that includes speech captured by a microphone.
  - 17. The non-transitory computer-readable medium of claim 16, wherein the operations further comprise:
    - persistently storing the audio stream in association with the media message in response to voiceovers being enabled; and
      
      deleting the audio stream after the transcription data is derived therefrom in response to voiceovers being disabled.
  - 18. The non-transitory computer-readable medium of claim 15, wherein the first transcription data is presented to the display according to a titling style designated for the first clip.
  - 19. The non-transitory computer-readable medium of claim 15, wherein the timing data for each respective token includes a time offset and a duration, and wherein the operations further comprise:
    - presenting a particular token in the first clip according to the time offset and the duration associated with the particular token.
  - 20. The non-transitory computer-readable medium of claim 19, wherein the operations further comprise:
    - presenting first text of the first transcription data corresponding to the particular token for a period of time corresponding to the duration associated with the particular token.


Current Assignee
Apple Inc.
Original Assignee
Apple Inc.
Inventors
Black, David; Harding, Andrew L.; Weil, Joseph-Alexander P.; Brasure, James; Berkeley, Joash S.; Ernst, Katherine K.; Salvador, Richard; Sheeler, Stephen; Cummings, William D.; Wang, Xiaohuan Corina; Clark, Robert L.; O'Neil, Kevin M.

Granted Patent

US 11,178,356 B2
CPC Class Codes

H04N 21/42203   sound input device, e.g. mi...

H04N 21/4223   Cameras H04N23/00 takes pre...

H04N 21/43072   of multiple content streams...

H04N 21/4334   Recording operations record...

H04N 21/4394   involving operations for an...

H04N 21/4398   involving reformatting oper...

H04N 21/440236   by media transcoding, e.g. ...

H04N 21/472   End-user interface for requ...

H04N 21/4858   for modifying screen layout...

H04N 5/76   Television signal recording

H04N 5/9202   the additional signal being...

H04N 5/9207   for teletext

H04N 9/8211   the additional signal being...

H04N 9/8233   the additional signal being...
