MEDIA MESSAGE CREATION WITH AUTOMATIC TITLING
Abstract
In some implementations, a user device can be configured to create media messages with automatic titling. For example, a user can create a media messaging project that includes multiple video clips. The video clips can be generated based on video data and/or audio data captured by the user device and/or based on pre-recorded video data and/or audio data obtained from various storage locations. When the user device captures the audio data for a clip, the user device can obtain a speech-to-text transcription of the audio data in near real time and present the transcription data (e.g., text) overlaid on the video data while the video data is being captured or presented by the user device.
20 Claims
1. A method comprising:
receiving a user input initiating playback of a media message, the media message including a sequence of clips that include video data and transcription data;
in response to the user input, obtaining a first video data and a first transcription data associated with a first clip in the sequence of clips, wherein the first transcription data includes a plurality of first tokens, each first token having respective timing data that indicates when, during a recording of the first video data, a spoken word corresponding to the first token was obtained; and
presenting, on a display, the first video data and the first transcription data according to timing data for each respective first token in the first transcription data such that presentation of the first transcription data imitates a cadence of corresponding spoken words when they were captured during recording of the first video data,
wherein the method is performed by a computing device comprising one or more processors.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computing device comprising:
one or more processors; and
a non-transitory computer-readable medium including instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
receiving a user input initiating playback of a media message, the media message including a sequence of clips that include video data and transcription data;
in response to the user input, obtaining a first video data and a first transcription data associated with a first clip in the sequence of clips, wherein the first transcription data includes a plurality of first tokens, each first token having respective timing data that indicates when, during a recording of the first video data, a spoken word corresponding to the first token was obtained; and
presenting, on a display, the first video data and the first transcription data according to timing data for each respective first token in the first transcription data such that presentation of the first transcription data imitates a cadence of corresponding spoken words when they were captured during recording of the first video data.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
receiving a user input initiating playback of a media message, the media message including a sequence of clips that include video data and transcription data;
in response to the user input, obtaining a first video data and a first transcription data associated with a first clip in the sequence of clips, wherein the first transcription data includes a plurality of first tokens, each first token having respective timing data that indicates when, during a recording of the first video data, a spoken word corresponding to the first token was obtained; and
presenting, on a display, the first video data and the first transcription data according to timing data for each respective first token in the first transcription data such that presentation of the first transcription data imitates a cadence of corresponding spoken words when they were captured during recording of the first video data.
Dependent claims: 16, 17, 18, 19, 20
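The playback step recited in the independent claims, in which each transcription token appears according to its own timing data, might be sketched as follows. This is only an illustration under stated assumptions: the claims do not prescribe any data layout, so the names `Token`, `Clip`, `start_s`, `caption_schedule`, and `caption_at` are hypothetical, and timing is assumed to be stored as seconds from the start of the clip's recording.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    """One transcribed word plus its timing data (hypothetical layout)."""
    text: str
    start_s: float  # assumed: seconds into the recording when the word was spoken


@dataclass(frozen=True)
class Clip:
    """One clip of a media message: a video reference and its transcription tokens."""
    video_uri: str
    tokens: List[Token]


def caption_schedule(clip: Clip) -> List[Tuple[float, str]]:
    """Return (time, caption so far) pairs, one per token, ordered by time.

    A player could fire each entry at the given playback offset so that
    words appear with the cadence at which they were originally spoken.
    """
    shown: List[str] = []
    schedule: List[Tuple[float, str]] = []
    for token in sorted(clip.tokens, key=lambda t: t.start_s):
        shown.append(token.text)
        schedule.append((token.start_s, " ".join(shown)))
    return schedule


def caption_at(clip: Clip, playback_s: float) -> str:
    """Return the caption text that should be visible at a playback position."""
    ordered = sorted(clip.tokens, key=lambda t: t.start_s)
    return " ".join(t.text for t in ordered if t.start_s <= playback_s)


if __name__ == "__main__":
    clip = Clip(
        video_uri="clip-1.mov",  # placeholder name, not from the source
        tokens=[Token("hello", 0.2), Token("world", 0.9), Token("again", 2.5)],
    )
    for when, caption in caption_schedule(clip):
        print(f"{when:.1f}s: {caption}")
```

Recomputing the visible caption from the playback position, as `caption_at` does, keeps the text in step with the video if the user pauses or seeks; a timer-driven approach based on `caption_schedule` would be an equally plausible reading of the claim language.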
Specification