Media message creation with automatic titling
Abstract
In some implementations, a user device can be configured to create media messages with automatic titling. For example, a user can create a media messaging project that includes multiple video clips. The video clips can be generated based on video data and/or audio data captured by the user device and/or based on pre-recorded video data and/or audio data obtained from various storage locations. When the user device captures the audio data for a clip, the user device can obtain a speech-to-text transcription of the audio data in near real time and present the transcription data (e.g., text) overlaid on the video data while the video data is being captured or presented by the user device.
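The near-real-time overlay described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the function `live_captions`, the `transcribe` callable, and the stand-in recognizer `fake_transcribe` are all hypothetical names introduced here. The only point shown is that caption text for an earlier portion of audio becomes available before later portions are consumed.

```python
from typing import Callable, Iterable, Iterator, List


def live_captions(
    audio_chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],
) -> Iterator[str]:
    """Yield the caption text to overlay after each audio chunk arrives.

    `transcribe` is a hypothetical speech-to-text callable; a real system
    would call an actual recognizer here.
    """
    caption_words: List[str] = []
    for chunk in audio_chunks:
        text = transcribe(chunk)
        if text:
            caption_words.extend(text.split())
        # Because this is a generator, the caption for this chunk is handed
        # to the caller before the next chunk is read from the stream.
        yield " ".join(caption_words)


def fake_transcribe(chunk: bytes) -> str:
    """Stand-in recognizer (assumption): treats each chunk as UTF-8 text."""
    return chunk.decode("utf-8")


# Three "audio" chunks; the middle one contains no speech.
updates = list(live_captions([b"hello", b"", b"from the beach"], fake_transcribe))
```

Each element of `updates` is the text a user interface could draw over the video preview at that moment.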
20 Claims
1. A method comprising:
presenting, by a computing device, a graphical user interface for generating clips associated with a media message on a display of the computing device;
receiving, by the computing device, a first user input selecting a first graphical element for initiating a recording of a first clip, the first graphical element presented on the graphical user interface;
in response to the first user input, receiving, by the computing device, an audio stream from a microphone associated with the computing device, the audio stream including speech captured by the microphone;
in response to the first user input, receiving, by the computing device, a video stream from a camera associated with the computing device, the video stream including video data captured by the camera;
in response to receiving a first portion of the audio stream and while receiving a second portion of the audio stream, obtaining, by the computing device, a first transcription of the first portion of the audio stream and presenting the first transcription on the display of the computing device in near real time as the first portion of the audio stream is received; and
generating, by the computing device, the first clip comprising the video data stored in association with transcription data of the audio stream, wherein the transcription data comprises the first transcription and a plurality of tokens, each token having respective timing data that indicates when, during a recording of the video data, a spoken word corresponding to the token was captured by the microphone, and wherein presentation of the tokens in the first clip imitates a cadence of corresponding spoken words when they were captured during the recording of the video data.
Dependent claims: 2, 3, 4, 5
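The final element of claim 1 stores video data together with tokens that carry timing data. One possible in-memory shape for such a clip is sketched below. The class names `Token`, `Clip`, and `ClipRecorder`, the field names, and the `(text, start, duration)` tuple format are assumptions made for illustration; the claim does not prescribe any particular layout.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Token:
    text: str          # recognized word
    start_s: float     # seconds from the start of recording when the word was spoken
    duration_s: float  # how long the word took to say


@dataclass
class Clip:
    video_path: str                      # hypothetical reference to the stored video data
    tokens: List[Token] = field(default_factory=list)

    @property
    def transcription(self) -> str:
        """Plain-text transcription derived from the tokens."""
        return " ".join(t.text for t in self.tokens)


class ClipRecorder:
    """Accumulates timed tokens while a recording is in progress."""

    def __init__(self, video_path: str) -> None:
        self.clip = Clip(video_path)

    def on_words(self, words: Iterable[Tuple[str, float, float]]) -> str:
        """Add (text, start_s, duration_s) results from a hypothetical recognizer.

        Returns the caption text recognized so far, for on-screen display.
        """
        for text, start_s, duration_s in words:
            self.clip.tokens.append(Token(text, start_s, duration_s))
        return self.clip.transcription

    def finish(self) -> Clip:
        return self.clip


recorder = ClipRecorder("clip1.mov")                   # hypothetical file name
first = recorder.on_words([("hello", 0.4, 0.3)])       # first portion of the audio
second = recorder.on_words([("world", 1.1, 0.35)])     # second portion
clip = recorder.finish()
```

Keeping the start time on every token, rather than storing only the finished text, is what would later allow playback to reveal words at the pace they were spoken.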
6. A method comprising:
obtaining, by a computing device, a media message, the media message including a sequence of clips, each clip including video data and transcription data, wherein the transcription data for each clip includes a plurality of tokens, each token having respective timing data that indicates when, during a recording of the video data, a spoken word corresponding to the token was captured by a microphone;
receiving, by the computing device, a user input initiating playback of the media message;
in response to the user input, selecting, by the computing device, a first clip in the sequence of clips;
obtaining, by the computing device, a first video data associated with the first clip;
obtaining, by the computing device, a first transcription data associated with the first clip; and
while presenting the first video data on a display of the computing device, presenting, by the computing device, the tokens in the first transcription data according to the timing data for each respective token such that the presentation of the tokens in the first transcription data imitates the cadence of the corresponding spoken words when they were captured during the recording of the video data.
Dependent claims: 7, 8, 9, 10
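The playback step of claim 6 presents each token according to its timing data. A minimal sketch of that idea follows, assuming tokens are `(text, start_s)` pairs sorted by start time and that the player can report its current playhead position in seconds. The function names `visible_tokens` and `caption_at` are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple

# (text, start time in seconds from the start of the clip); assumed sorted by time
TimedToken = Tuple[str, float]


def visible_tokens(tokens: Sequence[TimedToken], playhead_s: float) -> List[str]:
    """Return the words whose start time has been reached at `playhead_s`."""
    starts = [start for _, start in tokens]
    # bisect_right counts how many start times are <= the playhead.
    count = bisect.bisect_right(starts, playhead_s)
    return [text for text, _ in tokens[:count]]


def caption_at(tokens: Sequence[TimedToken], playhead_s: float) -> str:
    """Caption text to draw over the video frame at `playhead_s`."""
    return " ".join(visible_tokens(tokens, playhead_s))


tokens = [("so", 0.2), ("this", 0.5), ("works", 1.4)]
before_speech = caption_at(tokens, 0.0)
mid_sentence = caption_at(tokens, 0.5)
after_speech = caption_at(tokens, 2.0)
```

Calling `caption_at` once per rendered frame makes words appear at the moments they were originally spoken, so pauses in the speech show up as pauses in the text. Playing a whole media message would repeat this for each clip in the sequence.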
11. A computing device comprising:
one or more processors; and
a non-transitory computer-readable medium including one or more sequences of instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
presenting, by the computing device, a graphical user interface for generating clips associated with a media message on a display of the computing device;
receiving, by the computing device, a first user input selecting a first graphical element for initiating a recording of a first clip, the first graphical element presented on the graphical user interface;
in response to the first user input, receiving, by the computing device, an audio stream from a microphone associated with the computing device, the audio stream including speech captured by the microphone;
in response to the first user input, receiving, by the computing device, a video stream from a camera associated with the computing device, the video stream including video data captured by the camera;
in response to receiving a first portion of the audio stream and while receiving a second portion of the audio stream, obtaining, by the computing device, a first transcription of the first portion of the audio stream and presenting the first transcription on the display of the computing device in near real time as the first portion of the audio stream is received; and
generating, by the computing device, the first clip comprising the video data stored in association with transcription data of the audio stream, wherein the transcription data comprises the first transcription and a plurality of tokens, each token having respective timing data that indicates when, during a recording of the video data, a spoken word corresponding to the token was captured by the microphone, and wherein presentation of the tokens in the first clip imitates a cadence of corresponding spoken words when they were captured during the recording of the video data.
Dependent claims: 12, 13, 14, 15
16. A computing device comprising:
one or more processors; and
a non-transitory computer-readable medium including one or more sequences of instructions that, when executed by the one or more processors, cause the processors to perform operations comprising:
obtaining, by the computing device, a media message, the media message including a sequence of clips, each clip including video data and transcription data, wherein the transcription data for each clip includes a plurality of tokens, each token having respective timing data that indicates when, during a recording of the video data, a spoken word corresponding to the token was captured by a microphone;
receiving, by the computing device, a user input initiating playback of the media message;
in response to the user input, selecting, by the computing device, a first clip in the sequence of clips;
obtaining, by the computing device, a first video data associated with the first clip;
obtaining, by the computing device, a first transcription data associated with the first clip; and
while presenting the first video data on a display of the computing device, presenting, by the computing device, the tokens in the first transcription data according to the timing data for each respective token such that the presentation of the tokens in the first transcription data imitates the cadence of the corresponding spoken words when they were captured during the recording of the video data.
Dependent claims: 17, 18, 19, 20
Specification