Media message creation with automatic titling

US 10,560,656 B2
Filed: 03/15/2018
Issued: 02/11/2020
Est. Priority Date: 03/19/2017
Status: Active Grant

Abstract

In some implementations, a user device can be configured to create media messages with automatic titling. For example, a user can create a media messaging project that includes multiple video clips. The video clips can be generated based on video data and/or audio data captured by the user device and/or based on pre-recorded video data and/or audio data obtained from various storage locations. When the user device captures the audio data for a clip, the user device can obtain a speech-to-text transcription of the audio data in near real time and present the transcription data (e.g., text) overlaid on the video data while the video data is being captured or presented by the user device.
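
The near-real-time behavior described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: `AudioPortion`, `fake_transcribe`, and `live_captions` are hypothetical names, and the speech recognizer is replaced by a stub that simply returns text attached to each audio chunk. The point is only the control flow, in which each portion is transcribed as it arrives and the overlay caption is refreshed while later portions are still pending.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class AudioPortion:
    """Hypothetical stand-in for one chunk of captured audio."""
    samples: bytes
    spoken_text: str  # what a real recognizer would have to infer from samples


def fake_transcribe(portion: AudioPortion) -> str:
    """Placeholder for a speech-to-text call (assumption, not a real API)."""
    return portion.spoken_text


def live_captions(
    portions: Iterable[AudioPortion],
    transcribe: Callable[[AudioPortion], str] = fake_transcribe,
) -> Iterator[str]:
    """Yield the caption to overlay on the video after each portion arrives."""
    words: List[str] = []
    for portion in portions:        # later portions may not have arrived yet
        text = transcribe(portion)  # transcription of the portion just received
        if text:
            words.extend(text.split())
        yield " ".join(words)       # running caption shown over the preview


stream = [
    AudioPortion(b"\x00", "hello"),
    AudioPortion(b"\x01", "from the"),
    AudioPortion(b"\x02", ""),       # silence: caption is unchanged
    AudioPortion(b"\x03", "beach"),
]
captions = list(live_captions(stream))
for caption in captions:
    print(caption)
```

Because `live_captions` is a generator, a caller can redraw the overlay after every yielded caption instead of waiting for the recording to end.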

20 Claims

1. A method comprising:
- presenting, by a computing device, a graphical user interface for generating clips associated with a media message on a display of the computing device;
  
  receiving, by the computing device, a first user input selecting a first graphical element for initiating a recording of a first clip, the first graphical element presented on the graphical user interface;
  
  in response to the first user input, receiving, by the computing device, an audio stream from a microphone associated with the computing device, the audio stream including speech captured by the microphone;
  
  in response to the first user input, receiving, by the computing device, a video stream from a camera associated with the computing device, the video stream including video data captured by the camera;
  
  in response to receiving a first portion of the audio stream and while receiving a second portion of the audio stream, obtaining, by the computing device, a first transcription of the first portion of the audio stream and presenting the first transcription on the display of the computing device in near real time as the first portion of the audio stream is received; and
  
  generating, by the computing device, the first clip comprising the video data stored in association with transcription data of the audio stream, wherein the transcription data comprises the first transcription and a plurality of tokens, each token having respective timing data that indicates when, during a recording of the video data, a spoken word corresponding to the token was captured by the microphone, and wherein presentation of the tokens in the first clip imitates a cadence of corresponding spoken words when they were captured during the recording of the video data.
- Dependent Claims (2, 3, 4, 5)
  - 2. The method of claim 1, further comprising:
    - presenting, by the computing device, the video stream on the display of the computing device as the video stream is received by the computing device; and
      
      presenting, by the computing device, in response to obtaining the first transcription, the first transcription over the video stream on the display of the computing device in near real time as the audio stream is received.
  - 3. The method of claim 2, further comprising:
    - receiving, by the computing device, a second user input selecting a titling style for the first clip, the titling style defining how to present the transcription data associated with the first clip; and
      
      presenting, by the computing device, the first transcription according to the selected titling style for the first clip.
  - 4. The method of claim 1, further comprising:
    - determining, by the computing device, whether voiceovers are enabled for the first clip;
      
      persistently storing, by the computing device, the audio stream in association with the first clip when voiceovers are enabled for the first clip; and
      
      deleting, by the computing device, the audio stream after transcribing the audio stream associated with the first clip when voiceovers are disabled for the first clip.
  - 5. The method of claim 1, further comprising:
    - storing, by the computing device, the media message comprising the first clip.
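
Claims 1 and 4 together suggest a data shape for a recorded clip: video data stored with a transcription, per-word tokens carrying timing data, and an audio stream that is retained only when voiceovers are enabled. The following is a rough sketch under stated assumptions. `Token`, `Clip`, and `build_clip` are hypothetical names, the offset and duration fields borrow the timing layout recited in claim 7, and "deleting" the audio is modeled simply as dropping the reference rather than removing a file.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    text: str          # transcribed word
    offset_s: float    # seconds from the start of the video when it was spoken
    duration_s: float  # how long the word took to say


@dataclass
class Clip:
    video_path: str
    transcription: str
    tokens: List[Token] = field(default_factory=list)
    audio_path: Optional[str] = None  # kept only when voiceovers are enabled


def build_clip(
    video_path: str,
    audio_path: str,
    tokens: List[Token],
    voiceover_enabled: bool,
) -> Clip:
    """Assemble a clip after recording and transcription have finished."""
    transcription = " ".join(t.text for t in tokens)
    # Voiceover policy: keep the audio if enabled, otherwise discard it once
    # the transcription exists. A real system would also delete the stored
    # audio; this sketch only drops the reference.
    kept_audio = audio_path if voiceover_enabled else None
    return Clip(video_path, transcription, list(tokens), kept_audio)


tokens = [
    Token("hello", 0.0, 0.5),
    Token("from", 0.5, 0.25),
    Token("the", 0.75, 0.25),
    Token("beach", 1.5, 0.5),
]
silent_clip = build_clip("clip1.mov", "clip1.m4a", tokens, voiceover_enabled=False)
voiced_clip = build_clip("clip1.mov", "clip1.m4a", tokens, voiceover_enabled=True)
print(silent_clip.transcription, silent_clip.audio_path, voiced_clip.audio_path)
```

Keeping the tokens alongside the flat transcription string is what later allows playback to reproduce the speaker's cadence rather than showing the whole caption at once.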

6. A method comprising:
- obtaining, by a computing device, a media message, the media message including a sequence of clips, each clip including video data and transcription data, wherein the transcription data for each clip includes a plurality of tokens, each token having respective timing data that indicates when, during a recording of the video data, a spoken word corresponding to the token was captured by a microphone;
  
  receiving, by the computing device, a user input initiating playback of the media message;
  
  in response to the user input, selecting, by the computing device, a first clip in the sequence of clips;
  
  obtaining, by the computing device, a first video data associated with the first clip;
  
  obtaining, by the computing device, a first transcription data associated with the first clip; and
  
  while presenting the first video data on a display of the computing device, presenting, by the computing device, the tokens in the first transcription data according to the timing data for each respective token such that the presentation of the tokens in the first transcription data imitates the cadence of the corresponding spoken words when they were captured during the recording of the video data.
- Dependent Claims (7, 8, 9, 10)
  - 7. The method of claim 6, wherein the timing data for each respective token includes a time offset and a duration, and further comprising:
    - presenting, by the computing device, a particular token in the first clip according to the time offset and the duration associated with the particular token.
  - 8. The method of claim 7, wherein the time offset corresponds to an amount of time from a beginning of the video data, and further comprising:
    - while presenting the video data, determining, by the computing device, that a first amount of time has elapsed since the beginning of the video data;
      
      comparing the elapsed time to the time offset for the particular token; and
      
      presenting, by the computing device, text corresponding to the particular token on a display of the computing device when the elapsed time corresponds to the time offset for the particular token.
  - 9. The method of claim 8, further comprising:
    - presenting the text corresponding to the particular token for a period of time corresponding to the duration associated with the token.
  - 10. The method of claim 6, further comprising:
    - determining a titling style associated with the first clip;
      
      determining text display attributes defined by the titling style, including a font, size, color, location, animation, or a combination thereof; and
      
      presenting text associated with the particular token according to the text display attributes defined by the titling style associated with the first clip.
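
The timing logic in claims 7 through 9 (compare elapsed playback time to each token's offset, then show the token's text for its duration) can be sketched as follows. This is one possible reading, not the patent's implementation: `Token`, `tokens_visible_at`, and `show_hide_events` are hypothetical names, and the sketch hides each word once its duration ends, whereas a real titling style might keep earlier words on screen.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    text: str
    offset_s: float    # time from the beginning of the video data
    duration_s: float  # how long the text stays on screen


def tokens_visible_at(tokens: List[Token], elapsed_s: float) -> List[str]:
    """Return the text of every token whose display window covers elapsed_s."""
    return [
        t.text
        for t in tokens
        if t.offset_s <= elapsed_s < t.offset_s + t.duration_s
    ]


def show_hide_events(tokens: List[Token]) -> List[Tuple[float, str, str]]:
    """Turn token timing into a time-ordered list of (time, action, text)."""
    events = []
    for t in tokens:
        events.append((t.offset_s, "show", t.text))
        events.append((t.offset_s + t.duration_s, "hide", t.text))
    # sorted() is stable, so events at the same time keep insertion order
    return sorted(events, key=lambda e: e[0])


tokens = [
    Token("hello", 0.0, 0.5),
    Token("from", 0.5, 0.25),
    Token("beach", 1.5, 0.5),
]
print(tokens_visible_at(tokens, 0.25))
print(tokens_visible_at(tokens, 1.0))
events = show_hide_events(tokens)
for event in events:
    print(event)
```

The gap between "from" ending at 0.75 s and "beach" starting at 1.5 s is preserved on playback, which is how the displayed text imitates the pause in the original speech.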

11. A computing device comprising:
- one or more processors; and
  
  a non-transitory computer-readable medium including one or more sequences of instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
  
  presenting, by the computing device, a graphical user interface for generating clips associated with a media message on a display of the computing device;
  
  receiving, by the computing device, a first user input selecting a first graphical element for initiating a recording of a first clip, the first graphical element presented on the graphical user interface;
  
  in response to the first user input, receiving, by the computing device, an audio stream from a microphone associated with the computing device, the audio stream including speech captured by the microphone;
  
  in response to the first user input, receiving, by the computing device, a video stream from a camera associated with the computing device, the video stream including video data captured by the camera;
  
  in response to receiving a first portion of the audio stream and while receiving a second portion of the audio stream, obtaining, by the computing device, a first transcription of the first portion of the audio stream and presenting the first transcription on the display of the computing device in near real time as the first portion of the audio stream is received; and
  
  generating, by the computing device, the first clip comprising the video data stored in association with transcription data of the audio stream, wherein the transcription data comprises the first transcription and a plurality of tokens, each token having respective timing data that indicates when, during a recording of the video data, a spoken word corresponding to the token was captured by the microphone, and wherein presentation of the tokens in the first clip imitates a cadence of corresponding spoken words when they were captured during the recording of the video data.
- Dependent Claims (12, 13, 14, 15)
  - 12. The computing device of claim 11, wherein the instructions cause the one or more processors to perform operations comprising:
    - presenting, by the computing device, the video stream on the display of the computing device as the video stream is received by the computing device; and
      
      presenting, by the computing device, in response to obtaining the first transcription, the first transcription over the video stream on the display of the computing device in near real time as the audio stream is received.
  - 13. The computing device of claim 12, wherein the instructions cause the one or more processors to perform operations comprising:
    - receiving, by the computing device, a second user input selecting a titling style for the first clip, the titling style defining how to present the transcription data associated with the first clip; and
      
      presenting, by the computing device, the first transcription according to the selected titling style for the first clip.
  - 14. The computing device of claim 11, wherein the instructions cause the one or more processors to perform operations comprising:
    - determining, by the computing device, whether voiceovers are enabled for the first clip;
      
      persistently storing, by the computing device, the audio stream in association with the first clip when voiceovers are enabled for the first clip; and
      
      deleting, by the computing device, the audio stream after transcribing the audio stream associated with the first clip when voiceovers are disabled for the first clip.
  - 15. The computing device of claim 11, wherein the instructions cause the one or more processors to perform operations comprising:
    - storing, by the computing device, the media message comprising the first clip.

16. A computing device comprising:
- one or more processors; and
  
  a non-transitory computer-readable medium including one or more sequences of instructions that, when executed by the one or more processors, cause the processors to perform operations comprising:
  
  obtaining, by the computing device, a media message, the media message including a sequence of clips, each clip including video data and transcription data, wherein the transcription data for each clip includes a plurality of tokens, each token having respective timing data that indicates when, during a recording of the video data, a spoken word corresponding to the token was captured by a microphone;
  
  receiving, by the computing device, a user input initiating playback of the media message;
  
  in response to the user input, selecting, by the computing device, a first clip in the sequence of clips;
  
  obtaining, by the computing device, a first video data associated with the first clip;
  
  obtaining, by the computing device, a first transcription data associated with the first clip; and
  
  while presenting the first video data on a display of the computing device, presenting, by the computing device, the tokens in the first transcription data according to the timing data for each respective token such that the presentation of the tokens in the first transcription data imitates the cadence of the corresponding spoken words when they were captured during the recording of the video data.
- Dependent Claims (17, 18, 19, 20)
  - 17. The computing device of claim 16, wherein the timing data for each respective token includes a time offset and a duration, and wherein the instructions cause the one or more processors to perform operations comprising:
    - presenting, by the computing device, a particular token in the first clip according to the time offset and the duration associated with the particular token.
  - 18. The computing device of claim 17, wherein the time offset corresponds to an amount of time from a beginning of the video data, and wherein the instructions cause the one or more processors to perform operations comprising:
    - while presenting the video data, determining, by the computing device, that a first amount of time has elapsed since the beginning of the video data;
      
      comparing the elapsed time to the time offset for the particular token; and
      
      presenting, by the computing device, text corresponding to the particular token on a display of the computing device when the elapsed time corresponds to the time offset for the particular token.
  - 19. The computing device of claim 18, wherein the instructions cause the one or more processors to perform operations comprising:
    - presenting the text corresponding to the particular token for a period of time corresponding to the duration associated with the token.
  - 20. The computing device of claim 16, wherein the instructions cause the one or more processors to perform operations comprising:
    - determining a titling style associated with the first clip;
      
      determining text display attributes defined by the titling style, including a font, size, color, location, animation, or a combination thereof; and
      
      presenting text associated with the particular token according to the text display attributes defined by the titling style associated with the first clip.
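
Claims 10 and 20 describe resolving a clip's titling style into text display attributes (font, size, color, location, animation) and presenting token text with them. A minimal sketch of that lookup is below. Everything in it is an assumption made for illustration: the `TitlingStyle` type, the style names, the attribute values, and the `render_token` helper are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TitlingStyle:
    """Display attributes a titling style may define (hypothetical layout)."""
    font: str
    size_pt: int
    color: str
    location: str
    animation: str


# Illustrative style table; names and values are invented for this sketch.
STYLES: Dict[str, TitlingStyle] = {
    "bold_lower": TitlingStyle("Helvetica-Bold", 36, "#FFFFFF", "bottom", "fade"),
    "caption_top": TitlingStyle("Georgia", 24, "#FFFF00", "top", "none"),
}


def render_token(text: str, style_name: str) -> Dict[str, object]:
    """Combine a token's text with the attributes of the clip's titling style."""
    style = STYLES[style_name]  # raises KeyError for an unknown style
    return {
        "text": text,
        "font": style.font,
        "size_pt": style.size_pt,
        "color": style.color,
        "location": style.location,
        "animation": style.animation,
    }


drawn = render_token("beach", "caption_top")
print(drawn)
```

Storing the style per clip rather than per token means changing a clip's titling style restyles every token at once without touching the timing data.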

Current Assignee
Apple Inc.
Original Assignee
Apple Inc.
Inventors
Weil, Joseph-Alexander P.; Harding, Andrew L.; Black, David; Brasure, James; Berkeley, Joash S.; Ernst, Katherine K.; Salvador, Richard; Sheeler, Stephen; Cummings, William D.; Wang, Xiaohuan Corina; Clark, Robert L.; O'Neil, Kevin M.
Primary Examiner(s)
Nguyen, Huy T

Application Number

US15/921,866
Publication Number

US 20180270446A1
Time in Patent Office

698 Days
CPC Class Codes

H04N 21/42203   sound input device, e.g. mi...

H04N 21/4223   Cameras H04N23/00 takes pre...

H04N 21/43072   of multiple content streams...

H04N 21/4334   Recording operations record...

H04N 21/4394   involving operations for an...

H04N 21/4398   involving reformatting oper...

H04N 21/440236   by media transcoding, e.g. ...

H04N 21/472   End-user interface for requ...

H04N 21/4858   for modifying screen layout...

H04N 5/76   Television signal recording

H04N 5/9202   the additional signal being...

H04N 5/9207   for teletext

H04N 9/8211   the additional signal being...

H04N 9/8233   the additional signal being...
