Systems and methods of rendering a textual animation
Abstract
Systems and methods of rendering a textual animation are provided. The methods include receiving an audio sample of an audio signal that is being rendered by a media rendering source. The methods also include receiving one or more descriptors for the audio signal based on at least one of a semantic vector, an audio vector, and an emotion vector. Based on the one or more descriptors, a client device may render the textual transcriptions of vocal elements of the audio signal in an animated manner. The client device may further render the textual transcriptions of the vocal elements of the audio signal to be substantially in synchrony to the audio signal being rendered by the media rendering source. In addition, the client device may further receive an identification of a song corresponding to the audio sample, and may render lyrics of the song in an animated manner.
32 Claims
1. A method of rendering a textual animation, comprising:
receiving an audio sample of an audio signal comprising at least one of audio elements and vocal elements, the audio signal being rendered by a media rendering source;
sending the audio sample to a server;
in response to sending the audio sample to the server, receiving one or more descriptors for the audio signal based on a semantic vector, an audio vector, and an emotion vector, wherein the semantic vector indicates a semantic content of corresponding textual transcriptions of vocal elements of the audio signal as a function of time with respect to a length of the audio signal, wherein the audio vector indicates an audio content of audio elements of the audio signal as a function of time with respect to a length of the audio signal, and wherein the emotion vector indicates an emotional content of audio elements of the audio signal as a function of time with respect to a length of the audio signal;
determining an animation style to be applied to the textual transcriptions per the length of the audio signal based on an ordering of values of the semantic vector, the audio vector, and the emotion vector per the length of the audio signal, wherein a respective combination of the values of the semantic vector, the audio vector, and the emotion vector corresponds to a respective animation style; and
based on the one or more descriptors, a client device rendering the textual transcriptions of vocal elements of the audio signal in a dynamic animation, wherein the dynamic animation changes over time corresponding to each of the semantic vector, the audio vector, and the emotion vector that indicate the animation style to be applied to the textual transcriptions per the length of the audio signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 23
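The "determining an animation style" step of claim 1 can be read as a lookup: each combination of quantized semantic, audio, and emotion values for a time segment selects one animation style, so the style can change over the length of the signal. A minimal Python sketch follows; the style table, value labels, and function names are illustrative assumptions, not drawn from the patent.

```python
# Hypothetical mapping from one segment's (semantic, audio, emotion)
# values to an animation style. The table below is an invented example.
STYLE_TABLE = {
    ("love", "loud", "happy"): "bounce",
    ("love", "soft", "sad"): "fade",
    ("anger", "loud", "angry"): "shake",
}

def animation_style(semantic: str, audio: str, emotion: str) -> str:
    """Return the style for one combination of vector values,
    falling back to a plain rendering for unmapped combinations."""
    return STYLE_TABLE.get((semantic, audio, emotion), "plain")

def styles_over_time(segments):
    """Map each time segment's (semantic, audio, emotion) triple to a
    style, so the dynamic animation changes over the signal's length."""
    return [animation_style(*seg) for seg in segments]
```

Because the lookup runs per segment, the client can switch styles mid-song whenever any of the three vectors changes value.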
17. A method comprising:
receiving an audio sample;
determining an identification of a song corresponding to the audio sample, the song comprising at least one of audio elements and vocal elements;
retrieving one or more descriptors for the song based on a semantic vector, an audio vector, and an emotion vector, wherein the semantic vector indicates a semantic content of corresponding textual transcriptions of the vocal elements as a function of time with respect to a length of the song, wherein the audio vector indicates an audio content of the audio elements as a function of time with respect to a length of the song, and wherein the emotion vector indicates an emotional content of the audio elements as a function of time with respect to a length of the song;
providing an animation style to be applied to the textual transcriptions per the length of the song based on an ordering of values of the semantic vector, the audio vector, and the emotion vector per the length of the song, wherein a respective combination of the values of the semantic vector, the audio vector, and the emotion vector corresponds to a respective animation style; and
sending to a client device the one or more descriptors indicating a dynamic animation to apply to the textual transcriptions per the length of the song, wherein the dynamic animation changes over time corresponding to each of the semantic vector, the audio vector, and the emotion vector that indicate the animation style to be applied to the textual transcriptions per the length of the song.
Dependent claims: 18, 19, 20, 21, 22, 24, 25, 26, 27, 28
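On the server side, claim 17 amounts to packaging per-segment vector values, together with the style they select, into descriptors to send to the client. The sketch below shows one plausible descriptor shape; all field names and the placeholder style rule are assumptions for illustration only.

```python
def choose_style(semantic: str, audio: str, emotion: str) -> str:
    # Placeholder rule: each distinct combination yields one style name.
    return f"{semantic}-{audio}-{emotion}"

def build_descriptors(song_length_s, semantic_vec, audio_vec, emotion_vec):
    """Build one descriptor per time segment over the length of the
    song. Each vector is sampled once per segment, so all three lists
    must have the same length."""
    n = len(semantic_vec)
    seg = song_length_s / n
    return [
        {
            "t_start": i * seg,               # segment start, seconds
            "semantic": semantic_vec[i],
            "audio": audio_vec[i],
            "emotion": emotion_vec[i],
            "style": choose_style(semantic_vec[i], audio_vec[i], emotion_vec[i]),
        }
        for i in range(n)
    ]
```

The resulting list could be serialized (e.g., as JSON) and sent to the client, which then applies each descriptor's style during its time segment.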
29. A non-transitory computer readable storage medium having stored therein instructions executable by a computing device to cause the computing device to perform functions of:
receiving an audio sample of an audio signal comprising at least one of audio elements and vocal elements, the audio signal being rendered by a media rendering source;
sending the audio sample to a server;
in response to sending the audio sample to the server, receiving one or more descriptors for the audio signal based on a semantic vector, an audio vector, and an emotion vector, wherein the semantic vector indicates a semantic content of corresponding textual transcriptions of vocal elements of the audio signal as a function of time with respect to a length of the audio signal, wherein the audio vector indicates an audio content of the audio elements of the audio signal as a function of time with respect to a length of the audio signal, and wherein the emotion vector indicates an emotional content of the audio elements of the audio signal as a function of time with respect to a length of the audio signal;
determining an animation style to be applied to the textual transcriptions per the length of the audio signal based on an ordering of values of the semantic vector, the audio vector, and the emotion vector per the length of the audio signal, wherein a respective combination of the values of the semantic vector, the audio vector, and the emotion vector corresponds to a respective animation style; and
based on the one or more descriptors, rendering the textual transcriptions of vocal elements of the audio signal in a dynamic animation, wherein the dynamic animation changes over time corresponding to each of the semantic vector, the audio vector, and the emotion vector that indicate the animation style to be applied to the textual transcriptions per the length of the audio signal.
30. A method of rendering a textual animation, comprising:
receiving an audio sample of an audio signal comprising at least one of audio elements and vocal elements, the audio signal being rendered by a media rendering source;
determining an identification of a song corresponding to the audio sample and lyrics corresponding to the vocal elements;
receiving a set of descriptors for the song based on a semantic vector, an audio vector, and an emotion vector, wherein the semantic vector indicates a semantic content of the lyrics as a function of time with respect to a length of the song, wherein the audio vector indicates an audio content of audio elements of the song as a function of time with respect to a length of the song, and wherein the emotion vector indicates an emotional content of audio elements of the song as a function of time with respect to a length of the song;
determining an animation style to be applied to the textual transcriptions per the length of the audio signal based on an ordering of values of the semantic vector, the audio vector, and the emotion vector per the length of the audio signal, wherein a respective combination of the values of the semantic vector, the audio vector, and the emotion vector corresponds to a respective animation style; and
receiving a time offset indicating a time position in the audio signal corresponding to a beginning time of the audio sample;
determining a real-time offset using a real-time timestamp, a beginning time of the audio sample, and the time offset, wherein the real-time timestamp indicates a present time;
based on the set of descriptors, a client device rendering the lyrics in a dynamic animation at a time corresponding to the real-time offset and substantially in synchrony to the audio signal being rendered by the media rendering source, wherein the dynamic animation changes over time corresponding to each of the semantic vector, the audio vector, and the emotion vector that indicate the animation style to be applied to the lyrics per the length of the song.
Dependent claims: 31, 32
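The real-time-offset computation in claim 30 reduces to simple arithmetic: the song position at the present moment is the sample's offset within the song plus the wall-clock time elapsed since the sample was captured. A minimal sketch, with argument names assumed for illustration:

```python
def realtime_offset(time_offset_s: float,
                    sample_begin_ts: float,
                    now_ts: float) -> float:
    """Estimate the current playback position within the song.

    time_offset_s:   position in the song where the captured sample began
    sample_begin_ts: wall-clock timestamp when the sample was captured
    now_ts:          present wall-clock timestamp
    """
    return time_offset_s + (now_ts - sample_begin_ts)
```

Rendering the lyrics at this offset keeps the animation substantially in synchrony with the audio still playing from the media rendering source, even though identification took some seconds to complete.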
Specification