Systems and methods of rendering a textual animation
Abstract
Systems and methods of rendering a textual animation are provided. The methods include receiving an audio sample of an audio signal that is being rendered by a media rendering source. The methods also include receiving one or more descriptors for the audio signal based on at least one of a semantic vector, an audio vector, and an emotion vector. Based on the one or more descriptors, a client device may render the textual transcriptions of vocal elements of the audio signal in an animated manner. The client device may further render the textual transcriptions of the vocal elements of the audio signal to be substantially in synchrony to the audio signal being rendered by the media rendering source. In addition, the client device may further receive an identification of a song corresponding to the audio sample, and may render lyrics of the song in an animated manner.
32 Claims
1. A method of rendering a textual animation, comprising:
receiving an audio sample of an audio signal comprising at least one of audio elements and vocal elements, the audio signal being rendered by a media rendering source;
sending the audio sample to a server;
in response to sending the audio sample to the server, receiving one or more descriptors for the audio signal based on a semantic vector, an audio vector, and an emotion vector, wherein the semantic vector indicates a semantic content of corresponding textual transcriptions of vocal elements of the audio signal as a function of time with respect to a length of the audio signal, wherein the audio vector indicates an audio content of audio elements of the audio signal as a function of time with respect to a length of the audio signal, and wherein the emotion vector indicates an emotional content of audio elements of the audio signal as a function of time with respect to a length of the audio signal;
determining an animation style to be applied to the textual transcriptions per the length of the audio signal based on an ordering of values of the semantic vector, the audio vector, and the emotion vector per the length of the audio signal, wherein a respective combination of the values of the semantic vector, the audio vector, and the emotion vector corresponds to a respective animation style; and
based on the one or more descriptors, a client device rendering the textual transcriptions of vocal elements of the audio signal in a dynamic animation, wherein the dynamic animation changes over time corresponding to each of the semantic vector, the audio vector, and the emotion vector that indicate the animation style to be applied to the textual transcriptions per the length of the audio signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 23
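The "determining an animation style" step of claim 1 can be read as a lookup: each combination of quantized semantic, audio, and emotion values for a time segment selects one animation style, so the style can change over the length of the signal. A minimal Python sketch follows; the style table, value labels, and function names are illustrative assumptions, not drawn from the patent.

```python
# Hypothetical mapping from one segment's (semantic, audio, emotion)
# values to an animation style. The table below is an invented example.
STYLE_TABLE = {
    ("love", "loud", "happy"): "bounce",
    ("love", "soft", "sad"): "fade",
    ("anger", "loud", "angry"): "shake",
}

def animation_style(semantic: str, audio: str, emotion: str) -> str:
    """Return the style for one combination of vector values,
    falling back to a plain rendering for unmapped combinations."""
    return STYLE_TABLE.get((semantic, audio, emotion), "plain")

def styles_over_time(segments):
    """Map each time segment's (semantic, audio, emotion) triple to a
    style, so the dynamic animation changes over the signal's length."""
    return [animation_style(*seg) for seg in segments]
```

Because the lookup runs per segment, the client can switch styles mid-song whenever any of the three vectors changes value.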
17. A method comprising:
receiving an audio sample;
determining an identification of a song corresponding to the audio sample, the song comprising at least one of audio elements and vocal elements;
retrieving one or more descriptors for the song based on a semantic vector, an audio vector, and an emotion vector, wherein the semantic vector indicates a semantic content of corresponding textual transcriptions of the vocal elements as a function of time with respect to a length of the song, wherein the audio vector indicates an audio content of the audio elements as a function of time with respect to a length of the song, and wherein the emotion vector indicates an emotional content of the audio elements as a function of time with respect to a length of the song;
providing an animation style to be applied to the textual transcriptions per the length of the song based on an ordering of values of the semantic vector, the audio vector, and the emotion vector per the length of the song, wherein a respective combination of the values of the semantic vector, the audio vector, and the emotion vector corresponds to a respective animation style; and
sending to a client device the one or more descriptors indicating a dynamic animation to apply to the textual transcriptions per the length of the song, wherein the dynamic animation changes over time corresponding to each of the semantic vector, the audio vector, and the emotion vector that indicate the animation style to be applied to the textual transcriptions per the length of the song.
Dependent claims: 18, 19, 20, 21, 22, 24, 25, 26, 27, 28
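On the server side, claim 17 amounts to packaging per-segment vector values, together with the style they select, into descriptors to send to the client. The sketch below shows one plausible descriptor shape; all field names and the placeholder style rule are assumptions for illustration only.

```python
def choose_style(semantic: str, audio: str, emotion: str) -> str:
    # Placeholder rule: each distinct combination yields one style name.
    return f"{semantic}-{audio}-{emotion}"

def build_descriptors(song_length_s, semantic_vec, audio_vec, emotion_vec):
    """Build one descriptor per time segment over the length of the
    song. Each vector is sampled once per segment, so all three lists
    must have the same length."""
    n = len(semantic_vec)
    seg = song_length_s / n
    return [
        {
            "t_start": i * seg,               # segment start, seconds
            "semantic": semantic_vec[i],
            "audio": audio_vec[i],
            "emotion": emotion_vec[i],
            "style": choose_style(semantic_vec[i], audio_vec[i], emotion_vec[i]),
        }
        for i in range(n)
    ]
```

The resulting list could be serialized (e.g., as JSON) and sent to the client, which then applies each descriptor's style during its time segment.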
29. A non-transitory computer readable storage medium having stored therein instructions executable by a computing device to cause the computing device to perform functions of:
receiving an audio sample of an audio signal comprising at least one of audio elements and vocal elements, the audio signal being rendered by a media rendering source;
sending the audio sample to a server;
in response to sending the audio sample to the server, receiving one or more descriptors for the audio signal based on a semantic vector, an audio vector, and an emotion vector, wherein the semantic vector indicates a semantic content of corresponding textual transcriptions of vocal elements of the audio signal as a function of time with respect to a length of the audio signal, wherein the audio vector indicates an audio content of the audio elements of the audio signal as a function of time with respect to a length of the audio signal, and wherein the emotion vector indicates an emotional content of the audio elements of the audio signal as a function of time with respect to a length of the audio signal;
determining an animation style to be applied to the textual transcriptions per the length of the audio signal based on an ordering of values of the semantic vector, the audio vector, and the emotion vector per the length of the audio signal, wherein a respective combination of the values of the semantic vector, the audio vector, and the emotion vector corresponds to a respective animation style; and
based on the one or more descriptors, rendering the textual transcriptions of vocal elements of the audio signal in a dynamic animation, wherein the dynamic animation changes over time corresponding to each of the semantic vector, the audio vector, and the emotion vector that indicate the animation style to be applied to the textual transcriptions per the length of the audio signal.
30. A method of rendering a textual animation, comprising:
receiving an audio sample of an audio signal comprising at least one of audio elements and vocal elements, the audio signal being rendered by a media rendering source;
determining an identification of a song corresponding to the audio sample and lyrics corresponding to the vocal elements;
receiving a set of descriptors for the song based on a semantic vector, an audio vector, and an emotion vector, wherein the semantic vector indicates a semantic content of the lyrics as a function of time with respect to a length of the song, wherein the audio vector indicates an audio content of audio elements of the song as a function of time with respect to a length of the song, and wherein the emotion vector indicates an emotional content of audio elements of the song as a function of time with respect to a length of the song;
determining an animation style to be applied to the textual transcriptions per the length of the audio signal based on an ordering of values of the semantic vector, the audio vector, and the emotion vector per the length of the audio signal, wherein a respective combination of the values of the semantic vector, the audio vector, and the emotion vector corresponds to a respective animation style; and
receiving a time offset indicating a time position in the audio signal corresponding to a beginning time of the audio sample;
determining a real-time offset using a real-time timestamp, a beginning time of the audio sample, and the time offset, wherein the real-time timestamp indicates a present time;
based on the set of descriptors, a client device rendering the lyrics in a dynamic animation at a time corresponding to the real-time offset and substantially in synchrony to the audio signal being rendered by the media rendering source, wherein the dynamic animation changes over time corresponding to each of the semantic vector, the audio vector, and the emotion vector that indicate the animation style to be applied to the lyrics per the length of the song.
Dependent claims: 31, 32
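The real-time-offset computation in claim 30 reduces to simple arithmetic: the song position at the present moment is the sample's offset within the song plus the wall-clock time elapsed since the sample was captured. A minimal sketch, with argument names assumed for illustration:

```python
def realtime_offset(time_offset_s: float,
                    sample_begin_ts: float,
                    now_ts: float) -> float:
    """Estimate the current playback position within the song.

    time_offset_s:   position in the song where the captured sample began
    sample_begin_ts: wall-clock timestamp when the sample was captured
    now_ts:          present wall-clock timestamp
    """
    return time_offset_s + (now_ts - sample_begin_ts)
```

Rendering the lyrics at this offset keeps the animation substantially in synchrony with the audio still playing from the media rendering source, even though identification took some seconds to complete.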
Specification