Multimedia system for mobile client platforms
Abstract
A method for multimedia playback and transmission to wireless clients is described. A host webserver transcodes a live digital or analog audio-visual or audio broadcast signal and splits the input stream into small multimedia objects of an efficient compression such as MPEG4/AAC, and then immediately deploys the objects to distributed content servers for a geographically dispersed population of wireless clients. A Java applet object player, downloaded to wireless clients at the beginning of the multimedia-on-demand session, interprets and decodes the multimedia objects as they are received, using multiple levels of optimization. The applet uses novel video and audio decoding optimizations which can be generically applied to many digital video and audio codecs, and specifically decodes Simple Profile MPEG4 video and Low Complexity AAC audio.
Claims
1. A method of transmitting multimedia to wireless clients, wherein the multimedia transmission method depends on:
the creation of multimedia objects from existing multimedia files or dynamically from live multimedia streams;
a direct request and transmission of just the multimedia objects created from existing multimedia files or dynamically created multimedia objects from live multimedia streams by wireless client-based multimedia object players;
and, a continuous playback of the received multimedia objects by wireless client-based multimedia players that are specifically designed to play continuous sequences of the multimedia objects.
2. The method of claim 1, running on a distributed network system for multimedia-on-demand, utilizing a centralized content server; an indexing host; a multimedia object creator and transcoder for live broadcast applications or for transcoding and creating multimedia objects from archived multimedia files; and distributed content servers, involving high-capacity cellular network proxy servers and mobile clients running downloaded Java applets or embedded or downloaded non-Java multimedia object players.
3. The method of claim 1, wherein the transmission of said multimedia objects is by protocols such as:
HTTP, FTP, IMAP4 and NNTP, which have the capability to serve files in a directory structure;
and wherein, when HTTP 1.1 is used and allows pipelined connections over persistent TCP connections, multimedia object players can request many multimedia objects in rapid succession.
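The pipelining described in claim 3 can be illustrated by concatenating several HTTP/1.1 GET requests for consecutive multimedia objects into a single write on one persistent TCP connection. A minimal sketch follows; the host name and object paths are hypothetical, and a real player would also read and demultiplex the pipelined responses.

```python
def pipelined_requests(host, paths):
    """Build one byte string containing back-to-back HTTP/1.1 GET
    requests, suitable for writing to a single persistent TCP
    connection (HTTP/1.1 pipelining)."""
    reqs = []
    for p in paths:
        reqs.append(
            f"GET {p} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Connection: keep-alive\r\n\r\n"
        )
    return "".join(reqs).encode("ascii")

# Hypothetical object URLs for a sequence named "news" with 3 objects.
payload = pipelined_requests(
    "media.example.com",
    ["/news/obj_001.mp4", "/news/obj_002.mp4", "/news/obj_003.mp4"],
)
```

Because all three requests are issued before any response arrives, the player avoids one round-trip of latency per object, which is the "rapid succession" behavior the claim relies on.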
5. A method of creating multimedia objects, wherein, in the case of a live multimedia stream, the input multimedia stream is first transcoded into an optimal audiovisual format such as MPEG4/AAC, at an optimal encoding rate reflecting available cellular network bandwidth, then dynamically converted into multimedia objects by splitting the encoded stream at specified intervals, and then immediately deployed to distributed content servers, which transmit the recently created multimedia objects to wireless clients;
alternatively, in the case of converting an archived multimedia file, the input multimedia stream is first transcoded into an optimal audiovisual format such as MPEG4/AAC, at an optimal encoding rate reflecting available cellular network bandwidth, and then converted into multimedia objects by splitting the encoded stream at specified intervals.
6. The method of claim 5, wherein the dynamically created multimedia objects are maintained, during transmission, by the content servers serving them to wireless clients as a window of multimedia objects.
7. The method of claim 5, wherein the input multimedia stream is scanned after each specified interval for the next I-frame, and the stream is split at that next I-frame to create another multimedia object.
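The splitting rule in claim 7 can be sketched as follows: advance through the frame list, and once the specified interval has elapsed, cut at the next I-frame so every object begins on an independently decodable frame. The frame representation below (timestamp, frame-type pairs) is an illustrative assumption, not a container format from the patent.

```python
def split_at_iframes(frames, interval):
    """Split a stream (list of (timestamp, frame_type) pairs) into
    multimedia objects: after each `interval` seconds, scan forward
    to the next I-frame and cut there, so every object starts on an
    I-frame."""
    objects, current, next_cut = [], [], interval
    for ts, ftype in frames:
        if current and ts >= next_cut and ftype == "I":
            objects.append(current)
            current = []
            next_cut = ts + interval
        current.append((ts, ftype))
    if current:
        objects.append(current)
    return objects

# Synthetic 10 s stream: an I-frame every 2 s, P-frames in between.
frames = [(t / 2, "I" if t % 4 == 0 else "P") for t in range(20)]
objs = split_at_iframes(frames, interval=4.0)
```

Cutting only at I-frames means each object is playable on its own, which is what lets the client request and decode objects independently.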
8. The method of claim 5, wherein the input multimedia stream can be in an analog audiovisual format or a variety of digital audiovisual formats, including MPEG4, MPEG1, MPEG2, MOV, AVI, WMV, ASF and higher-rate encoded MPEG4, or in audio-only formats, including analog audio, MP3, AMR, Windows Media Audio, RealAudio and higher-rate encoded AAC.
9. The method of claim 6, wherein a window of multimedia objects for live transmission is created, comprising a small series of multimedia objects, which can be incremented and decremented as newly created objects are introduced to the window or transmitted to wireless clients.
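The live window of claim 9 behaves like a bounded FIFO: new objects enter as they are created, and the oldest object leaves once the window is full. A minimal sketch, with a hypothetical fixed capacity:

```python
from collections import deque

class LiveWindow:
    """Sketch of a live-broadcast window: a small series of multimedia
    objects that grows as new objects arrive and drops the oldest once
    the window reaches its (illustrative) capacity."""
    def __init__(self, capacity=5):
        self.capacity = capacity
        self.objects = deque()

    def add(self, obj_id):
        self.objects.append(obj_id)    # newly created object enters
        if len(self.objects) > self.capacity:
            self.objects.popleft()     # oldest object is retired

    def current(self):
        return list(self.objects)

win = LiveWindow(capacity=3)
for i in range(1, 6):
    win.add(i)
```

Keeping only a short window bounds storage on the content servers while still letting a late-joining client start playback from any object currently in the window.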
10. The method of claim 5, wherein multimedia objects are identified, when they are created by the multimedia object creator, with an Internet address that includes such information as:
the transport protocol;
the host URL of the transmission server or content server directly serving the wireless client (which varies when many content servers are involved, as in a live broadcast application);
the name of the multimedia object sequence or broadcast;
the number of multimedia objects in the sequence;
and, the multimedia object's sequence number.
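The address fields listed in claim 10 can be composed into, and recovered from, a single URL. The layout below (path order, `obj_NNNN` naming) is an illustrative assumption; the claim specifies only which fields appear, not their encoding.

```python
def object_url(protocol, host, name, total, seq):
    """Compose the Internet address of one multimedia object from the
    fields listed in the claim (layout is hypothetical)."""
    return f"{protocol}://{host}/{name}/{total}/obj_{seq:04d}"

def parse_object_url(url):
    """Recover the same fields from an object URL."""
    protocol, rest = url.split("://", 1)
    host, name, total, obj = rest.split("/")
    return {
        "protocol": protocol, "host": host, "name": name,
        "total": int(total), "seq": int(obj.split("_")[1]),
    }

url = object_url("http", "cs1.example.com", "newscast", 120, 7)
info = parse_object_url(url)
```

Embedding the sequence length and sequence number in the address lets the player compute every subsequent request locally, without a manifest round-trip.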
11. The method of claim 5, whereby multimedia objects are split from multiple MPEG4 composite layer streams by scanning time intervals and splitting them at the next I-frames.
12. The method of claim 5, whereby audio media objects are split from a single audio stream by splitting at set time intervals.
13. A method of wireless client side processing of multimedia objects by a multimedia object player, wherein:
the identification of the multimedia object is parsed and the total number of multimedia objects within the identification path is determined, or, for live applications, the number of multimedia objects in the window is determined;
heap memory allocations for said multimedia objects and meta-data are determined;
to create a buffer on the wireless client for more than one multimedia object;
to identify multimedia object playing, multimedia object receiving and multimedia object wait-for states for the multimedia object sequence;
to hence use these states as a mechanism to synchronize the reception and playback of multimedia objects;
and, to pass this information on to the audio and/or video decoding components of the multimedia player to properly configure them to uniquely process the sequence of multimedia objects.
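The per-object states of claim 13 (wait-for, receiving, playing) act as the synchronization mechanism between the download path and the playback path. A minimal sketch of such a state tracker; the state names follow the claim, while the class and method names are illustrative:

```python
# Per-object states used to synchronize reception and playback.
WAIT_FOR, RECEIVING, RECEIVED, PLAYING = range(4)

class ObjectSequence:
    """Hypothetical tracker: one state slot per multimedia object."""
    def __init__(self, total):
        self.states = [WAIT_FOR] * total

    def start_receiving(self, i):
        self.states[i] = RECEIVING

    def finish_receiving(self, i):
        self.states[i] = RECEIVED

    def next_playable(self):
        """Playback may only advance onto an object already received."""
        for i, s in enumerate(self.states):
            if s == RECEIVED:
                return i
        return None

seq = ObjectSequence(total=4)
seq.start_receiving(0)
seq.finish_receiving(0)
playable = seq.next_playable()
```

Because the player only moves an object into playback when its slot reads RECEIVED, reception and playback stay synchronized without explicit locking between the two activities.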
14. The method of claim 13, whereby, following configuration of the audio and/or video decoding components for a specific sequence of multimedia objects, the multimedia object player can delay playback until the multimedia object buffers in the wireless client memory have filled, or can begin playback immediately while requesting the next multimedia object;
and, whereby the multimedia object player's decision can be based on the speed at which the multimedia objects are retrieved versus the playback time of each multimedia object, the latency of requests for multimedia objects, or the number of multimedia objects that can be stored in wireless client memory at once.
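The buffering decision of claim 14 reduces to a comparison of retrieval speed against playback time, with request latency and buffer capacity as inputs. A sketch of one plausible decision rule; the exact thresholds are illustrative assumptions, not values from the patent:

```python
def should_buffer(retrieval_time, playback_time, request_latency, slots):
    """Decide whether to delay playback until buffers fill: buffer
    when objects arrive no faster than they play back (including
    request latency), or when only one object fits in memory.
    Thresholds here are illustrative."""
    if slots <= 1:
        return True  # no room to play one object while fetching another
    if retrieval_time + request_latency >= playback_time:
        return True  # objects arrive too slowly for immediate playback
    return False

decision_fast = should_buffer(retrieval_time=1.0, playback_time=5.0,
                              request_latency=0.2, slots=4)
decision_slow = should_buffer(retrieval_time=6.0, playback_time=5.0,
                              request_latency=0.2, slots=4)
```

On a fast link the player starts immediately and fetches ahead; on a slow link it pre-fills its buffers so playback, once started, is not interrupted.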
15. The method of claim 13, wherein, following the parsing of the first multimedia object, the audio and video contents of the first and each subsequent multimedia object in the sequence are decoded and played back, whereby sufficient audio frames are decoded that their total display time is as long as the associated video frame plus the processing time of the next audio frame;
and, whereby, by interleaving the processing between several audio frames and a single video frame, the multimedia object player can perform audio and video decoding in a single thread.
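The single-thread interleaving of claim 15 can be sketched as a scheduler: before each video frame, decode enough audio frames to cover that frame's display time plus one further audio frame. The timings below are synthetic (roughly one 10 fps video frame against 23.2 ms AAC frames), chosen only to make the schedule visible:

```python
def interleave(video_frame_ms, audio_frame_ms, n_video):
    """Single-thread decode schedule: before each video frame, decode
    enough audio frames to cover the video frame's display time plus
    the next audio frame. Returns the decode order as 'A'/'V' tokens."""
    order, audio_ahead_ms = [], 0.0
    for _ in range(n_video):
        # decode audio until it covers the coming video frame
        while audio_ahead_ms < video_frame_ms + audio_frame_ms:
            order.append("A")
            audio_ahead_ms += audio_frame_ms
        order.append("V")
        audio_ahead_ms -= video_frame_ms  # the video frame consumes time
    return order

sched = interleave(video_frame_ms=100.0, audio_frame_ms=23.2, n_video=3)
```

Because audio is always decoded slightly ahead, the audio device never starves while the (longer) video decode runs, which is what makes one thread sufficient.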
16. The method of claim 13, whereby state information also provides a mechanism that can be used to skip backwards and forwards through a multimedia object sequence, wherein changing the state information and restarting retrieval of multimedia objects repositions playback from any multimedia object in the sequence;
and, wherein the transmission is a live transmission, state information can reposition playback from any multimedia object within the current window.
17. A method for processing the large-scale distribution of multimedia content in the distributed network, the network being managed by an indexing host server, wherein:
the indexing host registers all URLs of content servers supporting particular live multimedia object transmissions and archived sequences of multimedia objects;
remote transcoding/multimedia object creating servers provide registered updates of multimedia object sequence indices to the indexing host;
remote transcoding/multimedia object creating servers also register the sequence indices of the most recent windows of live content multimedia objects with the indexing host;
wherein content servers accept and store the most current window of live content multimedia objects or the most recent non-live archives of multimedia object sequences;
content servers transmit their multimedia directly to wireless clients, or indirectly through cellular network proxy servers;
and whereby, the indexing host verifies the wireless client;
the indexing host accepts requests from wireless clients for multimedia content;
the indexing host determines the most suitable content server for the wireless client;
and, the indexing host provides the wireless client with a decryption string for the requested multimedia content.
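The claim leaves open how the indexing host "determines the most suitable content server." One plausible sketch, under the assumption that servers are registered with a region and a load figure (both fields, and the selection rule itself, are illustrative):

```python
def choose_server(servers, client_region):
    """Hypothetical server selection for the indexing host: prefer a
    content server in the client's region, breaking ties by lowest
    load, and fall back to any server if none is local."""
    candidates = [s for s in servers if s["region"] == client_region]
    pool = candidates or servers
    return min(pool, key=lambda s: s["load"])

servers = [
    {"url": "http://cs-eu.example.com", "region": "eu", "load": 0.7},
    {"url": "http://cs-us1.example.com", "region": "us", "load": 0.4},
    {"url": "http://cs-us2.example.com", "region": "us", "load": 0.2},
]
best = choose_server(servers, client_region="us")
```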
18. A method of optimized video decoding for decoding Variable Length Codes (VLCs) in Huffman codebooks, which are used to compress motion vectors for motion compensation occurring in many macroblocks within P-frames, whereby bits are read off the main video stream into an integer buffer (N);
the number of bits read is equivalent to the longest code in the VLC codebook;
the ceiling of the base-2 logarithm of N is taken;
based on the result, N is shifted and used as an index into an array containing the true value indicated in the codebook and the true length of the code;
the number of bits indicated as the true length is then removed from the video stream and processing continues;
said optimized video decoding using a texture buffer large enough to hold 4 luminance and 2 chrominance blocks (the dimensions of a macroblock as exemplified in the MPEG4 specification) to store predicted pixels from a reference frame;
said texture buffer decreases the amount of reading from and writing to non-consecutive bytes within the reference and output video frames;
all pixel residues are applied to the texture buffer which is then copied to the output frame;
to use a faster but less accurate IDCT algorithm within the process, if the wireless handset cannot decode the video stream in real-time, to process these residue values;
furthermore, to minimize the effect of the less accurate IDCT algorithm by using this process first on the chrominance pixel residues;
said optimized video decoding processing faster motion compensation without bilinear interpolation when less quality but faster processing is required;
said optimized digital video decoding performing optimizations in pixel processing and dequantization, whereby 128 is added to the original luminance and chrominance values and the result is divided by 2;
values in the [−128, 383] range are then represented in the [0, 255] range, decreasing luminance and chrominance accuracy without significantly affecting RGB color resolution in the 4-bit to 18-bit range;
said optimized video decoding processing by optimizing Chen's algorithm, whereby different simplified versions of Chen's algorithm are used based on the energy or distribution of input DC and AC coefficients, whereby the energy or distribution of DC and AC coefficients is first assessed;
a simplified Chen's algorithm is selected for IDCT processing;
a higher quality preference is given to luminance blocks;
and, the process is further optimized by recording which rows of the input matrix to the IDCT are populated with values;
said optimized video decoding handling YUV to RGB conversion, whereby YUV and RGB scaling functions are separated;
when scaling up, pixels are read on the source plane and copied to the output plane;
when scaling down, iteration is performed through pixel positions in the output plane and source pixels are calculated in the input plane;
and, sampling is performed on only a subset of chrominance pixels, avoiding pixel clipping, or the Red and Blue values are calculated for only a subset of output pixels;
said optimized video decoding processing by using short-cuts to permit video decoding to scale in complexity, based on the processing power of the wireless client, whereby, three quality levels are used with high being consistent with a correct image in the digital codec specification;
medium corresponds to some reduction in image quality to reduce processing time;
and low being a drastic reduction in image quality to improve processing time;
wherein a final option is to avoid the processing and display of P-frames when I-frames occur at regular intervals;
said optimized video decoding processing by using short-cuts to permit video decoding to scale in complexity, based on the processing power of the wireless client, where state information defines the quality at which decoding should be performed at several steps of the decoding process;
said state information consisting of six integer value steps defining state;
Quality of the YUV to RGB conversion process;
Quality of the Inverse DCT for luminance blocks;
Quality of the Inverse DCT function for chrominance blocks;
Quality of Motion Compensation for luminance blocks;
Quality of Motion Compensation for chrominance blocks;
and, allowance to drop frames (from a single P-frame occurring before an I-Frame up to dropping all P-Frames);
said state information further including a single integer representing the quality level of the overall encoding, wherein, at each value of overall quality, a ruleset defines quality for each of the step qualities;
and, at the highest overall quality, all step qualities are set to maximum;
and, as overall quality is decreased, step qualities are incrementally reduced according to the ruleset.
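The VLC optimization at the head of claim 18 (read the longest-code width of bits into N, take its base-2 logarithm, and use the shifted value as a table index) can be illustrated with a tiny hypothetical canonical codebook. Here `int.bit_length()` stands in for the logarithm step, and because each length group below contains a single code, the subsequent shift degenerates to a direct lookup; real MPEG-4 motion-vector codebooks are far larger.

```python
def decode_vlc(bits, table, maxlen=3):
    """Decode a bitstring with a tiny hypothetical VLC codebook:
    read `maxlen` bits into an integer N, use the position of its
    highest set bit (N.bit_length(), i.e. floor(log2(N)) + 1) to
    select the (value, code_length) entry, then consume only
    code_length bits from the stream."""
    out, pos = [], 0
    while pos < len(bits):
        window = bits[pos:pos + maxlen].ljust(maxlen, "0")
        n = int(window, 2)
        value, length = table[n.bit_length()]
        out.append(value)
        pos += length  # remove only the true code length from the stream
    return out

# Hypothetical codebook: 1 -> 'A', 01 -> 'B', 001 -> 'C', 000 -> 'D'.
# The bit_length of the 3-bit window distinguishes all four codes.
TABLE = {3: ("A", 1), 2: ("B", 2), 1: ("C", 3), 0: ("D", 3)}
decoded = decode_vlc("1" "01" "000" "001", TABLE)
```

The appeal of the trick is that one integer operation replaces a bit-by-bit walk down the Huffman tree, which matters on the constrained handsets the patent targets.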
19. A method of optimized audio decoding pertaining to a simplification of variable length codes (VLCs) in Huffman codebooks, wherein bits are read off the audio stream into an integer N;
the number of bits read is equivalent to the maximum number of bits in the longest codeword in the codebook;
the first binary 0 is then located starting from the highest bit;
the left-based index of this first 0 is then used to remove all the preceding 1s;
and, N is shifted and used as an array index;
said optimized audio decoding including optimizations in the IMDCT step, whereby the Inverse Fast Fourier Transform is combined with pre- and post-processing steps to produce a simplified IMDCT algorithm with O(n·log(n)) runtime, which can incorporate various IFFT algorithms based on the sparseness of the input, and which specifically involves the following combination of steps in a final optimization:
a) Re-order, pre-scale and twiddle, whereby the method loops over the input data, and each datum is complex-multiplied by the twiddle factor and then re-scaled by a bit-shift operation; and, since the twiddle factor is already bit-shifted, it can be treated as a fixed-point number, so the scaling operation's bit shift is partially performed by the twiddle factor itself; and the relevant twiddle factors are stored in an array table; and finally, once the complex multiplication and scaling are done, the resulting values are stored in the re-ordered location in the IFFT input array;
b) Perform the fixed-point integer inverse Fourier transform;
c) Re-scale, re-order, post-twiddle, window and overlap, whereby combining these four operations into one step replaces four array accesses with one, and some multiplications are also combined into single bit shifts; and
hence, the method loops over the IFFT output array, and performs four operations in each iteration of the loop;
the post-twiddle and rescale are combined;
the post-twiddle uses a twiddle factor table which is already bit-shifted;
and, windowing is combined in this step also, with window values coming from either a table or a fast integer sine calculator;
and finally, values are overlapped and stored in the correct location in the output array;
said optimized audio decoding performing simplified input processing specific to the AAC Low Complexity (LC) audio decoding profile, wherein the Mid/Side, Intensity and Temporal Noise Shaping steps are optional;
in cases where these three features are not present, there are no dependencies within a frame until the IFFT step within the IMDCT itself;
and, operations between noiseless decoding and the pre-IFFT operations within the IMDCT itself are combined, minimizing memory access;
said optimized audio decoding using an alternative bit-operation based upon a Taylor computation, wherein trigonometric identities are used to express the sine calculation in terms of a sine in the range of 0 to PI/2, resulting in angle X;
X is multiplied by X, resulting in S;
a bit-shift operation is performed by calculating X*(256 − S*(43 − (S<<1)));
the result producing a window value in the range of 0 to 255, allowing fast windowing without the use of lookup tables;
and, combining the bit-shift operation with other fixed-point multiplication steps;
said optimized audio decoding using IMDCT short window processing for digital audio decoding, wherein, IMDCT 1024 values are divided into sequences of 8 short windows;
IMDCT window and overlap functions are performed on each short window;
each window of 128 values results in a synthesis output window of 256 values;
these output windows are then overlapped, resulting in non-zero values in the range of 448 to 1600.
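The window formula in claim 19 is an integer-coefficient Taylor series: with S = X², the expression X·(256 − S·(43 − (S<<1))) expands to 256X − 43X³ + 2X⁵, matching the Taylor series of 256·sin(X) (whose true coefficients are 256, 256/6 ≈ 42.7, and 256/120 ≈ 2.13). The sketch below uses floats and 2*s in place of the fixed-point shift, purely for readability:

```python
import math

def fast_sin_window(x):
    """Taylor-based window approximation from the claim: with S = X*X,
    compute X*(256 - S*(43 - 2*S)) = 256*X - 43*X**3 + 2*X**5, an
    integer-coefficient approximation of 256*sin(X) on [0, pi/2].
    (The claim's (S << 1) is the fixed-point form of 2*S.)"""
    s = x * x
    return x * (256 - s * (43 - 2 * s))

# Compare against 256*sin(x) across the valid input range.
errors = [abs(fast_sin_window(x) - 256 * math.sin(x))
          for x in (0.0, 0.5, 1.0, 1.5)]
```

Because the coefficients 256, 43 and 2 are small integers (and 256 is a power of two), the whole window evaluation reduces to multiplies and shifts, with no lookup table and a worst-case error of about one count in 256.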
20. A method of low energy gap timing in audio playback, wherein an interleaved process in audio decoding detects frames of low energy;
and, audio playback is controlled so that a gap will occur during the detected frames, which may be dropped so that synchronization with video is not lost.
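The low-energy detection of claim 20 can be sketched as an RMS scan over decoded audio frames: frames whose energy falls below a threshold are flagged as safe places for a playback gap or a dropped frame. The frame representation (lists of normalized samples) and the threshold value are illustrative assumptions.

```python
def droppable_frames(frames, threshold=0.05):
    """Flag audio frames whose RMS energy falls below a (hypothetical)
    threshold, so playback gaps or frame drops land where they are
    least audible and A/V synchronization can be recovered."""
    flagged = []
    for i, samples in enumerate(frames):
        rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
        if rms < threshold:
            flagged.append(i)
    return flagged

frames = [
    [0.5, -0.5, 0.4, -0.4],      # loud frame
    [0.01, -0.01, 0.0, 0.02],    # near-silent frame
    [0.3, -0.2, 0.25, -0.3],     # loud frame
]
quiet = droppable_frames(frames)
```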
Specification