Method and apparatus for evaluating the visual quality of processed digital video sequences
Abstract
A Digital Video Quality (DVQ) apparatus and method that incorporate a model of human visual sensitivity to predict the visibility of artifacts. The DVQ method and apparatus are used for the evaluation of the visual quality of processed digital video sequences and for adaptively controlling the bit rate of the processed digital video sequences without compromising the visual quality. The DVQ apparatus minimizes the required amount of memory and computation. The input to the DVQ apparatus is a pair of color image sequences: an original (R) non-compressed sequence, and a processed (T) sequence. Both sequences (R) and (T) are sampled, cropped, and subjected to color transformations. The sequences are then subjected to blocking and discrete cosine transformation, and the results are transformed to local contrast. The next step is a time filtering operation which implements the human sensitivity to different time frequencies. The results are converted to threshold units by dividing each discrete cosine transform coefficient by its respective visual threshold. At the next stage the two sequences are subtracted to produce an error sequence. The error sequence is subjected to a contrast masking operation, which also depends upon the reference sequence (R). The masked errors can be pooled in various ways to illustrate the perceptual error over various dimensions, and the pooled error can be converted to a visual quality measure.
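The processing chain in the abstract can be illustrated with a short sketch for a single grayscale frame pair. This is not the patented implementation: the 8×8 block size, the AC-to-DC contrast normalization, the masking exponent, and the Minkowski pooling exponent are all assumptions, and the temporal filtering stage is omitted because only one frame is processed.

```python
import numpy as np
from scipy.fft import dctn

def perceptual_error(ref, tst, thresholds, masking_exp=0.7, beta=4.0):
    """Sketch of the DVQ chain for one grayscale frame pair.

    ref, tst   : 2-D arrays whose sides are multiples of 8
    thresholds : 8x8 array of per-coefficient visual thresholds (assumed)
    """
    def to_threshold_units(frame):
        h, w = frame.shape
        blocks = frame.reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2)
        coeffs = dctn(blocks, axes=(2, 3), norm='ortho')   # blocking + DCT (d10)
        dc = np.abs(coeffs[..., :1, :1]) + 1e-6
        contrast = coeffs / dc                             # AC local contrast (d17a)
        contrast[..., 0, 0] = coeffs[..., 0, 0] / frame.mean()  # DC term (d17b), assumed form
        return contrast / thresholds                       # threshold units (d19)

    r = to_threshold_units(ref)
    t = to_threshold_units(tst)
    error = t - r                                          # error sequence (d20)
    mask = np.maximum(1.0, np.abs(r)) ** masking_exp       # contrast masking, assumed form
    masked = error / mask                                  # masked error sequence (d24)
    # Minkowski pooling over all coefficients (beta = 4 assumed)
    return float(np.mean(np.abs(masked) ** beta) ** (1.0 / beta))

rng = np.random.default_rng(0)
ref = rng.uniform(50.0, 200.0, (16, 16))
thresholds = np.ones((8, 8))
score_same = perceptual_error(ref, ref, thresholds)
score_diff = perceptual_error(ref, ref + 10.0, thresholds)
```

An identical pair pools to zero, and any visible difference yields a positive score; the abstract's final step (mapping the pooled error to a quality measure) would be a further monotone transform of this value.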
32 Claims
1. A digital video quality method for evaluating the visual quality of a processed (T) video sequence relative to an original (R) video sequence, the method comprising:
sampling the original and processed video sequences to generate sampled sequences (d1) therefrom;
limiting the processing of said sampled sequences (d1) to a region of interest and generating region of interest sequences (d2) therefrom;
transforming said region of interest sequences (d2) to local contrast coefficients (d17);
filtering said local contrast coefficients (d17) to generate filtered components (d18) therefrom;
converting said filtered components (d18) to threshold units (d19);
subtracting said threshold units (d19) corresponding to the original (R) and processed (T) sequences to obtain an error sequence (d20);
subjecting said error sequence (d20) to a contrast masking operation to generate a masked error sequence (d24) therefrom; and
pooling said masked error sequence (d24) to generate a perceptual error (EΩ).

Dependent Claims (2–19):
wherein said color channels are converted by a color transformer to a perceptually relevant color space, to generate color transformed sequences (d9) from said region of interest sequences (d2).
6. A method according to claim 5, further including subjecting said color transformed sequences (d9) to blocking to generate blocks.
7. A method according to claim 6, further including converting said blocks to a block of frequency coefficients (d10) by means of a discrete cosine transformer.
8. A method according to claim 7, wherein said block of frequency coefficients (d10) is converted to a local contrast signal (d17) by means of a local contrast converter; and
wherein said local contrast signal (d17) includes a combination of AC coefficients (d17a) and DC coefficients (d17b).
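The local contrast conversion of claim 8 can be sketched for a single 8×8 block. The claim does not specify the normalization; dividing the AC coefficients by the block's DC term and expressing the DC itself relative to the overall mean level is one common choice, and both are assumptions here.

```python
import numpy as np
from scipy.fft import dctn

def local_contrast(block, mean_level):
    """Convert one 8x8 pixel block to a local contrast signal (d17).

    AC coefficients (d17a) are divided by the block's DC term; the DC
    coefficient (d17b) is expressed relative to the overall mean level.
    Both normalizations are assumptions for this sketch.
    """
    coeffs = dctn(block, norm='ortho')       # frequency coefficients (d10)
    dc = coeffs[0, 0]
    contrast = coeffs / (abs(dc) + 1e-6)     # AC local contrast (d17a)
    contrast[0, 0] = dc / mean_level         # DC local contrast (d17b)
    return contrast

# A flat block has zero AC contrast; only the DC term survives.
flat = np.full((8, 8), 128.0)
c = local_contrast(flat, 128.0)
```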
9. A method according to claim 8, wherein contrast masking is accomplished by rectifying said threshold units (d19).
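A hedged sketch of the masking stage of claim 9: the reference coefficients in threshold units are rectified (absolute value), and the error is divisively attenuated wherever the rectified reference is above threshold. The max/exponent form below is a common masking model, not the claimed formula.

```python
import numpy as np

def contrast_mask(error, ref_threshold_units, exponent=0.7):
    """Attenuate the error sequence (d20) where the reference is strong.

    The divisive max(1, |r|)**exponent form is an assumption; the claim
    only specifies that the threshold units (d19) are rectified.
    """
    rectified = np.abs(ref_threshold_units)      # rectification
    mask = np.maximum(1.0, rectified) ** exponent
    return error / mask                          # masked error sequence (d24)

# Sub-threshold reference leaves the error intact; a strong reference masks it.
e = np.array([1.0, 1.0, 1.0])
r = np.array([0.5, 1.0, 8.0])
m = contrast_mask(e, r, exponent=1.0)
```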
10. A method according to claim 5, wherein said region of interest sequences (d2) are transformed from their native color space to gamma-corrected color channels R′, G′, and B′ by an R′G′B′ transformer.
11. A method according to claim 10, further including converting said color channels R′, G′, and B′ to RGB color channels by an RGB transformer.
12. A method according to claim 11, further including converting said RGB color channels to XYZ color coordinates by an XYZ transformer.
13. A method according to claim 12, further including converting said XYZ color coordinates to YOZ color coordinates by a YOZ transformer.
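The color chain of claims 10 through 13 (R′G′B′ → RGB → XYZ → YOZ) can be sketched as far as XYZ using standard definitions. The gamma value and the sRGB/D65 matrix are assumptions; the final XYZ-to-YOZ step requires the opponent-color matrix given in the specification and is therefore omitted here.

```python
import numpy as np

# Standard sRGB primaries with D65 white; the patent's exact RGB
# definition may differ (assumption).
RGB_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])

def gamma_to_linear(rgb_prime, gamma=2.2):
    """Undo display gamma: R'G'B' -> linear RGB (gamma = 2.2 assumed)."""
    return np.clip(rgb_prime, 0.0, 1.0) ** gamma

def rgb_to_xyz(rgb):
    """Linear RGB -> CIE XYZ tristimulus values."""
    return rgb @ RGB_TO_XYZ.T

# Reference white maps to Y = 1 by construction of the matrix.
xyz_white = rgb_to_xyz(gamma_to_linear(np.array([1.0, 1.0, 1.0])))
```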
14. A method according to claim 13, wherein, if either the processed (T) video sequence or the original (R) video sequence contains interlaced video fields, said interlaced fields are de-interlaced to a progressive sequence (d7) by means of a de-interlacer.
15. A method according to claim 14, wherein de-interlacing is implemented by inserting blank lines into even numbered lines in odd fields, and odd numbered lines in even fields.
16. A method according to claim 14, wherein de-interlacing is implemented by inserting blank lines into even numbered lines in even fields, and odd numbered lines in odd fields.
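The blank-line de-interlacing of claims 15 and 16 amounts to expanding each field to full frame height and writing its lines at one parity. A minimal sketch, assuming zero-valued blank lines and the claim 15 convention (odd fields carry the odd-numbered lines):

```python
import numpy as np

def deinterlace_blank(field, field_is_odd):
    """Expand one video field to a progressive frame (d7) by inserting
    blank (zero) lines at the missing parity. Claim 15 parity assumed;
    claim 16 is the same with the parities swapped."""
    h, w = field.shape
    frame = np.zeros((2 * h, w), dtype=field.dtype)
    start = 1 if field_is_odd else 0   # where this field's lines land
    frame[start::2] = field
    return frame

odd_field = np.arange(6.0).reshape(3, 2)
frame = deinterlace_blank(odd_field, field_is_odd=True)
```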
17. A method according to claim 14, wherein de-interlacing is implemented by treating each pair of odd and even video fields as an image.
18. A method according to claim 14, further including adding a veiling light to said progressive sequence (d7) by means of a veiling light combiner.
19. A method according to claim 1, wherein sampling includes pixel-replication.
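Claim 19's pixel-replication sampling can be sketched as nearest-neighbour upsampling; the integer factor and the replication along both axes are assumptions:

```python
import numpy as np

def pixel_replicate(image, factor):
    """Upsample by pixel replication (nearest-neighbour), one plausible
    reading of the sampling step of claim 19; `factor` is assumed integral."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

up = pixel_replicate(np.array([[1, 2], [3, 4]]), 2)
```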
20. A digital video quality apparatus with an original (R) video sequence and a processed (T) video sequence being fed thereto, the apparatus comprising:
a sampler for sampling the original and processed video sequences to generate sampled sequences (d1) therefrom;
a region-of-interest processor for limiting the processing of said sampled sequences (d1) to a region of interest and for generating region of interest sequences (d2) therefrom;
a local contrast converter for transforming said region of interest sequences (d2) to local contrast coefficients (d17);
a time filter for filtering said local contrast coefficients (d17) and for generating filtered components (d18) therefrom;
a threshold scaler for converting said filtered components (d18) to threshold units (d19);
a subtractor for subtracting said threshold units (d19) corresponding to the original (R) and processed (T) sequences to obtain an error sequence (d20);
a contrast masking processor for subjecting said error sequence (d20) to a contrast masking operation and for generating a masked error sequence (d24) therefrom; and
a pooling processor for pooling said masked error sequence (d24) to generate a perceptual error (EΩ).

Dependent Claims (21–29):
a color transformer that converts said color channels to a perceptually relevant color space, for generating color transformed sequences (d9) from said region of interest sequences (d2).
25. An apparatus according to claim 24, further including a block constructor that subjects said color transformed sequences (d9) to blocking, in order to generate blocks.
26. An apparatus according to claim 25, further including a discrete cosine transformer for converting said blocks to a block of frequency coefficients (d10).
27. An apparatus according to claim 26, wherein, if either the processed (T) video sequence or the original (R) video sequence contains interlaced video fields, said interlaced fields are de-interlaced to a progressive sequence (d7) by means of a de-interlacer.
28. An apparatus according to claim 27, further including a veiling light combiner for adding a veiling light to said progressive sequence (d7).
29. An apparatus according to claim 28, further including a local contrast converter for converting said block of frequency coefficients (d10) to a local contrast signal (d17); and
wherein said local contrast signal (d17) includes a combination of AC coefficients (d17a) and DC coefficients (d17b).
30. A digital video quality apparatus with an original (R) video sequence and a processed (T) video sequence being fed thereto, the apparatus comprising:
a sampler for sampling the original and processed video sequences and for generating sampled sequences (d1) therefrom;
a region-of-interest processor for limiting the processing of said sampled sequences (d1) to a region of interest and for generating region of interest sequences (d2) therefrom;
a local contrast converter for transforming said region of interest sequences (d2) to local contrast coefficients (d17);
a time filter for filtering said local contrast coefficients (d17) and for generating filtered components (d18) therefrom;
a threshold scaler for converting said filtered components (d18) to threshold units (d19);
a subtractor for subtracting said threshold units (d19) corresponding to the original (R) and processed (T) sequences to obtain an error sequence (d20);
a contrast masking processor for subjecting said error sequence (d20) to a contrast masking operation and for generating a masked error sequence (d24) therefrom; and
a pooling processor for pooling said masked error sequence (d24) to generate a perceptual error (EΩ).

Dependent Claims (31):
32. A digital video quality method for evaluating the visual quality of a processed (T) video sequence relative to an original (R) video sequence, the method comprising:
sampling the original and processed video sequences to generate sampled sequences (d1) therefrom;
limiting the processing of said sampled sequences (d1) to a region of interest and generating region of interest sequences (d2) therefrom;
transforming said region of interest sequences (d2) to local contrast coefficients (d17);
filtering said local contrast coefficients (d17) to generate filtered components (d18) therefrom;
converting said filtered components (d18) to threshold units (d19);
subtracting said threshold units (d19) to obtain an error sequence (d20);
subjecting said error sequence (d20) to a contrast masking operation to obtain a masked error sequence (d24); and
pooling said masked error sequence (d24) to generate a perceptual error (EΩ).
Specification