How video compression works
Science & Technology
Introduction
Video compression is a crucial technology that enables the efficient transmission and storage of video data. In this article, we will explore the main principles of video compression using the widely used H.264/AVC codec as a prime example.
Understanding Digital Images and Video
Digital images are composed of pixels, with each pixel's color represented using the RGB (Red, Green, Blue) model. Each primary color channel is stored with 8 bits, so 24 bits are needed to transmit a single pixel. For instance, a single full HD image (1920x1080 pixels), without any compression, requires approximately 50 megabits of data to transmit.
Considering that a standard streamed video runs at about 25 frames per second, the uncompressed data rate climbs to roughly 1.2 gigabits per second. Such high transmission rates are not feasible, especially when the average internet data transfer speed reported at the end of 2023 was only 85 megabits per second.
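The arithmetic behind these figures is easy to check; the sketch below simply multiplies out the frame size and frame rate stated above:

```python
# Back-of-the-envelope bitrates for uncompressed full HD RGB video.
WIDTH, HEIGHT = 1920, 1080   # full HD resolution
BITS_PER_PIXEL = 24          # 8 bits each for R, G and B
FPS = 25                     # typical streamed frame rate

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
bits_per_second = bits_per_frame * FPS

print(f"One frame:  {bits_per_frame / 1e6:.1f} Mbit")   # ~49.8 Mbit
print(f"One second: {bits_per_second / 1e9:.2f} Gbit")  # ~1.24 Gbit
```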
The YUV Color Space
With the advent of color television, the YUV color space model became prevalent for video transmission and storage. This model employs three components:
- Y (luma, i.e. brightness)
- Cb (blue-difference chrominance)
- Cr (red-difference chrominance)
Each component is also stored with 8 bits. The YUV model exploits the fact that humans are less sensitive to color detail than to brightness, which allows various chroma subsampling formats. In the most common format, 4:2:0, the luma is transmitted at full resolution, while each chroma channel is halved both horizontally and vertically, keeping only a quarter of its samples and cutting the total data roughly in half. Although this discards information, the difference is usually imperceptible to the human eye.
Transmitting full HD video at 25 frames per second in 4:2:0 requires about 600 megabits per second: still a significant amount of data, but roughly half of the raw RGB rate.
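The 4:2:0 savings can be verified with the same kind of back-of-the-envelope calculation:

```python
# Data rate for full HD video in YUV 4:2:0: luma at full resolution,
# Cb and Cr each subsampled 2x horizontally and 2x vertically.
WIDTH, HEIGHT, FPS = 1920, 1080, 25
BITS = 8  # bits per sample

luma_bits = WIDTH * HEIGHT * BITS
chroma_bits = 2 * (WIDTH // 2) * (HEIGHT // 2) * BITS  # Cb + Cr, quarter samples each
bits_per_second = (luma_bits + chroma_bits) * FPS

print(f"{bits_per_second / 1e6:.0f} Mbit/s")  # ~622 Mbit/s, about half of raw RGB
```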
The Role of Encoders in Compression
To significantly reduce the size of video data while preserving high quality, encoders are utilized. For example, one second of full HD video compressed with an H.264/AVC encoder requires only 6 to 8 megabits, roughly 150 to 200 times less than the uncompressed RGB stream.
The core compression principle of any encoder lies in predicting video frames and transmitting the differences between the predicted and original frames. The efficiency of this compression process highly depends on the effectiveness of the prediction algorithms.
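This predict-and-transmit-the-difference idea can be sketched in a few lines; the pixel values below are invented for illustration:

```python
import numpy as np

# Core idea: encode only the difference (residual) between a predicted
# block and the original block; the decoder adds the residual back.
original = np.array([[52, 55], [61, 59]], dtype=np.int16)
predicted = np.array([[50, 50], [60, 60]], dtype=np.int16)  # output of some predictor

residual = original - predicted        # this is what gets encoded
reconstructed = predicted + residual   # decoder side

print(residual)  # small values compress far better than raw pixels
```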
Frame Prediction Techniques
The H.264/AVC encoder operates by partitioning frames into macroblocks of up to 16x16 pixels. More complex areas can be divided further, down to blocks of 4x4 pixels.
There are two types of prediction employed:
Intra Prediction (Spatial): The encoder looks at already-coded neighboring blocks within the same frame to form a prediction. Various modes, such as DC prediction and directional prediction, are chosen depending on the content of the block.
Inter Prediction (Temporal): The encoder searches previous or subsequent frames for similar blocks and records the displacement as a motion vector. This prediction can be unidirectional or bidirectional, referencing one or two frames.
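As an illustration of the simplest intra mode, here is a sketch of DC prediction for a 4x4 block: the block is predicted as a flat patch equal to the average of the reconstructed pixels directly above and to the left. The neighbor values are invented; a real encoder would also evaluate the directional modes and keep the cheapest result.

```python
import numpy as np

def dc_predict(top: np.ndarray, left: np.ndarray) -> np.ndarray:
    """DC intra prediction: fill a 4x4 block with the neighbor average."""
    dc = int(round((int(top.sum()) + int(left.sum())) / (len(top) + len(left))))
    return np.full((4, 4), dc, dtype=np.uint8)

top = np.array([100, 102, 101, 99], dtype=np.uint8)    # pixels above the block
left = np.array([98, 100, 103, 97], dtype=np.uint8)    # pixels to the left

print(dc_predict(top, left))  # a flat 4x4 block of the neighbor average
```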
Residual Information and Encoding
After predicting the frames, the encoder compares the predicted frame with the original frame and stores the differences, known as residual information. This information is transformed into frequency data through a discrete cosine transform, breaking it down into coefficients.
To reduce the data further, quantization is employed: the coefficients are divided by a quantization step and rounded, which turns many of them into zeros or negligible values. At the final encoding stage, the quantized coefficients undergo entropy encoding (CAVLC or CABAC in H.264/AVC), which assigns shorter code words to more frequent values.
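A toy version of the transform-and-quantize step is shown below, building the DCT-II basis matrix directly from its definition. The residual values and the quantization step are illustrative only, not taken from the standard:

```python
import numpy as np

# Build the 4x4 DCT-II basis matrix from its definition.
N = 4
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
C = np.sqrt(2 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] = np.sqrt(1 / N)  # DC row scaling

# A smooth residual block: most of its energy ends up in the top-left corner.
residual = np.array([[5, 4, 4, 3],
                     [4, 4, 3, 3],
                     [4, 3, 3, 2],
                     [3, 3, 2, 2]], dtype=float)

coeffs = C @ residual @ C.T        # 2-D DCT
QP = 4                             # illustrative quantization step
quantized = np.round(coeffs / QP)  # most coefficients collapse to zero

print(quantized.astype(int))
```

After quantization only a handful of coefficients survive, which is exactly what makes the subsequent entropy coding so effective.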
Decoding Process
The decoding process reverses the operations performed by the encoder, reconstructing the uncompressed YUV frames. These frames are then converted back to RGB for display.
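The final YCbCr-to-RGB step can be sketched with the BT.601 full-range formulas; this particular matrix is an assumption for illustration, since real players use whichever conversion the stream signals:

```python
def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple:
    """Convert one 8-bit YCbCr sample to RGB (BT.601 full-range, assumed)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    clip = lambda v: max(0, min(255, round(v)))  # keep results in 8-bit range
    return (clip(r), clip(g), clip(b))

print(ycbcr_to_rgb(128, 128, 128))  # neutral chroma -> mid-gray (128, 128, 128)
```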
Conclusion
Modern encoders leverage more sophisticated techniques in frame partitioning, prediction, and redundancy elimination for better compression efficiency. The principles described in this article can be applied to other codecs as well and form a solid foundation for understanding video compression technology.
Keywords
- Video Compression
- H.264/AVC Codec
- RGB Model
- Full HD Resolution
- YUV Color Space
- Chroma Subsampling
- Frame Prediction
- Intra Prediction
- Inter Prediction
- Residual Information
- Quantization
- Entropy Encoding
FAQ
Q1: What is video compression?
A1: Video compression is a process that reduces the amount of data required to store or transmit video files while retaining acceptable quality.
Q2: How does the H.264 codec reduce data size?
A2: The H.264 codec compresses video by using techniques such as frame prediction, spatial and temporal compression, and quantization to minimize data while maintaining quality.
Q3: What is the difference between intra and inter prediction?
A3: Intra prediction refers to predicting a block using information from neighboring blocks within the same frame, while inter prediction uses information from other frames.
Q4: What is quantization in video encoding?
A4: Quantization is the process of reducing the precision of the coefficients derived from the discrete cosine transform, which helps in minimizing data size by discarding less significant values.
Q5: What are residuals in video compression?
A5: Residuals are the differences between the predicted frame and the original frame, which are stored and processed to reconstruct the final video output.