How Video Compression Works

Have you ever thought about how video streaming is possible? Let's think about how big a typical 1080p video is: 1920 by 1080 pixels, 24 bits each, 30 frames per second. That's almost one-and-a-half gigabits per second. How can you transmit that much data over the air in real time?
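
The arithmetic above can be checked with a few lines of Python. This is only a sketch; the function name `raw_bitrate_bps` is made up for illustration.

```python
def raw_bitrate_bps(width, height, bits_per_pixel, fps):
    """Bits per second needed to send every pixel of every frame uncompressed."""
    return width * height * bits_per_pixel * fps


bps = raw_bitrate_bps(1920, 1080, 24, 30)
print(f"{bps:,} bits/s = {bps / 1e9:.2f} Gbit/s")
# prints "1,492,992,000 bits/s = 1.49 Gbit/s"
```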

The answer is video compression. You might have heard of codecs; a codec is a piece of software or hardware that encodes and decodes data. The encoder compresses the data, making it easier to store and transmit, and the decoder reverses this process, recreating the original data as closely as possible. Codecs are not limited to video; they can encode and decode many types of signals. But for now, let's focus on how video codecs work.

In a previous video, we talked about how still images are compressed. In short, images are compressed by throwing out the information that is less visible to the human eye and storing redundant data more efficiently. We can easily extend image compression to video compression by compressing a video frame by frame. This approach is called spatial or intra-frame coding. Even doing that alone would significantly reduce the file size, but we can actually do much more than that.

In a typical video, many consecutive frames tend to be nearly identical. We can make use of this temporal (inter-frame) redundancy to further compress a video. First, let's think of an extreme case where nothing moves in a video. Instead of storing each one of these identical frames, we can simply tell our encoder to keep the first frame and record that it repeats n times. That will save us a lot of space.
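
As a rough illustration of the "keep one frame and repeat it n times" idea, here is a minimal sketch. The frames are tiny lists of pixel values, and the names `encode_repeats` and `decode_repeats` are hypothetical; real codecs do not work this literally.

```python
def encode_repeats(frames):
    """Collapse runs of identical consecutive frames into [frame, count] pairs."""
    runs = []
    for frame in frames:
        if runs and runs[-1][0] == frame:
            runs[-1][1] += 1          # same as the previous frame: bump the count
        else:
            runs.append([frame, 1])   # a new frame starts a new run
    return runs


def decode_repeats(runs):
    """Expand [frame, count] pairs back into the full list of frames."""
    frames = []
    for frame, count in runs:
        frames.extend([frame] * count)
    return frames


still = [[10, 10], [10, 10]]
moved = [[10, 99], [10, 10]]
video = [still, still, still, moved]

runs = encode_repeats(video)
print(len(video), "frames stored as", len(runs), "runs")
```

Decoding the runs gives back exactly the frames we started with, so nothing is lost in this special case.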

Now let's think of a more realistic case where only some parts of the video don't change. This time, we can do the same thing but more locally by dividing the frames into blocks and repeating only the blocks that don't change. What if all blocks change between consecutive frames, but some change a lot and some change only a little? Instead of checking whether a block has changed or not, we can search for the best match to a given block in the next frame, within a small neighborhood of its original position. This process is called block motion estimation.
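
Here is a minimal sketch of block motion estimation by exhaustive search. A few things in it are assumptions rather than anything fixed by the description above: the match is scored with the sum of absolute differences (SAD), the helper names are made up, and the search runs in the direction most practical encoders use, taking a block of the new frame and looking for it in the reference frame. Either direction yields one displacement per block.

```python
def sad(a, ay, ax, b, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(a[ay + i][ax + j] - b[by + i][bx + j])
        for i in range(size)
        for j in range(size)
    )


def estimate_motion(ref, cur, y, x, size, radius):
    """For the block at (y, x) in `cur`, find the offset (dy, dx) within
    +/-radius whose block in `ref` matches it best."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, y, x, ref, y, x, size)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            # Skip candidates that would fall outside the reference frame.
            if not (0 <= ry <= height - size and 0 <= rx <= width - size):
                continue
            cost = sad(cur, y, x, ref, ry, rx, size)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost


def make_frame(top, left, size=6):
    """A black frame with one bright 2x2 square at (top, left)."""
    frame = [[0] * size for _ in range(size)]
    for i in range(2):
        for j in range(2):
            frame[top + i][left + j] = 200
    return frame


ref = make_frame(1, 1)   # square starts here
cur = make_frame(2, 3)   # and has moved down 1, right 2

vector, cost = estimate_motion(ref, cur, y=2, x=3, size=2, radius=2)
print(vector, cost)      # prints "(-1, -2) 0"
```

The vector (-1, -2) says that the block now at row 2, column 3 came from one row up and two columns to the left in the reference frame.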

How does this help with compression? Well, instead of saving every frame, we can save a reference frame and the motion vectors for the blocks. The motion vectors tell us how we should move the blocks to closely match the next frames. This is called motion compensation.
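
To make motion compensation concrete, here is a small sketch that builds a predicted frame from a reference frame and a table of motion vectors. It assumes that vectors are stored per block of the frame being predicted and point back into the reference frame, that frame dimensions are multiples of the block size, and that every vector stays inside the frame. The function name and the hand-written vectors are purely illustrative.

```python
def motion_compensate(ref, vectors, block):
    """Build a predicted frame: the block at (y, x) is copied from the
    reference frame at (y + dy, x + dx)."""
    height, width = len(ref), len(ref[0])
    pred = [[0] * width for _ in range(height)]
    for (y, x), (dy, dx) in vectors.items():
        for i in range(block):
            for j in range(block):
                pred[y + i][x + j] = ref[y + dy + i][x + dx + j]
    return pred


# Reference frame: a bright 2x2 patch in the top-left corner.
ref = [
    [9, 9, 0, 0],
    [9, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# One vector per 2x2 block of the new frame (chosen by hand here).
vectors = {
    (0, 0): (2, 0),    # top-left is now background: copy from below
    (0, 2): (0, -2),   # top-right now shows the patch: copy from the left
    (2, 0): (0, 0),    # unchanged
    (2, 2): (0, 0),    # unchanged
}

pred = motion_compensate(ref, vectors, block=2)
for row in pred:
    print(row)
```

In the predicted frame the bright patch has moved to the top-right corner, even though we only stored four small vectors instead of sixteen pixels.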

Although motion compensation can greatly reduce the difference between two consecutive frames, it is usually not enough by itself to fully recreate the next frame. So in addition to the motion vectors, we should also save the differences between the actual frames and the motion-compensated frames. These differences are known as residual frames. When it's time to play this video, the decoder reconstructs the current frame by taking the previous reference frame, compensating for the motion using the motion vectors, and adding the residual frame.
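
The encoder and decoder steps just described can be sketched as follows. The same assumptions apply as in any toy example: frames are small lists of integers, vectors point from a block of the new frame back into the reference frame, residual values are stored as plain signed integers, and all function names are hypothetical.

```python
def predict(ref, vectors, block):
    """Motion-compensated prediction: copy each block from ref at (y+dy, x+dx)."""
    pred = [[0] * len(ref[0]) for _ in ref]
    for (y, x), (dy, dx) in vectors.items():
        for i in range(block):
            for j in range(block):
                pred[y + i][x + j] = ref[y + dy + i][x + dx + j]
    return pred


def make_residual(actual, pred):
    """Encoder side: what the prediction got wrong, pixel by pixel."""
    return [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(actual, pred)]


def decode_frame(ref, vectors, residual, block):
    """Decoder side: predict from the reference, then add the residual."""
    pred = predict(ref, vectors, block)
    return [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(pred, residual)]


ref = [[9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
actual = [[0, 0, 9, 8], [0, 0, 9, 9], [0, 0, 0, 0], [0, 0, 1, 0]]
vectors = {(0, 0): (2, 0), (0, 2): (0, -2), (2, 0): (0, 0), (2, 2): (0, 0)}

# Encoder: store the vectors and the residual instead of the full frame.
res = make_residual(actual, predict(ref, vectors, block=2))
for row in res:
    print(row)

# Decoder: rebuild the frame from the reference, the vectors, and the residual.
print(decode_frame(ref, vectors, res, block=2) == actual)   # prints "True"
```

Only two residual values are non-zero in this example, because the motion vectors already explain most of the change.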

You might ask, couldn't we just save the original frames instead of saving the residual frames? We could, but residual frames carry much less information than full frames, so they are highly compressible.
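
A quick experiment shows why residuals compress so well. This sketch uses Python's `zlib` as a stand-in for a codec's entropy coder, which is an assumption for illustration only. The frame is seeded random noise, which is close to the hardest case for a general-purpose compressor, so the gap is exaggerated. The prediction is simply the reference frame itself (no motion), and differences are stored modulo 256 so they fit in one byte.

```python
import random
import zlib

random.seed(0)
WIDTH, HEIGHT = 64, 64

# A noisy reference frame, one byte per pixel.
reference = bytes(random.randrange(256) for _ in range(WIDTH * HEIGHT))

# The next frame: identical except for about 50 slightly brighter pixels.
next_frame = bytearray(reference)
for _ in range(50):
    i = random.randrange(len(next_frame))
    next_frame[i] = (next_frame[i] + 3) % 256

# Residual: per-pixel difference, wrapped into the 0..255 range.
residual = bytes((n - r) % 256 for n, r in zip(next_frame, reference))

full_size = len(zlib.compress(bytes(next_frame)))
residual_size = len(zlib.compress(residual))
print("full frame:", full_size, "bytes, residual:", residual_size, "bytes")

# The decoder can still rebuild the frame exactly from reference + residual.
rebuilt = bytes((r + d) % 256 for r, d in zip(reference, residual))
print(rebuilt == bytes(next_frame))   # prints "True"
```

The residual is almost all zeros, so it shrinks to a small fraction of the size of the full frame.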

Let's review the entire process. Traditional video compression algorithms represent a video as a sequence of reference frames, each followed by the motion vectors and residual frames needed to rebuild the frames that come after it. There are two types of compression done here: inter-frame and intra-frame. In this article, I focus mostly on the inter-frame coding part, which achieves high compression efficiency by exploiting the similarities between consecutive frames. Intra-frame coding, on the other hand, compresses each frame on its own by throwing out information that is less visible to the human eye and storing the rest more efficiently.

The methods I covered here are the very basics that are used by many codecs, including the mainstream H.264 codec, which is also known as MPEG-4 AVC. Modern video codecs, including H.264, H.265, and VP9, use sophisticated methods to balance the level of compression and perceptual image quality without introducing too much computational complexity.

Although the video compression algorithms we use today are pretty mature, video compression is still an active area of research. Researchers have already been experimenting with machine learning models that have the potential to perform better than today’s block-based hybrid encoding standards. It's not easy to beat today’s encoding standards, since they have had decades to mature and be tuned in many possible ways. I still think that an end-to-end trainable codec will eventually offer advantages over traditional compression methods by optimizing perceptual image quality while minimizing the file size.

That's all for today. I hope you liked it. If you have any comments or questions, let me know in the comment section below. Subscribe for more articles, and as always, thanks for reading. Stay tuned and see you next time.

Keywords

Video Compression
Codecs
Encoding
Decoding
Spatial Coding
Temporal Redundancy
Intra-Frame Coding
Inter-Frame Coding
Motion Compensation
H.264
H.265
VP9
Machine Learning
Residual Frames
Perceptual Image Quality

FAQ

Q: What is video compression?
A: Video compression is the process of encoding video data to reduce its size, making it easier to store and transmit. The data is later decoded to recreate the original video as closely as possible.

Q: What is a codec?
A: A codec is a piece of software or hardware that encodes and decodes data. It compresses the data during encoding to reduce the file size and decompresses it during decoding to reconstruct the original data as closely as possible.

Q: What is intra-frame coding?
A: Intra-frame coding is the process of compressing a video frame by frame, similar to compressing still images. It throws out visually redundant information within the frame and stores the rest more efficiently.

Q: What is inter-frame coding?
A: Inter-frame coding exploits the similarities between consecutive frames to achieve high compression efficiency. It uses methods like block motion estimation and motion compensation to reduce data redundancy across frames.

Q: What are motion vectors?
A: Motion vectors describe how blocks of a reference frame should be moved to closely match the next frame. The decoder uses them for motion compensation.

Q: What are residual frames?
A: Residual frames are the differences between the actual frame and the motion-compensated frame. They carry much less information than full frames and are highly compressible.

Q: What are some popular video codecs?
A: Some popular video codecs include H.264 (MPEG-4 AVC), H.265, and VP9. These codecs use sophisticated methods to balance compression efficiency and perceptual image quality.

Q: What is the future of video compression?
A: Video compression is an active area of research, and researchers are experimenting with machine learning models that have the potential to perform better than traditional block-based hybrid encoding standards, potentially leading to better optimization of perceptual image quality and file size.