Every Type of AI "Generated" Video Ever, Explained

Introduction

Artificial intelligence is transforming video creation, but not all AI-generated videos are created equal. The distinction between AI-processed and AI-generated video can be subtle, and mislabeling one as the other invites the same confusion as calling someone with a PhD in English just “doctor.” Because many people treat AI as a black box, misunderstandings about its capabilities are common. This article sorts the main types of AI-generated video by how opaque that black box is, from the most generative to the most controlled, so you can better judge what each technology actually does.

1. Text-Based AI-Generated Videos

The first category on this spectrum is text-based AI-generated video, the biggest black box of them all. You provide a text prompt and the AI generates a video from it, which is something of a leap of faith: you have minimal control, so the output is highly unpredictable. This type sits at the far left of our spectrum, where the degree of AI generation is highest.

A noteworthy example is OpenAI's text-to-video generator, Sora, which aspires to simulate the world by learning from video data. However, voice and sound quality and visual realism can suffer as the model tries to produce a coherent result. Supplying an initial image can help, but its influence often fades as the clip goes on, resulting in bizarre visuals.

2. Video-to-Video Generation

Slightly to the right on the spectrum is video-to-video generation. This technique takes a base video and uses AI to modify it frame by frame. It improves on text-to-video because the base footage ties the frames together, which mitigates odd stutters and inconsistencies. Domo AI is a notable example: it simplifies video-to-video style transfer, letting users transform footage into various artistic styles (such as anime).

Although tools like this simplify the pipeline, the underlying technique remains chaotic. It evolved from text-to-image models that painted over individual frames with no consistency between them. It offers more control than text-based methods, but it still relies heavily on AI generation.
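
The consistency problem can be illustrated with a toy sketch. It is not how Domo AI or any specific product works: the "model" here is a hypothetical stand-in that adds independent random noise to each frame, and the fix shown (blending each output with the previous one) is just one simple way to link frames, chosen for illustration. All function names are assumptions made for this example.

```python
import random

def stylize_frame(frame, rng):
    """Hypothetical stand-in for a per-frame image model: independent noise per pixel."""
    return [pixel + rng.uniform(-0.2, 0.2) for pixel in frame]

def flicker(frames):
    """Mean absolute pixel change between consecutive frames (lower is steadier)."""
    total, count = 0.0, 0
    for prev, cur in zip(frames, frames[1:]):
        for a, b in zip(prev, cur):
            total += abs(a - b)
            count += 1
    return total / count

def stylize_independently(frames, seed=0):
    """Paint over every frame on its own, with no link between frames."""
    rng = random.Random(seed)
    return [stylize_frame(frame, rng) for frame in frames]

def stylize_with_blending(frames, blend=0.7, seed=0):
    """Blend each stylized frame with the previous output (exponential smoothing)."""
    rng = random.Random(seed)
    out = []
    for frame in frames:
        styled = stylize_frame(frame, rng)
        if out:
            prev = out[-1]
            styled = [blend * p + (1 - blend) * s for p, s in zip(prev, styled)]
        out.append(styled)
    return out

# A static 16-pixel "video" of 30 identical frames, so any flicker comes from the model.
base_video = [[0.5] * 16 for _ in range(30)]
independent = stylize_independently(base_video)
blended = stylize_with_blending(base_video)

print("flicker, independent frames:", flicker(independent))
print("flicker, blended frames:   ", flicker(blended))
```

On this static clip the blended version changes less from frame to frame. On footage with real motion, naive blending like this would cause ghosting, which is part of why production tools use more elaborate ways of tying frames together.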

3. Face Swapping

Next on our journey is face swapping, a specialized AI technique that focuses almost entirely on faces. Its narrow focus makes it less generative than video-to-video methods, but it excels at mapping a chosen face onto a target body. Many AI filters work hand in hand with face-swapping algorithms, and variations such as body swaps have produced hilarious and memorable content online.
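
To make the "narrow focus" concrete, here is a minimal sketch of only the final compositing step, assuming a face image and a soft mask have already been produced by some model. Real face-swap systems also detect, align, and synthesize the face, none of which is shown. The function name, the tiny grayscale grids, and the mask values are all hypothetical.

```python
def blend_face(target, face, mask, top, left):
    """Paste `face` into a copy of `target` at (top, left).

    `mask` holds a weight in [0, 1] per face pixel: 1 keeps the new face,
    0 keeps the original frame, values in between mix the two.
    """
    result = [row[:] for row in target]  # leave the original frame untouched
    for r, (face_row, mask_row) in enumerate(zip(face, mask)):
        for c, (f, m) in enumerate(zip(face_row, mask_row)):
            t = result[top + r][left + c]
            result[top + r][left + c] = m * f + (1 - m) * t
    return result

# A 4x4 black frame, a 2x2 white "face", and a mask that fades toward one corner.
target = [[0.0] * 4 for _ in range(4)]
face = [[1.0, 1.0], [1.0, 1.0]]
mask = [[1.0, 0.5], [0.5, 0.0]]

swapped = blend_face(target, face, mask, top=1, left=1)
for row in swapped:
    print(row)
```

Only the pixels under the mask change. Everything else in the frame is passed through as-is, which is the key difference from video-to-video methods that repaint the whole frame.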

4. Specific Avatars and Manipulation

Moving slightly further right, we encounter AI avatars, also called digital avatars. This technology has often drawn public scrutiny because of its ethical implications. A model typically learns to replicate a single individual's likeness, so it can produce a high-quality representation without extensive training on many different faces.

Finally, we arrive at face manipulation, which is akin to puppeteering an image. These techniques animate a still image by extracting facial movements from a driving video and reproducing them on the target. Audio-driven animation has emerged more recently, adding depth to face and body manipulation techniques.
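
The puppeteering idea can be sketched with keypoints alone. This is a simplified illustration, assuming the facial keypoints have already been detected in both the still image and the driving video. It moves each source keypoint by however far the matching driving keypoint has moved since the first driving frame. The function name and the coordinates are hypothetical, and the step that actually warps or renders pixels is omitted.

```python
def transfer_motion(source_points, driving_frames):
    """Move each source keypoint by the driving keypoint's offset from its first frame."""
    reference = driving_frames[0]
    animated = []
    for frame in driving_frames:
        moved = []
        for (sx, sy), (rx, ry), (dx, dy) in zip(source_points, reference, frame):
            moved.append((sx + (dx - rx), sy + (dy - ry)))
        animated.append(moved)
    return animated

# Keypoints in the still image: left eye, right eye, mouth.
source_points = [(10.0, 20.0), (30.0, 20.0), (20.0, 40.0)]

# The same three keypoints tracked in a two-frame driving video.
driving_frames = [
    [(0.0, 0.0), (4.0, 0.0), (2.0, 5.0)],
    [(0.0, 0.0), (4.0, 0.0), (2.0, 7.0)],  # the mouth drops by 2 units
]

animated = transfer_motion(source_points, driving_frames)
for frame in animated:
    print(frame)
```

Using offsets relative to the first driving frame, rather than the driving positions themselves, lets the still image keep its own face shape while borrowing only the movement.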

5. Practical Applications and Hybrid Pipelines

The boundaries between these technologies blur as they are combined into more complex pipelines, which enables innovative applications. For instance, a real human's gestures can be combined with an AI avatar or with generated facial expressions. Ultimately, navigating this spectrum reveals the range, capabilities, and limitations of AI in video production.

Keywords

AI-generated videos
Text-based generation
Video-to-video generation
Face swap
Digital avatars
Face manipulation
Animation techniques
AI technologies

FAQ

What are the different types of AI-generated videos?

The primary types discussed are text-based AI-generated videos, video-to-video generation, face swapping, digital avatars, and face manipulation.

What is a text-based AI-generated video?

Text-based AI-generated videos are created from a text prompt alone. The output carries a significant degree of uncertainty, which makes them the least controllable type.

What advantages does video-to-video generation offer?

Video-to-video generation maintains temporal cohesion between frames, offering smoother results than text-to-video and than earlier approaches that applied text-to-image models to each frame independently.

How is face swapping different from video-to-video generation?

Face swapping targets faces specifically, replacing one face with another, whereas video-to-video generation processes the entire frame.

What is face manipulation?

Face manipulation animates images like puppets by synthesizing movements from driving videos or audio, effectively bringing static images to life.