AI Generated Videos Are Actually Getting Out of Hand


With AI-generated videos becoming incredibly lifelike, the problem is no longer just telling real footage from fake; we are also open to being Rickrolled in the most inventive ways imaginable. Think you've got an awe-inspiring view of the Eiffel Tower? Bam, you're Rickrolled. Catching a glimpse of some serene clouds? Guess what's below: still Rickrolled. Even familiar memes like the distracted boyfriend could covertly Rickroll you.

The Evolution of AI Video Generators

In February 2024, OpenAI announced Sora, a groundbreaking text-to-video generator that outshone previous models by generating video purely from text descriptions. Earlier generators often worked fundamentally differently, for example by requiring a reference or base video to transform.

Sora's approach was revolutionary and set a new standard: building every frame from scratch is like a child learning to draw freehand rather than tracing over an existing picture. While pure text-to-video generation (like Sora) is more impressive, other techniques trade some of that flexibility for consistency by mapping out a 3D environment first and then using AI to render and style it.
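To make the text-to-video side of that distinction concrete, here is a minimal sketch using the open-source Hugging Face diffusers library and a small, publicly available ModelScope checkpoint; the model name, prompt, and GPU requirement are illustrative assumptions, since Sora itself is not publicly callable. The prompt alone drives the clip, with no reference footage involved.

```python
# A minimal text-to-video sketch (assumptions: the Hugging Face "diffusers"
# package is installed and a CUDA GPU is available; the ModelScope checkpoint
# below is a small public model, not Sora).
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load a publicly available text-to-video diffusion model.
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# The prompt alone drives generation; there is no reference or base video.
prompt = "A hot-air balloon drifting over the Eiffel Tower at sunrise"
frames = pipe(prompt, num_inference_steps=25).frames[0]

# Write the generated frames out as an .mp4 clip.
print(export_to_video(frames, output_video_path="balloon.mp4"))
```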

Enter Storyteller: The New Tool for Creatives

Storyteller, a sponsor of today's discussion, offers a toolkit that integrates these different methods for creating consistent and impressive videos. From developing 3D sandboxes to using built-in generators, props, and characters, Storyteller lets you fine-tune and control every element of your video. Creatives can direct actors, apply motion capture, and choose from various artistic styles.

Anyone interested in the evolving landscape of AI video generation should join the Storyteller community on Discord. The app is still in beta but offers an early sneak peek for those who sign up now.

The AI Video Generation Race

The release of Sora created a ripple effect, spurring numerous AI companies to announce their own text-to-video generators. However, there is some debate over how representative OpenAI's shared demos are of Sora's actual capabilities, given that some of the newer demos were edited before release.

Since then, multiple noteworthy AI models have emerged:

  1. Kling: Developed by Kuaishou, a Chinese short-video platform comparable to TikTok, Kling offers impressive video capabilities, especially in body movements and close-ups. The model leans toward Chinese cultural elements, such as consistently rendering people using chopsticks, and it supports videos up to two minutes long at 30 FPS.

  2. Luma Dream Machine: Released shortly after Kling, Luma leans toward cinematic themes and offers features like keyframe selection and video extension. Free-tier generations are limited, with premium plans starting at $30 per month.

  3. Runway AI's Gen-3 Alpha: Known for producing highly realistic facial close-ups and a variety of POV shots, Runway does not yet offer starting-frame selection. For $15 per month you get access to its current features, albeit with some limitations.

  4. OpenSora: An open-source attempt to replicate OpenAI's Sora, with detailed documentation of its model training and capabilities. For those interested in learning how text-to-video models work, OpenSora is a fantastic resource.

The Future and Challenges

Although these models are improving rapidly, they still face significant challenges:

  • Anatomy and Complex Interactions: While AI-generated videos are getting better at pouring liquids, they struggle with complex tasks like handwriting or detailed body movements.
  • Physics and Realism: To get closer to realistic world simulators, these AI models need to develop reasoning capabilities to better understand and replicate physical interactions.

The Creative Touch

From innovative music videos to meme mashups, AI-generated videos open up endless creative possibilities, despite their current imperfections. To keep up with the latest advancements, check out research breakdowns and newsletters focusing on AI.


FAQ

Q: What is a text-to-video generator? A: A text-to-video generator creates video content directly from text descriptions without needing a reference video.

Q: What makes OpenAI’s Sora special? A: Sora generates videos purely from text input, setting a new standard in AI video generation for its comprehensiveness and ability to start from scratch.

Q: How does Storyteller enhance video creation? A: Storyteller provides a toolkit that enables precise control over video outputs, combining various 3D assets, animation functionalities, and built-in generators.

Q: Are AI-generated videos capable of realistic anatomy and complex interactions? A: Not yet. While improvements are being made, AI video generators still struggle with anatomy and detailed interactions like handwriting.

Q: What AI video generators are currently leading the market? A: Notable generators include OpenAI's Sora, Kuaishou's Kling, Luma Dream Machine, Runway AI's Gen-3 Alpha, and OpenSora. Each has unique strengths and limitations.

Q: Can these AI generators be used for commercial purposes? A: Depending on the service, commercial licenses are available. For instance, Luma AI offers such an option for its premium users.

Q: What kind of creative applications are people exploring with these tools? A: Creatives are using AI video generators for everything from music videos to meme mashups, leveraging the unique and often surreal quality of AI-generated content.