AUTOMATIC LIP-SYNCING

Hey guys, it's me, Cary Kaleidoscope Huang. Today I'm super excited to share my latest project: an automated avatar lip-sync tool that takes my existing program a step further. The goal was a beautifully animated avatar that automatically lip-syncs to any given audio file, and I think I've achieved just that. Let's dive into how it all works.

An Overview of the Original Project

As a quick reminder, my last project automatically generated YouTube videos from two inputs: an audio file and an annotated transcript. From those, the program generated all the visuals. The test was simple: if I talked about dogs, the video showed pictures of dogs.

New Goals

This time, I wanted to extend the project in three main areas:

Aesthetic Improvements - Creating a new, animated avatar.
Functional Enhancements - Implementing automated lip-syncing and expression changes.
Legal Compliance - No more unlicensed Google Images.

The Animated Avatar

Creating a fully animated avatar from scratch can take weeks, if not months. However, I wanted to produce a ten-minute animation within a much shorter time frame. Here’s the step-by-step process I used:

Drawing the Avatar: I drew four emotions, each with five poses, and each pose at three levels of blinking, for 4 × 5 × 3 = 60 body drawings in total.
Lip-Sync: I reused 22 mouth drawings from my show "Battle for Dream Island", which together cover the mouth shapes needed for the different speech sounds (phonemes).
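The drawing count above follows directly from the combinations. As a minimal sketch (not the actual program), the Python below enumerates one file name per emotion, pose, and blink level, and shows one way three blink levels might be used over time: open, then a short half-closed, closed, half-closed sequence. The emotion names, the file-naming scheme, `body_asset`, `blink_level`, and the 90-frame blink period are all hypothetical.

```python
from itertools import product

# Hypothetical labels: the post only says "four emotions", "five poses",
# and "three levels of blinking".
EMOTIONS = ["happy", "sad", "angry", "neutral"]
POSES = range(5)
BLINKS = ["open", "half", "closed"]


def body_asset(emotion: str, pose: int, blink: str) -> str:
    """Build a file name for one body drawing (hypothetical naming scheme)."""
    return f"{emotion}_pose{pose}_{blink}.png"


# Every combination needs its own drawing: 4 emotions x 5 poses x 3 blink levels.
ALL_BODIES = [body_asset(e, p, b) for e, p, b in product(EMOTIONS, POSES, BLINKS)]


def blink_level(frame: int, period: int = 90) -> str:
    """Eyes stay open except for a half -> closed -> half blink at the end
    of every `period` frames (90 frames = 3 s at 30 fps; an assumption)."""
    phase = frame % period
    if phase == period - 2:
        return "closed"
    if phase in (period - 3, period - 1):
        return "half"
    return "open"


print(len(ALL_BODIES))  # 60
print([blink_level(f) for f in range(86, 91)])  # ['open', 'half', 'closed', 'half', 'open']
```

The 22 mouth drawings are layered on top of whichever body drawing is active, so they do not multiply the count.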

Functional Enhancements

The core enhancement here was to overhaul how the program synchronizes audio and visual elements. Here's the logic:

Word and Phoneme Timestamps: Previously, the program used Gentle, a forced aligner, to get a timestamp for each word. Now it also captures a timestamp for each phoneme.
Creating Timetables: I built five timetables: phoneme timestamps, pose changes, emotion changes, topics being discussed, and paragraph transitions.
Frame Drawing: For every frame, the system consults these timetables to draw the right mouth shape, pose, and expression. A ten-minute video comes to about 18,000 frames, which works out to 30 frames per second.
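To make the timetable-and-frame logic concrete, here is a minimal Python sketch, not the author's code. It assumes Gentle's JSON output, where each successfully aligned entry in `words` has `start`, `end`, and a `phones` list of `phone`/`duration` pairs whose labels carry position suffixes such as `_B` and `_E`. The phoneme-to-mouth table, the mouth names, the emotion entries, and the names `Timetable`, `phoneme_entries`, and `frame_plan` are hypothetical. The 30 fps rate is inferred from 18,000 frames over 600 seconds.

```python
import bisect

FPS = 30  # inferred: ~18,000 frames for a 600-second video

# Hypothetical phoneme -> mouth mapping; a real table would assign every
# phoneme to one of the 22 mouths. Unlisted phonemes fall back to REST.
MOUTHS = {
    "aa": "mouth_ah", "ae": "mouth_ah", "ah": "mouth_ah",
    "b": "mouth_mbp", "m": "mouth_mbp", "p": "mouth_mbp",
    "f": "mouth_fv", "v": "mouth_fv",
    "iy": "mouth_ee", "uw": "mouth_oo",
}
REST = "mouth_rest"


class Timetable:
    """Sorted (start_time, value) entries; each value holds until the next one."""

    def __init__(self, entries, default):
        self.entries = sorted(entries, key=lambda e: e[0])  # stable: ties keep order
        self.times = [t for t, _ in self.entries]
        self.default = default

    def at(self, t):
        i = bisect.bisect_right(self.times, t) - 1  # last entry starting at or before t
        return self.entries[i][1] if i >= 0 else self.default


def phoneme_entries(alignment):
    """Turn Gentle-style alignment JSON into (start_time, mouth) entries."""
    entries = []
    for word in alignment["words"]:
        if word.get("case") != "success":
            continue  # words Gentle could not align carry no timing
        t = word["start"]
        for phone in word["phones"]:
            base = phone["phone"].split("_")[0]  # "iy_E" -> "iy"
            entries.append((t, MOUTHS.get(base, REST)))
            t += phone["duration"]
        entries.append((word["end"], REST))  # close the mouth after the word
    return entries


def frame_plan(duration_s, tables):
    """Yield (frame_index, {table_name: value}) for every frame of the video."""
    for f in range(round(duration_s * FPS)):
        t = f / FPS
        yield f, {name: table.at(t) for name, table in tables.items()}


# Tiny hand-written alignment for the word "be" (b + iy).
sample = {"words": [{
    "case": "success", "word": "be", "start": 0.5, "end": 0.8,
    "phones": [{"phone": "b_B", "duration": 0.1}, {"phone": "iy_E", "duration": 0.2}],
}]}
tables = {
    "mouth": Timetable(phoneme_entries(sample), REST),
    "emotion": Timetable([(0.0, "happy"), (0.6, "sad")], "happy"),  # hypothetical annotation
}
for f, state in frame_plan(1.0, tables):
    if f % 5 == 0:
        print(f, state)
```

Because each timetable is just sorted start times, one small class can serve all five tables, and each per-frame lookup is a binary search rather than a scan.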

Legal Compliance

Many people pointed out that pulling pictures from Google Images could be legally problematic. As a temporary fix, the images are now quick, rudimentary drawings made by hand. Ultimately, I intend to find a solution that is both more automated and legal.

Results

After several days of coding, the project successfully automated the animation process. The tool generates a video from just an audio file and an annotated transcript.

Announcing Lazy KH

Finally, for those wondering about my new channel: it's called Lazy KH, and it's where I'll upload videos generated by this new automatic tool.

Learning and Next Steps

I explored tools like Anime Studio and Adobe Animate but decided to write my own code for more control. This allowed for better synchronization of the avatar’s emotions and poses.

Thank You!

Thank you for sticking around. If you're curious about AI or want to learn something new, check out Brilliant for some fantastic courses.

Keywords

Automated Avatar
Lip-Sync
Animated Video
Coding
Emotions
Timestamps
Legal Compliance
YouTube
AI Tools
Brilliant

FAQ

Q: What inspired you to create an automated lip-sync tool?
A: The desire to speed up the animation process and add more expressiveness to my videos inspired this project.

Q: How does the lip-sync function work?
A: The tool captures phoneme timestamps from the audio, then matches these to pre-drawn mouth shapes to create realistic lip-sync.

Q: What are the legal issues with using Google Images, and how did you address them?
A: Using unlicensed images can lead to copyright issues. I addressed this by creating a manual "human image requester" system as a temporary solution.

Q: Why didn't you use existing software for the lip-sync?
A: Writing my own code gives me full control over synchronization, allowing for better integration of the avatar’s emotions and poses.

Q: What's next for Lazy KH?
A: I'll continue to refine the tool and upload more automated videos to the Lazy KH channel.