
    Exploring Image-to-Image with FLUX.1 Schnell: A Deep Dive into the Latest Update in ComfyUI!


    Introduction

    Welcome back! FLUX.1 Schnell has just received a significant update, introducing a new image-to-image functionality. To give you a little perspective, FLUX.1 had day-one support within ComfyUI following its release, and now, less than 48 hours later, we can already perform image-to-image transformations.

    I appreciate your patience as I share these updates; there’s a lot happening! If you missed it, I previously released a detailed video on FLUX.1, where I explained the model and walked through the installation process in ComfyUI. In today’s article, we'll focus on the image-to-image capabilities of this new model.

    Understanding the Workflow

    The basic workflow for the image-to-image application is outlined below:

    1. Image Input: An input image is loaded.
    2. Image Transformation: It undergoes an image-to-image workflow.
    3. Output Variation: A variation of the original image is produced.

    A key feature of this update is a denoise factor that controls the extent of change applied to the input image. For this demonstration, I used the Schnell model in fp8, along with the DualCLIPLoader running in the same fp8 format.
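
    To make that setup concrete, here is a minimal sketch of the loader nodes in ComfyUI's API (JSON) format. The node IDs and file names are placeholder assumptions; substitute the UNET, CLIP, and VAE files you actually downloaded.

    # Loader nodes for the fp8 Schnell setup, in ComfyUI API format (sketch).
    # File names are placeholders; point them at your own downloads.
    loaders = {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "flux1-schnell-fp8.safetensors",
                         "weight_dtype": "fp8_e4m3fn"}},
        "2": {"class_type": "DualCLIPLoader",
              "inputs": {"clip_name1": "clip_l.safetensors",
                         "clip_name2": "t5xxl_fp8_e4m3fn.safetensors",
                         "type": "flux"}},
        "3": {"class_type": "VAELoader",
              "inputs": {"vae_name": "ae.safetensors"}},
    }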

    Workflow Breakdown

    • The Load Image node loads the input image.
    • Its output feeds into a VAE Encode node, which produces a latent representation of the image.
    • That latent goes into a SamplerCustomAdvanced node, which drives the transformation.

    In ComfyUI’s BasicScheduler node you’ll find the denoise factor, which I set to 0.75. I noticed that with this setting the transformations were minimal, especially since I was attempting to convert a realistic image into an anime style.
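
    For readers who prefer to script this, the sketch below shows how those nodes chain together and how the workflow can be queued against a local ComfyUI instance over its HTTP API. It continues the loader sketch above; the node IDs, prompt text, seed, and input file name are placeholder assumptions.

    import json
    import urllib.request

    # Image-to-image chain in ComfyUI API format (sketch); reuses loader nodes "1"-"3".
    workflow = {
        "4": {"class_type": "LoadImage",
              "inputs": {"image": "input.png"}},                      # source image
        "5": {"class_type": "VAEEncode",                               # pixels -> latent
              "inputs": {"pixels": ["4", 0], "vae": ["3", 0]}},
        "6": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "anime style portrait", "clip": ["2", 0]}},
        "7": {"class_type": "BasicGuider",
              "inputs": {"model": ["1", 0], "conditioning": ["6", 0]}},
        "8": {"class_type": "BasicScheduler",                          # denoise < 1.0 keeps part of the input
              "inputs": {"model": ["1", 0], "scheduler": "simple",
                         "steps": 4, "denoise": 0.75}},
        "9": {"class_type": "KSamplerSelect",
              "inputs": {"sampler_name": "euler"}},
        "10": {"class_type": "RandomNoise",
               "inputs": {"noise_seed": 42}},
        "11": {"class_type": "SamplerCustomAdvanced",                  # drives the transformation
               "inputs": {"noise": ["10", 0], "guider": ["7", 0],
                          "sampler": ["9", 0], "sigmas": ["8", 0],
                          "latent_image": ["5", 0]}},
        "12": {"class_type": "VAEDecode",
               "inputs": {"samples": ["11", 0], "vae": ["3", 0]}},
        "13": {"class_type": "SaveImage",
               "inputs": {"images": ["12", 0], "filename_prefix": "flux_img2img"}},
    }
    workflow.update(loaders)  # merge in the loader nodes from the earlier sketch

    # Queue the prompt on a local ComfyUI server (default port 8188).
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

    Lowering the denoise value keeps the output closer to the original image, while raising it allows larger changes.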

    Notes on Variability

    The Schnell model is designed for only 1 to 4 steps, which limits how much transformation is possible within that range, so the changes to the image were minor. In contrast, a workflow created by another user, C Duru, used the Dev model, which can raise the step count to 20. That produced a much more defined transformation, turning a realistic input image into a convincing anime-style output.
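
    If you want to reproduce that higher-step behaviour with the sketch above, the only changes are the UNET weights and the scheduler's step count; the file name here is again a placeholder, and Dev workflows often add a FluxGuidance node as well.

    # Switch the sketch from Schnell to Dev weights and raise the step count (placeholder file name).
    workflow["1"]["inputs"]["unet_name"] = "flux1-dev-fp8.safetensors"
    workflow["8"]["inputs"]["steps"] = 20  # Dev is typically run with ~20 steps vs. Schnell's 1-4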

    Overall, this first take on image-to-image functionality is quite promising, especially given the pace of updates so soon after the release. I am eager to see what other features may come, such as ControlNet support or IP-Adapter integration.

    Performance Metrics

    I wanted to share some additional information regarding the performance of the models. The text-to-image workflow using the FLUX.1 Schnell model took approximately 10 to 11 minutes on average on my machine. The image-to-image workflow took more than double that time due to the extra processing required.

    Feedback on the previous video indicated that even users with 64 GB of RAM struggled to run the fp16 models smoothly. I invite you to share your experiences and any insights regarding performance with these updates.

    Until next time, happy experimenting!


    Keywords

    • FLUX.1 Schnell
    • Image-to-Image
    • ComfyUI
    • Denoising Factor
    • VAE Encoder
    • Schnell Model
    • Text-to-Image
    • Dev Model

    FAQ

    1. What is the new feature introduced in FLUX.1 Schnell? The new feature is image-to-image functionality, which allows users to transform input images into variations through a defined workflow.

    2. How does the denoising factor affect image transformations? The denoising factor controls the degree of change applied to the input image, with higher values resulting in more significant transformations.

    3. What model was used in the experiments? The Schnell model was used for this demonstration, with fp8 weights and the DualCLIPLoader in the same format.

    4. How does the number of steps impact output quality? Increasing the number of steps allows for more detailed transformations. For instance, a workflow with 20 steps showed notably better results compared to one with only four steps.

    5. What were the performance metrics for the image-to-image workflow? The image-to-image workflow took more than twice as long as the text-to-image workflow, highlighting the increased processing requirements for transformations.

    One more thing

    In addition to the incredible tools mentioned above, for those looking to elevate their video creation process even further, Topview.ai stands out as a revolutionary online AI video editor.

    TopView.ai provides two powerful tools to help you make ad videos in one click.

    Materials to Video: upload your raw footage or pictures, and TopView.ai will edit a video for you based on the media you uploaded.

    Link to Video: paste an e-commerce product link, and TopView.ai will generate a video for you.
