This illusion is made in a really bizarre way

Introduction

A while back, I posed a challenge: is it possible to create an image that works on the special "twisty squares" surface in both of its orientations? Many people took up the challenge, especially on my Discord server. One community member, Hanlin, even built a web interface to help with the design process. However, despite numerous attempts, a meaningful solution remained elusive.

Then I received a remarkable email from Daniel Geng and, at around the same time, a direct message on Discord from Ryan Burgert. Both of them had used generative AI to find images that worked perfectly for the twisty squares challenge. They had used AI tools like Midjourney and Stable Diffusion, but with a twist. It turns out this wasn’t just a matter of crafting the right text prompt; Daniel and Ryan had gone deeper, opening up the models and changing how they work internally.

To begin, I wanted to understand how generative AI operates. Both Daniel and Ryan used diffusion models, the current state of the art for this type of generative task. In essence, training a diffusion model starts with a collection of images to which random noise is added. The model is asked to predict the noise that was added, and its parameters are adjusted according to how far off that prediction was.
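To make the training idea concrete, here is a minimal sketch in Python with NumPy. It is an illustration under assumptions, not the code any real model uses: the function names `add_noise` and `noise_prediction_loss`, the `alpha_bar` parameter (the fraction of the original image's signal that survives), and the random 8×8 "image" are all hypothetical stand-ins.

```python
import numpy as np

def add_noise(image, alpha_bar, rng):
    """Mix a clean image with Gaussian noise.

    alpha_bar is in (0, 1]: 1 keeps the image untouched, values near 0
    leave almost nothing but noise. (Assumed parameterization.)
    """
    noise = rng.standard_normal(image.shape)
    noisy = np.sqrt(alpha_bar) * image + np.sqrt(1.0 - alpha_bar) * noise
    return noisy, noise

def noise_prediction_loss(predicted_noise, true_noise):
    """Mean squared error between the guessed noise and the noise actually added."""
    return float(np.mean((predicted_noise - true_noise) ** 2))

rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for one training image
noisy, noise = add_noise(image, alpha_bar=0.5, rng=rng)

# A perfect guess scores zero; guessing "no noise at all" scores worse.
# Training nudges the model's parameters to push this number down.
perfect_loss = noise_prediction_loss(noise, noise)
naive_loss = noise_prediction_loss(np.zeros_like(noise), noise)
```

In a real system the guess would come from a neural network that sees only `noisy`, and the loss would drive a gradient update of its parameters.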

When generating an image, the model starts from pure random noise and repeatedly predicts and removes noise until a recognizable image emerges. On its own, however, this gives no control over the outcome. To steer the result with a prompt, the system needs some understanding of language, which typically comes from a separate model trained for that purpose.

This is where CLIP comes in: a model trained to compare images with their corresponding text descriptions. During training, it learns to map these two types of information into a shared vector space. The noise prediction model can then be conditioned on the features derived from the text prompt, so that the generated image is coherent and matches the prompt.
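The idea of a shared vector space can be sketched with a toy example. The three-dimensional vectors below are made up by hand purely for illustration; a real CLIP model produces much longer vectors from learned image and text encoders, and the helper name `cosine_similarity_matrix` is an assumption of this sketch.

```python
import numpy as np

def cosine_similarity_matrix(image_vecs, text_vecs):
    """Cosine similarity between every image vector and every text vector."""
    image_unit = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    text_unit = text_vecs / np.linalg.norm(text_vecs, axis=1, keepdims=True)
    return image_unit @ text_unit.T

captions = ["a penguin", "a giraffe"]

# Hand-written stand-ins for encoder outputs (hypothetical values).
text_vecs = np.array([[1.0, 0.1, 0.0],    # "a penguin"
                      [0.0, 0.2, 1.0]])   # "a giraffe"
image_vecs = np.array([[0.9, 0.0, 0.1],   # a photo of a penguin
                       [0.1, 0.3, 0.8]])  # a photo of a giraffe

similarity = cosine_similarity_matrix(image_vecs, text_vecs)

# For each image, pick the caption whose vector points in the most similar direction.
best_caption = [captions[i] for i in similarity.argmax(axis=1)]
```

Because matching images and captions land close together in the same space, features from a text prompt can be used to guide what the image model draws.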

The beautiful thing about image generation with diffusion models is that it happens gradually, step by step. Given a text prompt, the process starts with pure noise and, through iterative refinement, arrives at a clearer and clearer image. This gradual approach opens up fascinating opportunities for creative manipulation, such as merging two evolving images into one illusion. For instance, Ryan and Daniel paused their image generation to blend features from different images at various stages, ultimately producing unique, ambiguous results, like an image that looks like a penguin in one orientation and a giraffe in another.
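Here is a rough sketch of that blending idea, under heavy assumptions. It is not Ryan's or Daniel's actual code. The "model" is a toy `oracle_noise` function that already knows the picture it wants, the second view is a simple 90-degree rotation standing in for the twisty squares transformation, and the update rule is a simplified deterministic denoising step. All names and the noise schedule are hypothetical.

```python
import numpy as np

def oracle_noise(noisy, target, alpha_bar):
    """Toy stand-in for a trained network: the noise that would turn `target` into `noisy`."""
    return (noisy - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)

def generate_two_view_image(target_a, target_b, steps=20, seed=0):
    """Denoise one square image so that it serves two views at once."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target_a.shape)        # start from (almost) pure noise
    alpha_bars = np.linspace(0.02, 1.0, steps + 1)  # signal fraction rises to 1 (clean)
    for alpha_bar, alpha_bar_next in zip(alpha_bars[:-1], alpha_bars[1:]):
        # View 1: the image as it is, judged against "prompt" A.
        eps_a = oracle_noise(x, target_a, alpha_bar)
        # View 2: rotate the image, judge it against "prompt" B,
        # then rotate the noise estimate back into the original frame.
        eps_b = np.rot90(oracle_noise(np.rot90(x), target_b, alpha_bar), k=-1)
        # Blend the two opinions about what the noise is.
        eps = 0.5 * (eps_a + eps_b)
        # Estimate the clean image, then step to the next, less noisy level.
        x0_estimate = (x - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)
        x = np.sqrt(alpha_bar_next) * x0_estimate + np.sqrt(1.0 - alpha_bar_next) * eps
    return x

rng = np.random.default_rng(1)
target_a = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for "penguin"
target_b = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for "giraffe"
illusion = generate_two_view_image(target_a, target_b)
```

With these toy oracles the result is just a pixel-wise compromise between the two targets. The interesting part with a real diffusion model is that the averaged noise predictions can steer towards an image that looks plausible under both prompts, rather than a muddy overlay.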

Additionally, several artists and researchers, including Matt Parker, are exploring similar techniques in areas such as jigsaw puzzles and even 3D model generation. These projects take the idea beyond a single flat image and challenge our understanding of perception in art and technology.

Both Ryan and Daniel achieved extraordinary results with these illusions, showing that generative AI can solve problems in ways that are sometimes human-like and sometimes entirely its own. The coexistence of these two aspects raises intriguing questions about the future of AI and creativity.

Keywords

Generative AI, diffusion models, twisty squares, CLIP, text prompts, image generation, optical illusions, iterative refinement, cross-attention.

FAQ

1. What is the twisty squares challenge?
The twisty squares challenge involves creating an image that appears meaningful when viewed in both orientations of a specially designed surface known as twisty squares.

2. How do generative AI models work?
Generative AI models, particularly diffusion models, start with random noise and remove it iteratively, predicting the noise at each step while being guided by input such as a text prompt.

3. What role does the CLIP model play in image generation?
The CLIP model creates a shared vector space for image features and text meanings, enabling the integration of textual understanding into the image generation process.

4. Can you explain the process of blending images in generative AI?
By taking intermediate outputs of different prompts, such as a penguin and a giraffe, and averaging their pixel information, artists can create unique and unexpected visual illusions.

5. How does this technology relate to 3D model generation?
Similar techniques used for generating 2D images have been adapted to create 3D models, demonstrating how AI can produce spatial representations based on various visual perspectives.

