
So What? AI Image Generation Bake-off

Introduction

Welcome to our deep dive into the fascinating world of AI image generation! In this article, we look at recent advances in generative AI and how different image generation tools are shaping the landscape. Our focus today is a playful yet insightful bake-off between popular image generation models: DALL·E, Midjourney, Google Gemini, and Meta AI. Join us as we unpack how biases affect AI-generated images and how to sharpen your prompting skills for better results.

The Journey Begins

When generative AI entered mainstream conversation a couple of years ago, many people began experimenting with image generation. Early attempts often revealed common problems, such as people with the wrong number of fingers or other odd features. Writing effective image prompts requires a different approach than writing text prompts, and it pays to understand the quirks of each tool.

First, we examined an image generated by DALL·E 3 as a case study. The prompt asked for a "professional blonde woman with black rectangular glasses wearing a green hoodie and typing on a Samsung smartphone." On closer inspection, it became clear that the AI had made assumptions about beauty standards, producing an image laden with societal biases, such as unrealistic beauty ideals that mirror plastic-surgery trends.

Understanding Bias in AI

It's crucial to identify and understand the biases AI models inherit from their training data. For instance, when we generated images of people in various professions, clear gender and racial biases emerged. Our initial two-word prompts were plainly inadequate: they showed the need for detailed, precise prompts that capture not just the subject but also attributes like setting, lighting, and color scheme.

Drawing on more than a hundred articles and a range of YouTube discussions on generative AI, we compiled a template for effective prompts. To work well across systems, a prompt should include:

  1. Image Command: Specify what you want.
  2. Image Type: State whether it's a photo or an illustration.
  3. Main Subject: Identify the focal point of the image.
  4. Descriptive Modifiers: Include adjectives and specific traits.
  5. Setting: Where does the scene take place?
  6. Style Cues: Photo-realistic, cartoonish, etc.
  7. Lighting and Composition: Specify how the image should be lit and framed.
  8. Negative Prompts: Highlight what should NOT appear in the image.
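The template maps naturally onto a small data structure. The Python sketch below is one hypothetical way to assemble the eight components into a single prompt string; the `ImagePrompt` class, its field names, and the "Do not include:" phrasing for negatives are our own assumptions, not an API of any of the tools we tested. The setting, style, lighting, and negative values in the example are illustrative additions, not part of the original bake-off prompt.

```python
from dataclasses import dataclass, field


@dataclass
class ImagePrompt:
    """Hypothetical container mirroring the eight-part prompt template."""

    command: str        # 1. Image command, e.g. "Create"
    image_type: str     # 2. Image type: "photo", "illustration", ...
    subject: str        # 3. Main subject (focal point)
    modifiers: list[str] = field(default_factory=list)  # 4. Descriptive modifiers
    setting: str = ""   # 5. Where the scene takes place
    style: str = ""     # 6. Style cues
    lighting: str = ""  # 7. Lighting and composition
    negatives: list[str] = field(default_factory=list)  # 8. What must NOT appear

    def render(self) -> str:
        """Join the filled-in components into one prompt; empty parts are skipped."""
        # Pick "a" or "an" based on the first letter of the image type.
        article = "an" if self.image_type[:1].lower() in {"a", "e", "i", "o", "u"} else "a"
        # Subject first, then each modifier as a comma-separated clause.
        subject = ", ".join([self.subject, *self.modifiers])
        sentences = [f"{self.command} {article} {self.image_type} of {subject}."]
        if self.setting:
            sentences.append(f"Setting: {self.setting}.")
        if self.style:
            sentences.append(f"Style: {self.style}.")
        if self.lighting:
            sentences.append(f"Lighting and composition: {self.lighting}.")
        if self.negatives:
            # Negatives as plain text; some tools take them as a separate parameter.
            sentences.append("Do not include: " + ", ".join(self.negatives) + ".")
        return " ".join(sentences)


example = ImagePrompt(
    command="Create",
    image_type="photo",
    subject="a professional blonde woman",
    modifiers=[
        "wearing black rectangular glasses and a green hoodie",
        "typing on a Samsung smartphone",
    ],
    setting="a modern open-plan office",       # illustrative value
    style="photo-realistic",                   # illustrative value
    lighting="soft natural window light, medium shot",  # illustrative value
    negatives=["extra fingers", "distorted hands"],     # illustrative value
)
print(example.render())
```

Keeping each component in its own field makes it easy to see which parts of the template a prompt still lacks, and to change one part (say, the lighting) between iterations without retyping the rest. Tools that accept negatives as a separate parameter (Midjourney's `--no`, for example) would take the `negatives` list there instead of inside the prompt text.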

The Bake-Off

Now for the bake-off. We ran our original prompt through four systems: ChatGPT, Midjourney, Google Gemini, and Meta AI. Each produced its own interpretation:

  • ChatGPT: Produced a plain, generic image of a woman, a sign that the prompt needed more specific detail.

  • Midjourney: Produced four variations, all fairly generic and all reflecting conventional beauty standards associated with "blonde" women.

  • Google Gemini: Broadly followed the prompt but still reproduced the model's typical biases.

  • Meta AI: Aimed for a more lifelike image but, like the others, fell back on biased generalizations.

Throughout this experiment, one question kept coming up: how can we improve prompts to mitigate these biases? The answer is specificity: explicitly stating the characteristics of the desired image, such as age, ethnicity, body type, and more nuanced behavioral traits.
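One way to apply this is to treat each subject attribute as a slot and check which slots are still empty, since anything left unspecified gets filled in by the model's defaults, which is where training-data bias shows up. The sketch below is hypothetical: the `describe_subject` helper and the particular attribute list are our own illustrative choices, not something the tools themselves expose.

```python
# Illustrative attribute list (an assumption); extend it to suit your subject.
SUBJECT_ATTRIBUTES = ("age", "ethnicity", "body type", "hair", "expression")


def describe_subject(base: str, attributes: dict[str, str]) -> tuple[str, list[str]]:
    """Expand a base subject with explicit attributes.

    Returns the expanded description plus the attributes still left unspecified,
    i.e. the ones the model would otherwise fill in from its own defaults.
    Keys not listed in SUBJECT_ATTRIBUTES are ignored.
    """
    # Walk the attributes in a fixed order so the same inputs give the same text.
    phrases = [attributes[name] for name in SUBJECT_ATTRIBUTES if attributes.get(name)]
    missing = [name for name in SUBJECT_ATTRIBUTES if not attributes.get(name)]
    return ", ".join([base, *phrases]), missing


description, missing = describe_subject(
    "a professional woman",
    {
        "age": "in her late 40s",
        "ethnicity": "of South Asian descent",
        "body type": "with a heavier build",
    },
)
print(description)
print("Still unspecified:", missing)
```

The `missing` list doubles as a checklist for the next round: fill in another attribute (or deliberately leave it open), regenerate, and compare.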

Custom GPTs, which can generate optimized prompts based on your own data, can improve the results significantly. Iterating in this way (generate an image, review it, refine the prompt) brings the outputs progressively closer to the intended subject.

Conclusion

The key takeaway is that the quality of AI-generated images depends heavily on the detail provided in the prompt. By recognizing the models' inherent biases and crafting well-defined prompts, users can improve both the accuracy and the representativeness of AI-generated imagery. These emerging tools hold great promise for many applications, provided their underlying biases are adequately addressed.

Keyword

AI, Image Generation, Prompts, Bias, Midjourney, Google Gemini, DALL·E, Meta AI, Generative AI, Custom GPT, Professional Portrait.

FAQ

Q1: What is generative AI image generation?
A: Generative AI image generation uses models trained on large datasets to create images from text prompts, combining patterns learned from that data.

Q2: Why are biases common in AI-generated images?
A: Biases arise from the training data used by AI models, which reflect societal standards and stereotypes. This can lead to skewed representations in the outputs.

Q3: What can improve AI-generated image prompts?
A: More detailed and specific prompts, covering aspects like age, body type, setting, and other features, can help reduce bias and produce more representative images.

Q4: How can I get better results from AI image generation tools?
A: Use a prompt template, then refine your prompt iteratively based on what each generated image gets wrong; both improve your prompting and, ultimately, the resulting images.

Q5: What is a custom GPT model?
A: A custom GPT is a version of ChatGPT configured with your own instructions (and, optionally, your own reference files), so it can produce prompts or other outputs to your specifications, improving the specificity and targeting of the results.