Testing out Stable Diffusion 2.0 Using Google Colab

It's only been a few months since Stable Diffusion was released open source to the public, and not long after that, version 1.5 followed. Then, suddenly, SD 2.0 was announced a few days ago. I want to try it.

I don't know how I got into the AI Art stuff. I think I saw it on Twitter and an online friend got me into it. But I do find it intriguing and slightly addictive! Prompting an AI to generate endless imaginative visuals has been quite entertaining.

I have been using the DiffusionBee app running on an Apple M1 machine, and also a few other online alternatives like AI Background Generator, MAGE Space, PlaygroundAI, DreamStudio, Dream Texture for Blender, etc. Recently I have also been using the iOS "Draw Things" AI app, which is amazing: an AI generator tool that fits inside your pocket. It keeps the iPhone warm, however, so I use it sparingly.

The speed of AI Art generation needs to be under a minute to make it a dynamic and fun experience. Back then, it took me 10-15 minutes to generate an image. More recent apps were able to generate faster, around 4-6 minutes. Then suddenly you could run AI image generation in 30-60 seconds on the same M1 machine. That's incredible speed. I know it could be faster on an expensive GPU, but I am still quite happy with it.

Now let's try version 2.0 with Google Colab; with luck you'll get a GPU machine running online.

STABLE DIFFUSION 2.0 Google Colab:

I am currently running Stable Diffusion 2.0 online using Google Colab, with this notebook:
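The notebook handles the setup for you, but for reference, here is a minimal sketch of what running SD 2.0 through Hugging Face's diffusers library looks like. The model id is the official `stabilityai/stable-diffusion-2` checkpoint; the step count and scheduler choice are my assumptions, not necessarily what the notebook uses.

```python
MODEL_ID = "stabilityai/stable-diffusion-2"  # SD 2.0's 768-px base checkpoint

def generate(prompt, steps=25, size=768):
    """Load the SD 2.0 pipeline and generate one image.

    Needs a CUDA GPU, e.g. the one Colab assigns with luck. The heavy
    imports live inside the function so the sketch reads cheaply even
    without the libraries installed.
    """
    import torch
    from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

    # Half precision keeps the model within a free Colab GPU's memory.
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    # A faster scheduler; fewer steps for roughly comparable quality.
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    pipe = pipe.to("cuda")
    return pipe(prompt, num_inference_steps=steps, height=size, width=size).images[0]

if __name__ == "__main__":
    generate("cane toad riding a skateboard on a busy street").save("toad.png")
```

On a Colab T4 this kind of setup generates an image in well under a minute, which matches the "under 1 minute to stay fun" threshold above.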


I am testing txt2img, text-to-image prompting, where the AI generates image output from a text prompt. I heard the resulting output is different than before, since a lot of filtering has been applied. There are some cool features like depth map generation and being able to mask based on depth. I have not tried those yet.
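For when I do get around to the depth features: the depth-guided model ships as a separate checkpoint, `stabilityai/stable-diffusion-2-depth`, with its own pipeline in diffusers. A hedged sketch, untested on my end:

```python
def depth_to_img(prompt, init_image, strength=0.7):
    """Re-imagine init_image while preserving its depth structure.

    init_image is a PIL.Image; requires a CUDA GPU. The pipeline
    estimates a depth map from the input and conditions generation
    on it, so the scene layout survives the restyling.
    """
    import torch
    from diffusers import StableDiffusionDepth2ImgPipeline

    pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
    ).to("cuda")
    # strength controls how far the output may drift from the input image.
    return pipe(prompt=prompt, image=init_image, strength=strength).images[0]
```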

For now, txt2img first.

"fat pig anthropomorphic wearing cop uniform eating a donut outside of dunkin donut"


My wording must have been wrong...

"a portrait of chubby anthropomorphic pig, wearing cop uniform, eating a donut outside of supermarket"


So far, I have not been happy with some of the results.

Then I remembered: with SD 2.0 we should actually use an image size of 768 × 768 instead of 512 × 512:
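That is because the 2.0 base checkpoint was trained at 768 × 768, and the pipeline also wants dimensions divisible by 8. A tiny illustrative helper (my own, not from the notebook) to snap a requested size:

```python
def snap_size(requested: int, multiple: int = 8, default: int = 768) -> int:
    """Round a requested edge length down to a multiple of 8
    (what the UNet/VAE expect); fall back to 768, SD 2.0's native size."""
    if requested <= 0:
        return default
    return max(multiple, (requested // multiple) * multiple)

print(snap_size(770))  # → 768
print(snap_size(512))  # → 512
```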



The same prompt using Stable Diffusion 1.5:


Let's try something different:

"cane toad riding skateboard on a busy street"


"cane toad riding a skateboard on a busy street, oil painting by affandi"




I tested the same prompt using DiffusionBee and the basic 1.5 model.

It seems that with 2.0 we lost the ability to "replicate" famous artists' styles. I have to experiment further, but maybe there will be easier ways later for users to train custom models.


"anime art of toad soldier riding a horse on a busy street, sketch by otomo katsuhiro"





The wording needs to be adjusted... I do have something in mind.

"anime art of anthropomorphic toad knight holding a sword riding a horse on a busy street, sketch by otomo katsuhiro, RPG Art, 4K"

But not everything worked out as expected for now, I guess...




"a parallel world where only cats evolved, crowd of cats wearing various clothings, hyper details, rich colors, photograph"




Mind you, the same prompt on the almighty Midjourney gives you this (this is from artist DaMoxy):


But back to Stable Diffusion 2.0 with better prompting:
"a parallel world where only cats evolved, crowd of anthropomorphic cats wearing various clothings, hyper details, rich colors, photograph, fantasy, oil painting"





It's actually quite artistic and aesthetic!


Above is using the default Stable Diffusion 1.5.

CONCLUSION FOR NOW

I will have to spend a bit more time studying Stable Diffusion 2.0; maybe it's not "all bad". In fact, we take everything for granted. The fact that the AI "understands" your text prompt and can generate visuals endlessly, in all kinds of "styles" and compositions, is fascinating.
