“The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device.”
I’m a software developer without any deep knowledge of AI, and recently I tried to train my own control net. How hard can it be, right? It turned out to be surprisingly easy once you avoid certain pitfalls in the setup, and I’m super thrilled that a few hours of training on consumer hardware with 50k images turns into new functionality for Stable Diffusion. With this setup guide I want to encourage more people to try it out and maybe come up with their own ideas for a control net.
In this tutorial we are going to train a control net on white-gray-black images, with the idea of guiding Stable Diffusion towards light and dark areas to generate those squint-illusion images or stylized QR codes. To do this we set up a few things, download an image dataset, convert the images to white-gray-black, and let the GPU train on 50k images for over 6h. We use SD 1.5 with fp16 because it is faster, but the basic setup should apply to SD 2.x and SDXL as well. Please note that I just started out too and have no idea what I’m doing, but it works™… so please don’t ask me any detailed questions about control nets!
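To make the conversion step concrete, here is a minimal sketch of what the white-gray-black preprocessing could look like. The thresholds (85/170), the 512px resize, and the Pillow + NumPy usage are my own illustrative assumptions, not the exact code from this guide:

```python
# Sketch of the white-gray-black conditioning-image preprocessing.
# Thresholds and library choices are assumptions for illustration.
import numpy as np
from PIL import Image

def to_white_gray_black(path: str, size: int = 512) -> Image.Image:
    # Load, resize and convert to single-channel luminance.
    img = Image.open(path).convert("L").resize((size, size))
    arr = np.asarray(img)

    # Quantize to three levels: dark -> black, mid -> gray, bright -> white.
    out = np.full_like(arr, 128)   # gray by default
    out[arr < 85] = 0              # black
    out[arr > 170] = 255           # white

    # ControlNet conditioning images are typically fed in as 3-channel RGB.
    return Image.fromarray(out).convert("RGB")

if __name__ == "__main__":
    to_white_gray_black("example.jpg").save("example_wgb.png")
```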
Remap images to white-gray-black and train the control net on them, so that afterwards we can reverse the process and guide Stable Diffusion generation with grayscale images.
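Once the control net is trained, using it at generation time could look roughly like the sketch below. It assumes the diffusers library; the model path, conditioning image, and prompt are placeholders rather than anything prescribed by this guide:

```python
# Sketch: load a trained control net into the SD 1.5 pipeline (fp16) and
# guide generation with a white-gray-black conditioning image.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Placeholder path to the control net weights produced by training.
controlnet = ControlNetModel.from_pretrained(
    "path/to/your-trained-controlnet", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The conditioning image: a white-gray-black layout such as a QR code
# or the light/dark pattern the generation should follow.
condition = load_image("example_wgb.png")

image = pipe(
    "a cozy cabin in a snowy forest, detailed, photorealistic",
    image=condition,
    num_inference_steps=30,
).images[0]
image.save("guided_result.png")
```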
How many images do you need for a working control net? ⇒ At least 25k for a proof of concept, more than 50k for something usable, and 10x-100x that for something great (Canny edge was trained on 3M images).