We are delighted to announce a new method that can essentially make Stable Diffusion and SDXL faster, as if they had been distilled using the LCM process! How does it sound to run any SDXL model in about 1 second instead of 7 on a 3090, or 10x faster on Mac? Read on for details!
So, what’s the trick? For latent consistency distillation, each model needs to be distilled separately. The core idea of LCM LoRA is to train just a small number of adapters, known as LoRA layers, instead of the full model. The resulting LoRAs can then be applied to any fine-tuned version of the model without having to distill them separately.
Select an available teacher model from the Hub. For example, you can use SDXL (base), or any fine-tuned or dreamboothed version you like.
Train an LCM LoRA on the model. LoRA is a type of parameter-efficient fine-tuning, or PEFT, that is much cheaper to accomplish than full model fine-tuning.
Use the LoRA with any SDXL diffusion model and the LCM scheduler; bingo! You get high-quality inference in just a few steps.
Fast inference of Stable Diffusion and SDXL enables new use-cases and workflows. To name a few:
• Accessibility: generative tools can be used effectively by more people, even if they don’t have access to the latest hardware.
• Faster iteration: get more images and multiple variants in a fraction of the time! This is great for artists and researchers, whether for personal or commercial use.
• Production workloads may be possible on different accelerators, including CPUs.
• Cheaper image generation services.
To gauge the speed difference we are talking about, generating a single 1024x1024 image on an M1 Mac with SDXL (base) takes about a minute. Using the LCM LoRA, we get great results in just ~6s (4 steps). This is an order of magnitude faster, and not having to wait for results is a game-changer. Using a 4090, we get almost instant response (less than 1s). This unlocks the use of SDXL in applications where real-time events are a requirement.
The version of diffusers released today makes it very easy to use LCM LoRAs:
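A minimal sketch of what that looks like, assuming the public SDXL base checkpoint and the LCM LoRA published on the Hub (the prompt, device, and output filename are illustrative):

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Any SDXL checkpoint can serve as the base; the official base model is assumed here.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
).to("cuda")

# Swap in the LCM scheduler and load the LCM LoRA adapter on top of the model.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

prompt = "close-up photography of an old man standing in the rain at night, lit by street lamps"
image = pipe(
    prompt=prompt,
    num_inference_steps=4,   # 4 to 8 steps is usually enough with the LCM LoRA
    guidance_scale=1,        # effectively disables classifier-free guidance (see below)
).images[0]
image.save("lcm_lora_sdxl.png")
```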
As expected, using just 1 step produces an approximate shape without discernible features and lacking texture. However, results quickly improve, and they are usually very satisfactory in just 4 to 6 steps. Personally, I find the 8-step image in the previous test to be a bit too saturated and “cartoony” for my taste, so I’d probably choose between the ones with 5 and 6 steps in this example. Generation is so fast that you can create a bunch of different variants using just 4 steps, and then select the ones you like and iterate using a couple more steps and refined prompts as necessary.
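One way to work like that (a sketch, continuing with the pipe and prompt from the snippet above; the seeds and step counts are arbitrary) is to fix a seed per draft so a favorite can be re-run with more steps:

```python
import torch

# Quick pass: 4-step drafts, one seed per variant
seeds = [0, 1, 2, 3]
drafts = [
    pipe(
        prompt=prompt,
        num_inference_steps=4,
        guidance_scale=1,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    for seed in seeds
]

# Refine the draft you liked (say, seed 2) with a couple more steps
refined = pipe(
    prompt=prompt,
    num_inference_steps=6,
    guidance_scale=1,
    generator=torch.Generator("cuda").manual_seed(2),
).images[0]
```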
Note that in the previous examples we used a guidance_scale of 1, which effectively disables it. This works well for most prompts, and it’s fastest, but ignores negative prompts. You can also explore using negative prompts by providing a guidance scale between 1 and 2 – we found that larger values don’t work.
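For example (a sketch continuing from the pipeline above; the negative prompt and the value of 1.5 are illustrative):

```python
image = pipe(
    prompt=prompt,
    negative_prompt="blurry, low quality, deformed",
    num_inference_steps=4,
    guidance_scale=1.5,  # between 1 and 2: the negative prompt is taken into account
).images[0]
```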
As you can see, images in this example are pretty much useless until ~20 steps (second row), and quality still increases noticeably with more steps. The details in the final image are amazing, but it took 50 steps to get there.
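For reference, the baseline in that comparison is a plain SDXL run without the LCM LoRA or LCM scheduler, roughly like this (a sketch; the default scheduler, step count, and guidance scale are assumed):

```python
import torch
from diffusers import DiffusionPipeline

# Plain SDXL: no LCM LoRA, default scheduler
baseline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
).to("cuda")

image = baseline(
    prompt=prompt,              # same prompt as before
    num_inference_steps=50,     # standard sampling needs far more steps
    guidance_scale=7.5,         # typical classifier-free guidance value
).images[0]
```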
This technique also works for any other fine-tuned SDXL or Stable Diffusion model. To demonstrate, let’s see how to run inference on collage-diffusion, a model fine-tuned from Stable Diffusion v1.5 using Dreambooth.
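In outline (a sketch: wavymulder/collage-diffusion and latent-consistency/lcm-lora-sdv1-5 are the publicly available checkpoints this assumes, and the prompt is illustrative):

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "wavymulder/collage-diffusion",
    torch_dtype=torch.float16,
).to("cuda")

# Same recipe as before, but with the LCM LoRA trained for Stable Diffusion v1.5
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

prompt = "collage style kid sits looking at the night sky, full of stars"
image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]
```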
The integration of LCM in diffusers makes it possible to take advantage of many features and workflows that are part of the diffusers toolbox. For example:
• Out-of-the-box mps support for Macs with Apple Silicon.
• Memory and performance optimizations like flash attention or torch.compile().
• Additional memory saving strategies for low-RAM environments, including model offload.
• Workflows like ControlNet or image-to-image (see the sketch after this list).
• Training and fine-tuning scripts.
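To illustrate a couple of those points together, here is a sketch of an image-to-image run with the LCM LoRA plus model CPU offload (the input image URL, strength, and step count are placeholders):

```python
import torch
from diffusers import AutoPipelineForImage2Image, LCMScheduler
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage low by offloading idle components

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

init_image = load_image("https://example.com/input.png")  # placeholder URL
image = pipe(
    prompt="a watercolor painting of a lighthouse at dusk",
    image=init_image,
    strength=0.5,
    num_inference_steps=8,  # effective denoising steps ≈ strength × num_inference_steps (~4 here)
    guidance_scale=1,
).images[0]
```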
Using the diffusers + PEFT integration, you can combine LCM LoRAs with regular SDXL LoRAs, giving them the superpower to run LCM inference in only 4 steps.
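A sketch of how that combination can be expressed (the style LoRA repo, weight file, and adapter weights are examples, not a recommendation from the post):

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Load both adapters: the LCM LoRA plus a regular SDXL style LoRA (example repo)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl", adapter_name="lcm")
pipe.load_lora_weights(
    "TheLastBen/Papercut_SDXL",
    weight_name="papercut.safetensors",
    adapter_name="papercut",
)

# Blend them: keep the LCM adapter at full strength, scale the style adapter down a bit
pipe.set_adapters(["lcm", "papercut"], adapter_weights=[1.0, 0.8])

image = pipe(
    prompt="papercut, a cute fox in a forest",
    num_inference_steps=4,
    guidance_scale=1,
).images[0]
```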
How to Train LCM Models and LoRAs
As part of the diffusers release today, we are providing training and fine-tuning scripts developed in collaboration with the LCM authors. They allow users to:
• Perform full-model distillation of Stable Diffusion or SDXL models on large datasets such as LAION.
• Train LCM LoRAs, which is a much easier process. As we’ve shown in this post, it also makes it possible to run fast inference with Stable Diffusion, without having to go through distillation training.