Overview
Custom Diffusion is a method for fine-tuning text-to-image diffusion models such as Stable Diffusion on a new concept from only a few images (4-20). Training is fast (~6 minutes on 2 A100 GPUs) because only a small subset of model parameters is updated: the key and value projection matrices in the cross-attention layers. This also keeps the storage footprint at about 75MB per concept.

The method supports composing multiple fine-tuned concepts, such as a new object together with a new artistic style, and uses regularization images to prevent overfitting to the few training examples. The tool provides scripts for single-concept and multi-concept fine-tuning, as well as for merging separately fine-tuned models. Custom Diffusion also supports training and inference through the Diffusers library and offers a dataset of 101 concepts with evaluation prompts.
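The "subset of parameters" idea can be illustrated with a minimal, framework-free sketch of a single cross-attention head, where only the key and value projections are marked trainable while the rest of the layer stays frozen. All dimensions, class names, and the random initialization below are hypothetical illustrations, not the actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class CrossAttention:
    """Single-head cross-attention sketch.

    In Custom Diffusion, queries come from image features (frozen W_q),
    while keys/values come from the text conditioning (trainable W_k, W_v).
    """
    def __init__(self, d_model=8, d_cond=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W_q = rng.standard_normal((d_model, d_model))  # frozen
        self.W_k = rng.standard_normal((d_cond, d_model))   # fine-tuned
        self.W_v = rng.standard_normal((d_cond, d_model))   # fine-tuned

    def __call__(self, x, cond):
        Q = x @ self.W_q          # (n_tokens, d_model)
        K = cond @ self.W_k       # (n_cond, d_model)
        V = cond @ self.W_v       # (n_cond, d_model)
        attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
        return attn @ V           # (n_tokens, d_model)

    def trainable_params(self):
        return self.W_k.size + self.W_v.size

    def total_params(self):
        return self.W_q.size + self.trainable_params()

layer = CrossAttention()
out = layer(np.ones((3, 8)), np.ones((2, 4)))
fraction = layer.trainable_params() / layer.total_params()
```

Storing only `W_k` and `W_v` per concept (a fraction of the layer's weights, and a far smaller fraction of the full U-Net) is what brings the per-concept checkpoint down to roughly 75MB.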
