InstantID uses InsightFace to detect and crop the reference face and to extract a face embedding from it. The embedding is then passed to the IP-Adapter to control image generation. This part is very similar to IP-Adapter Face ID.
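The detect-and-embed step can be sketched with the `insightface` Python package. This is a minimal illustration, not InstantID's actual code. The `buffalo_l` model pack, the CPU provider, the 640x640 detection size, and the rule of keeping the largest detected face are all assumptions made for the example, and the helper names are hypothetical.

```python
from typing import Any, Mapping, Sequence


def bbox_area(face: Mapping[str, Any]) -> float:
    """Area of a face's bounding box, given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = face["bbox"][:4]
    return float((x2 - x1) * (y2 - y1))


def largest_face(faces: Sequence[Mapping[str, Any]]) -> Mapping[str, Any]:
    """Pick the detection with the largest bounding box (assumed heuristic)."""
    if not faces:
        raise ValueError("no face detected in the reference image")
    return max(faces, key=bbox_area)


def extract_face_embedding(image_path: str, model_pack: str = "buffalo_l"):
    """Detect faces in an image and return the largest face's embedding."""
    # Imported lazily so the helpers above work without these packages.
    import cv2
    from insightface.app import FaceAnalysis

    # Assumption: CPU inference; a negative ctx_id selects the CPU.
    app = FaceAnalysis(name=model_pack, providers=["CPUExecutionProvider"])
    app.prepare(ctx_id=-1, det_size=(640, 640))

    image_bgr = cv2.imread(image_path)  # OpenCV loads images as BGR
    if image_bgr is None:
        raise FileNotFoundError(image_path)

    # Each detection behaves like a dict with "bbox", "kps", "embedding", ...
    face = largest_face(app.get(image_bgr))
    return face["embedding"]
```

Calling `extract_face_embedding("reference.jpg")` (a hypothetical file name) would return the embedding vector that an IP-Adapter-style module conditions on.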
Combining IP-Adapter Face ID with ControlNet makes it possible to copy and style the reference image with high fidelity.
To put a face, or anything else, into Stable Diffusion, you can train a checkpoint or a LoRA model. This approach usually gives the best results, but it is time-consuming and requires some skill in training models.
Running InstantID required close to 20 GB of VRAM in my test. If you run into memory issues, you can try the Low VRAM setting in ControlNet and the SDXL memory optimization options.