UniCon: A Simple Approach to Unifying Diffusion-based Conditional Generation

TL;DR: The proposed UniCon enables diverse generation behavior in one model for a target image-condition pair.

Highlights

  • UniCon adapts a pretrained image diffusion model for a specific image-condition pair, with minimal additional parameters (about 15%).
  • UniCon enables versatile generation behaviors in one model via flexible inference-time sampling schedules.
  • UniCon can be trained for both densely aligned and loosely correlated conditions.
  • Multiple UniCon models can be combined for multi-signal conditional generation.

One model for different tasks

One UniCon model supports diverse generation behavior at inference time.
All following results are from the same UniCon-Depth model.

Conditional generation with different UniCon models

UniCon models can be trained for various conditions, including densely aligned ones (depth, edge, pose) and loosely correlated ones (identity and appearance).

Flexible conditional generation

UniCon models offer highly flexible conditional generation: varying the sampling schedule at inference time selects the generation behavior.
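
As a rough illustration (not the authors' code), the sketch below shows how per-branch noise schedules can select a behavior: keeping the condition branch clean while denoising the image branch gives condition-to-image generation, the reverse gives image-to-condition prediction, and denoising both together generates a coherent pair. The function and names (`make_schedule`, `sigmas`) are hypothetical.

```python
def make_schedule(mode, sigmas):
    """Return per-step (sigma_img, sigma_cond) noise levels for a given behavior.

    Illustrative sketch only: the two branches of a UniCon model can be run at
    different noise levels, and the chosen schedule determines the task.
    """
    if mode == "cond_to_image":   # condition branch stays clean, image is denoised
        return [(s, 0.0) for s in sigmas]
    if mode == "image_to_cond":   # image branch stays clean, condition is denoised
        return [(0.0, s) for s in sigmas]
    if mode == "joint":           # both branches are denoised together
        return [(s, s) for s in sigmas]
    raise ValueError(f"unknown mode: {mode}")
```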

Combining multiple UniCon models

Multiple UniCon models can be combined for multi-signal conditional generation.
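
One plausible way to combine independently trained UniCon models (e.g., depth and pose) is to run each model's condition branch alongside a shared image branch and merge their predictions for the image, for instance by a weighted average. The snippet below is a sketch of that idea with hypothetical names (`models`, `weights`, `predict_image_eps`); the paper's exact merging rule may differ.

```python
import torch

def combined_image_eps(models, weights, x_img, conds, sigma):
    """Merge image-branch noise predictions from several UniCon models.

    Hypothetical combination by weighted averaging of per-model predictions;
    each model sees the shared noisy image plus its own condition input.
    """
    eps = torch.zeros_like(x_img)
    for model, w, cond in zip(models, weights, conds):
        eps = eps + w * model.predict_image_eps(x_img, cond, sigma)
    return eps / sum(weights)
```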

Method

UniCon is adapted from a pretrained image diffusion model with additional joint cross-attention modules and LoRA adapters.

Given a pair of image-condition inputs, our UniCon model processes them concurrently in two parallel branches. Features from the two branches attend to each other in the injected joint cross-attention modules. LoRA adapters are applied to the condition branch and the joint cross-attention modules.
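
To make the architecture concrete, here is a minimal PyTorch sketch of a symmetric joint cross-attention block: each branch's features act as queries over the other branch's features, and the result is added back residually. The module and argument names are our own; the exact projections, normalization, and LoRA placement in the paper may differ.

```python
import torch.nn as nn

class JointCrossAttention(nn.Module):
    """Symmetric cross-attention between the image and condition branches (sketch)."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.img_to_cond = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cond_to_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feat_img, feat_cond):
        # Image features query the condition features, and vice versa.
        img_out, _ = self.img_to_cond(feat_img, feat_cond, feat_cond)
        cond_out, _ = self.cond_to_img(feat_cond, feat_img, feat_img)
        # Residual update of both branches.
        return feat_img + img_out, feat_cond + cond_out
```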

The model is trained on image-condition pairs. During training, we independently sample a timestep for each input and compute the loss over both branches.
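
A rough sketch of one training step under these choices: both branches are noised with independently sampled timesteps, and a denoising loss is accumulated over both outputs. Names such as `unicon`, `noise_schedule.add_noise`, and the epsilon-prediction target are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def training_step(unicon, noise_schedule, image, condition, num_timesteps=1000):
    """One UniCon training step (illustrative sketch)."""
    b = image.shape[0]
    # Independently sample a timestep for each branch.
    t_img = torch.randint(0, num_timesteps, (b,), device=image.device)
    t_cond = torch.randint(0, num_timesteps, (b,), device=image.device)

    eps_img = torch.randn_like(image)
    eps_cond = torch.randn_like(condition)
    noisy_img = noise_schedule.add_noise(image, eps_img, t_img)       # assumed API
    noisy_cond = noise_schedule.add_noise(condition, eps_cond, t_cond)

    # Both branches are processed concurrently; each predicts its own noise.
    pred_img, pred_cond = unicon(noisy_img, noisy_cond, t_img, t_cond)

    # Loss is computed over both branches.
    return F.mse_loss(pred_img, eps_img) + F.mse_loss(pred_cond, eps_cond)
```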

BibTeX

@article{li2024unicon,
    title={A Simple Approach to Unifying Diffusion-based Conditional Generation},
    author={Li, Xirui and Herrmann, Charles and Chan, Kelvin CK and Li, Yinxiao and Sun, Deqing and Yang, Ming-Hsuan},
    journal={arXiv preprint arXiv:2410.11439},
    year={2024}
}