Equilibrated Diffusion: Frequency-aware Textual Embedding for Equilibrated Image Customization

Liyuan Ma1, Xueji Fang1,2, Guo-jun Qi1
1MAPLE Lab, Westlake University, 2Zhejiang University
Nerfies: Deformable Neural Radiance Fields

Abstract

Image customization involves learning the subject from provided concept images and generating it within textual contexts, typically yielding alterations of attributes such as style or background.

Prevailing methods primarily rely on fine-tuning technique, wherein a unified latent embedding is employed to characterize various concept attributes. However, the attribute entanglement renders customized result challenging to mitigate the influence of subject-irrelevant attributes (e.g., style and background). To overcome these issues, we propose Equilibrated Diffusion, an innovative method that achieves equilibrated image customization by decoupling entangled concept attributes from a frequency-aware perspective, thus harmonizing textual and visual consistency.

Unlike conventional approaches that employ a shared latent embedding and tuning process to learn concept, our Equilibrated Diffusion draws inspiration from the correlation between high- and low-frequency components with image style and content, decomposing concept accordingly in the frequency domain. Through independently optimizing concept embeddings in the frequency domain, the denoising model not only enriches its comprehension of style attribute irrelevant to subject identity but also inherently augments its aptitude for accommodating novel stylized descriptions. Furthermore, by combining different frequency embeddings, our model retains the spatially original customization capability. We further design a diffusion process guided by subject masks to alleviate the influence of background attribute, thereby strengthening text alignment. To ensure subject-related information consistency, Residual Reference Attention (RRA) is incorporated into the denoising model of spatial attention computation, effectively preserving structural details. Experimental results demonstrate that Equilibrated Diffusion surpasses other competitors with better subject consistency while closely adhering to text descriptions, thus validating the superiority of our approach.

Model

Comparison

BibTeX

@inproceedings{
  ma2024equilibrated,
  title={Equilibrated Diffusion: Frequency-aware Textual Embedding for Equilibrated Image Customization},
  author={Liyuan Ma and Xueji Fang and Guo-Jun Qi},
  booktitle={ACM Multimedia 2024},
  year={2024},
  url={https://openreview.net/forum?id=ERuypCHYvX}
  }