💡 LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting

CVPR 2025

¹University of Amsterdam · ²BCAI-Bosch · ³Toyota Technological Institute at Chicago

LumiNet transfers complex lighting conditions from a target image (top-right) to a source image (left), synthesizing a relit version of the source image (right) while preserving its main geometry and albedo.

Abstract

We introduce LumiNet, a novel architecture that leverages generative models and latent intrinsic representations for effective lighting transfer.

Given a source image and a target lighting image, LumiNet synthesizes a relit version of the source scene that captures the target's lighting. Our approach makes two key contributions: a data curation strategy that uses a StyleGAN-based relighting model to build our training set, and a modified diffusion-based ControlNet that processes both latent intrinsic properties from the source image and latent extrinsic properties from the target image. We further improve lighting transfer through a learned adaptor (MLP) that injects the target's latent extrinsic properties via cross-attention and fine-tuning.

Unlike a traditional ControlNet, which generates images from conditioning maps of a single scene, LumiNet processes latent representations from two different images, preserving geometry and albedo from the source while transferring lighting characteristics from the target. Experiments demonstrate that our method successfully transfers complex lighting phenomena, including specular highlights and indirect illumination, across scenes with varying spatial layouts and materials, outperforming existing approaches on challenging indoor scenes using only images as input.

Lighting Zoo

Given multiple different target lights, LumiNet generates various lighting conditions in the same (real-world) scene.

[Interactive slider: original image (left) vs. relit image (right)]

Architecture

Given two images (a source image to be relit and a target lighting condition), we first extract latent intrinsic image representations for both images using the pretrained Latent Intrinsics model.

We then take the lighting (latent extrinsic) vector from the target lighting image and the latent intrinsic features from the source image to train a latent ControlNet.

The latent ControlNet's conditioning inputs must match the dimensionality the base model expects. Our latent extrinsic vectors are only 16-dimensional, so we map them to a higher dimension with MLP layers to match the size of the text-embedding vectors; a sketch of this adaptor follows. We use an empty string as the text input to focus purely on image-based lighting transfer.
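Below is a minimal PyTorch sketch of such an adaptor. The hidden width, token count (77), and embedding dimension (768, matching Stable Diffusion's CLIP text encoder) are illustrative assumptions rather than values confirmed by the paper, and the class and argument names are ours:

import torch
import torch.nn as nn

class ExtrinsicAdaptor(nn.Module):
    # Lifts the 16-d latent extrinsic (lighting) code into a sequence of
    # pseudo text-embedding tokens consumed by the cross-attention layers.
    def __init__(self, extrinsic_dim=16, embed_dim=768, num_tokens=77,
                 hidden_dim=256):
        super().__init__()
        self.num_tokens = num_tokens
        self.embed_dim = embed_dim
        self.mlp = nn.Sequential(
            nn.Linear(extrinsic_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, num_tokens * embed_dim),
        )

    def forward(self, extrinsic):
        # extrinsic: (B, 16) lighting code taken from the target image
        tokens = self.mlp(extrinsic)                  # (B, 77 * 768)
        return tokens.view(-1, self.num_tokens, self.embed_dim)

adaptor = ExtrinsicAdaptor()
light_code = torch.randn(2, 16)                       # dummy batch of codes
print(adaptor(light_code).shape)                      # torch.Size([2, 77, 768])

Emitting a full token sequence lets the lighting code attend at every cross-attention layer in place of the (empty) text prompt.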

We train the model with about 2500 unique images and their corresponding relit images.
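For concreteness, here is a sketch of one training step under diffusers-style interfaces. The names encoder (the frozen pretrained Latent Intrinsics model, assumed to return an intrinsic/extrinsic pair per image), adaptor (the MLP above), controlnet, unet, vae, and scheduler are our assumptions for illustration; in particular, a stock ControlNet expects an RGB conditioning map, whereas LumiNet's modified latent ControlNet consumes latent intrinsic features, which we gloss over via the controlnet_cond argument:

import torch
import torch.nn.functional as F

def training_step(source_img, target_img, relit_gt,
                  encoder, adaptor, controlnet, unet, vae, scheduler):
    # The encoder is frozen; only the ControlNet branch and adaptor train.
    with torch.no_grad():
        intrinsic_src, _ = encoder(source_img)    # geometry/albedo features
        _, extrinsic_tgt = encoder(target_img)    # 16-d lighting vector
        # Encode the ground-truth relit image into the diffusion latent space
        # (0.18215 is Stable Diffusion's VAE scaling factor).
        latents = vae.encode(relit_gt).latent_dist.sample() * 0.18215

    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=latents.device)
    noisy_latents = scheduler.add_noise(latents, noise, t)

    # The target's lighting replaces the text condition in cross-attention;
    # the actual text prompt is the empty string.
    cond_tokens = adaptor(extrinsic_tgt)          # (B, 77, 768)

    down_res, mid_res = controlnet(
        noisy_latents, t,
        encoder_hidden_states=cond_tokens,
        controlnet_cond=intrinsic_src,            # latent intrinsics, not an RGB map
        return_dict=False,
    )
    noise_pred = unet(
        noisy_latents, t,
        encoder_hidden_states=cond_tokens,
        down_block_additional_residuals=down_res,
        mid_block_additional_residual=mid_res,
        return_dict=False,
    )[0]
    return F.mse_loss(noise_pred, noise)          # standard epsilon objective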

Lighting Transfer Results

We present some of the relighting results here. Given a target light (top row) and a scene to be relit (second row), our method transfers complex indoor scene lighting phenomena—including direct illumination, specular highlights, cast shadows, inter-reflections, and other indirect effects—while maintaining scene geometry and albedo.

[Interactive gallery: target light (top row) · original image · relit image]

Visual Comparison

We compare LumiNet with IC-Light-v2, RGB-X, and Latent Intrinsics. Both RGB-X and IC-Light-v2 require text prompts for relighting; for these, we use descriptions derived from the target lighting image (covering, e.g., turning lights on or off, lamp placement, and scene type) as prompts.

Acknowledgements

We thank Pingping Song, Melis Öcal, and Alexander Timans for their insightful discussions. We are also grateful to David Forsyth for his suggestion to emphasize that light in scenes cannot simply appear, along with other valuable comments on our manuscript. Additionally, we thank Xiao Zhang for providing the code and model for Latent Intrinsics. Finally, we thank all our participants for their time in completing our user study. This project is financially supported by Bosch (Bosch Center for Artificial Intelligence), the University of Amsterdam, and the allowance of Top consortia for Knowledge and Innovation (TKIs) from the Netherlands Ministry of Economic Affairs and Climate Policy.

BibTeX

LumiNet:
@article{Xing2024luminet,
  title={LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting},
  author={Xing, Xiaoyan and Groh, Konrad and Karaoglu, Sezer and Gevers, Theo and Bhattad, Anand},
  journal={arXiv preprint arXiv:2412.00177},
  year={2024}
}

Latent Intrinsics:
@inproceedings{Zhang2024Latent,
  title={Latent Intrinsics Emerge from Training to Relight},
  author={Zhang, Xiao and Gao, William and Jain, Seemandhar and Maire, Michael and Forsyth, David and Bhattad, Anand},
  booktitle={NeurIPS},
  year={2024}
}