GSHeadRelight: Fast Relightability for 3D Gaussian Head Synthesis

Henglei Lv1,2, Bailin Deng3, Jianzhu Guo4, Xiaoqiang Liu4, Pengfei Wan4, Di Zhang4, Lin Gao1,2*
1Institute of Computing Technology, Chinese Academy of Sciences
2University of Chinese Academy of Sciences, Beijing, China
3Cardiff University
4Kuaishou Technology
SIGGRAPH 2025 Conference Track
Code (PyTorch) | Code (Jittor) | Paper
GSHeadRelight incorporates an efficient yet effective lighting model into generative 3D Gaussians and synthesizes high-quality relightable 3D Gaussian heads that support novel view synthesis and relighting under any HDRI environment map. Our method does not require expensive light stage data and achieves real-time rendering at 240 FPS, surpassing previous 3D-aware portrait relighting methods by at least 12×.

Abstract

Relighting and novel view synthesis of human portraits are essential in applications such as portrait photography, virtual reality (VR), and augmented reality (AR). Despite recent progress, 3D-aware portrait relighting remains challenging due to the demands for photorealistic rendering, real-time performance, and generalization to unseen subjects. Existing works either rely on supervision from limited and expensive light stage data or produce suboptimal results. Moreover, many works are based on generative NeRFs, which suffer from poor 3D consistency and fall short of real-time performance. We build on recent progress in generative 3D Gaussians and design a lighting model based on a unified neural radiance transfer representation, which responds linearly to incident light. Using only in-the-wild images, our method achieves state-of-the-art relighting results and a significantly faster rendering speed (12×) compared with previous 3D-aware portrait relighting research.
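To illustrate what "responds linearly to incident light" means, the sketch below shows a standard precomputed-radiance-transfer style formulation we assume for exposition: an HDRI environment map is projected onto a spherical-harmonics (SH) basis, and relit color is a dot product between per-point transfer coefficients and the SH light coefficients. The function names, the choice of 2nd-order SH, and the albedo modulation are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (assumptions, not the paper's code): lighting is represented
# by SH coefficients of an HDRI environment map, and per-point radiance is
# linear in those coefficients, so relighting reduces to a dot product with
# precomputed transfer coefficients.
import torch

def sh_basis_l2(dirs):
    """Evaluate the 9 real SH basis functions (bands 0-2).
    dirs: (N, 3) unit vectors -> (N, 9)."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    return torch.stack([
        0.282095 * torch.ones_like(x),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z * z - 1),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ], dim=-1)

def envmap_to_sh(envmap, dirs, solid_angles):
    """Project an HDRI onto the SH basis.
    envmap: (N, 3) per-texel RGB radiance, dirs: (N, 3) texel directions,
    solid_angles: (N,) texel solid angles -> (9, 3) light coefficients."""
    basis = sh_basis_l2(dirs)                          # (N, 9)
    return basis.T @ (envmap * solid_angles[:, None])  # (9, 3)

def relight(albedo, transfer, light_sh):
    """Linear light transport: color = albedo * (transfer . light).
    albedo: (M, 3), transfer: (M, 9), light_sh: (9, 3) -> (M, 3)."""
    transport = transfer @ light_sh                    # linear in the light
    return albedo * transport
```

Because the output is linear in `light_sh`, swapping in any new environment map only requires re-projecting it onto the SH basis; the per-point transfer coefficients stay fixed.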

Framework

Methods

We incorporate a lighting model based on unified radiance transfer into a generative 3D Gaussian framework. The generator \(G\) takes Gaussian noise and a camera pose condition as input and generates an embedding \(\mathbf{x}\) for each Gaussian, which is then linearly transformed into the albedo \(\boldsymbol{\rho}\) and the geometry attributes \(\{\mathbf{g}\}\), including position, scale, rotation, and opacity. A lightweight decoder conditioned on the view direction transforms \(\mathbf{x}\) into radiance transfer coefficients \(\mathbf{t}\), which are combined with the light condition to compute the unified light transport. The color is obtained by multiplying the light transport component with the albedo. The image is then rendered by standard splatting and passed to the discriminator \(D\), which takes both the camera pose and the light condition as input. Mapping modules are omitted for simplicity. Gaussian noise and camera pose are represented as icons, and the light condition as a diffuse sphere. A minimal sketch of this forward pass is given below.
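The following sketch mirrors only the data flow described above, under stated assumptions: the module and class names (`RelightableGaussianHead`, `to_albedo`, `to_geometry`, `transfer_decoder`), the embedding size, the sigmoid on albedo, and the 9-coefficient SH light condition are all hypothetical choices for illustration, not the paper's implementation.

```python
# Minimal sketch of the described per-Gaussian heads; names and sizes are
# illustrative assumptions, not taken from the paper's released code.
import torch
import torch.nn as nn

class RelightableGaussianHead(nn.Module):
    def __init__(self, embed_dim=64, n_sh=9):
        super().__init__()
        # Linear maps from the generator embedding x to albedo and geometry.
        self.to_albedo = nn.Linear(embed_dim, 3)                 # albedo rho
        self.to_geometry = nn.Linear(embed_dim, 3 + 3 + 4 + 1)   # position, scale, rotation, opacity
        # Lightweight decoder conditioned on view direction -> transfer coeffs t.
        self.transfer_decoder = nn.Sequential(
            nn.Linear(embed_dim + 3, 64), nn.ReLU(), nn.Linear(64, n_sh))

    def forward(self, x, view_dir, light_sh):
        """x: (M, D) per-Gaussian embeddings from the generator G,
        view_dir: (M, 3) unit view directions,
        light_sh: (n_sh, 3) SH coefficients of the light condition."""
        albedo = torch.sigmoid(self.to_albedo(x))                # (M, 3); sigmoid is an assumption
        geometry = self.to_geometry(x)                           # (M, 11) raw geometry attributes
        t = self.transfer_decoder(torch.cat([x, view_dir], -1))  # (M, n_sh) transfer coefficients
        transport = t @ light_sh                                 # (M, 3), linear response to light
        color = albedo * transport                               # light transport times albedo
        return color, geometry
```

The resulting colors and geometry attributes would then be rasterized with a standard 3D Gaussian splatting renderer, and the rendered image passed to the discriminator \(D\) together with the camera pose and light condition, as described above.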

Visual Results

Visual Comparison
