Instruct 3D-to-3D

High-fidelity text-guided conversion and editing of 3D scenes using iterative diffusion updates.
Instruct 3D-to-3D is a framework for precise manipulation of 3D environments through natural language instructions. Unlike traditional 3D generative models that create assets from scratch, it specializes in the '3D-to-3D' task: taking an existing 3D scene, typically represented as a Neural Radiance Field (NeRF) or 3D Gaussian Splatting (3DGS) model, and modifying it according to text prompts (e.g., 'Make it look like a snowy winter day').

Technically, the architecture applies a pre-trained image-to-image diffusion model (such as InstructPix2Pix) to multiple views of the 3D scene. It employs an Iterative Dataset Update (IDU) method, in which the underlying 3D representation is optimized against progressively edited 2D views, ensuring global 3D consistency and preserving high-frequency detail.

By 2026, this technology has become a cornerstone of digital twin management and virtual production, allowing architects and game designers to rapidly prototype stylistic variations of real-world scans without manual re-modeling. Its ability to maintain structural integrity while altering semantic properties makes it superior to simple texture-swapping techniques.
A mechanism that incrementally updates the 3D scene by alternating between updating the 3D representation and updating the training images using 2D diffusion.
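A minimal sketch of what such an IDU loop can look like, assuming a hypothetical scene object that wraps the NeRF or 3DGS backend and a hypothetical edit_view callable that wraps an InstructPix2Pix-style 2D editor. The names, method signatures, and update schedule below are illustrative assumptions, not the framework's actual API.

```python
import random

def iterative_dataset_update(scene, cameras, train_images, edit_view,
                             prompt, n_iters=30_000, edit_every=10):
    """Alternate between optimizing the 3D representation and refreshing
    the 2D training views with diffusion-edited renders."""
    for it in range(n_iters):
        # Periodically replace one training image with an edited render of the
        # current scene, so the dataset drifts toward the instruction while
        # staying anchored to the present geometry.
        if it % edit_every == 0:
            idx = random.randrange(len(cameras))
            rendered = scene.render(cameras[idx])            # render the current view
            train_images[idx] = edit_view(rendered, prompt)  # 2D diffusion edit

        # Ordinary reconstruction step against the (partially edited) dataset.
        idx = random.randrange(len(cameras))
        loss = scene.reconstruction_loss(cameras[idx], train_images[idx])
        scene.optimizer_step(loss)
    return scene
```

Because every edited image is produced from a render of the current 3D state, inconsistencies between individually edited views are averaged out as the optimization converges.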
Edit 3D scenes and NeRFs with natural language instructions, using Iterative Dataset Updates and diffusion models to maintain multi-view consistency.
Synchronizes the InstructPix2Pix output across different camera angles to prevent flickering in 3D space.
Compatible with both Neural Radiance Fields and the faster 3D Gaussian Splatting backends.
Leverages the latent space of diffusion models to understand which parts of a 3D scene to change based on the prompt.
Allows users to adjust the strength of the text instruction relative to the original 3D structure (see the guidance-scale sketch after this feature list).
A secondary pass that enhances textures after the initial geometry modification.
Allows styles learned from one 3D scene to be applied to another.
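The instruction-strength control described above typically corresponds to the two classifier-free guidance scales of InstructPix2Pix. The sketch below uses the Hugging Face diffusers pipeline as one possible 2D editing backend; the package, model checkpoint, and file names are assumptions for illustration, not components documented in this listing.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# Load an InstructPix2Pix checkpoint (assumed backend, not the tool's bundled model).
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

view = Image.open("rendered_view.png").convert("RGB")  # hypothetical rendered view

edited = pipe(
    prompt="Make it look like a snowy winter day",
    image=view,
    num_inference_steps=20,
    guidance_scale=7.5,        # higher: follow the text instruction more strongly
    image_guidance_scale=1.5,  # higher: stay closer to the original rendered view
).images[0]
edited.save("edited_view.png")
```

Raising image_guidance_scale biases edits toward preserving the original structure, while raising guidance_scale pushes the edited views further toward the prompt.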
Manually changing the materials and lighting of a 3D building scan is time-consuming.
Product photoshoots in different environments are expensive.
Creating 'Damaged' or 'Post-Apocalyptic' versions of game assets requires manual sculpting.