Instruct 3D-to-3D
High-fidelity text-guided conversion and editing of 3D scenes using iterative diffusion updates.
High-fidelity 3D mesh generation from single images in under 45 seconds
One-2-3-45 represents a significant architectural leap in 3D generative AI, designed to address the latency and consistency issues of earlier Score Distillation Sampling (SDS) methods. Developed by researchers at UC San Diego, the system uses a feed-forward neural network that leverages the 2D knowledge of large-scale diffusion models (specifically Zero123) to synthesize multiple novel views of an object from a single input image. These views are then processed by a cost-volume-based 3D reconstruction module. Unlike optimization-based approaches that can take hours per object, One-2-3-45 lifts a 2D image into a full 3D mesh in approximately 45 seconds.

By 2026, the architecture has become a benchmark for real-time spatial computing, providing the foundational logic for rapid asset creation in XR environments and game development pipelines. Its key technical advantage is maintaining high geometric fidelity and multi-view consistency while avoiding the 'Janus' (multi-face) artifact common in early-stage 3D generators. The system scales well and can be deployed on consumer-grade GPUs with at least 24GB of VRAM, making it a popular choice for local development and private enterprise deployment.
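The feed-forward flow described above can be summarized in a minimal sketch. The class and stage names below (ImageTo3DPipeline, remove_background, predict_novel_views, reconstruct_mesh) are hypothetical placeholders used for illustration, not the project's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

import numpy as np

Mesh = Any  # placeholder for whatever mesh representation the reconstruction returns


@dataclass
class ImageTo3DPipeline:
    # Each stage is a hypothetical callable standing in for one module of the pipeline.
    remove_background: Callable[[np.ndarray], np.ndarray]           # isolate the subject (pre-processing)
    predict_novel_views: Callable[[np.ndarray], List[np.ndarray]]   # Zero123-style viewpoint-conditioned synthesis
    reconstruct_mesh: Callable[[List[np.ndarray]], Mesh]            # cost-volume + SDF-based reconstruction

    def image_to_mesh(self, image: np.ndarray) -> Mesh:
        """Single feed-forward pass: one image -> multi-view images -> 3D mesh.

        There is no per-object optimization loop, which is why the conversion
        finishes in roughly 45 seconds rather than hours.
        """
        subject = self.remove_background(image)
        views = self.predict_novel_views(subject)  # fixed camera poses around the object
        return self.reconstruct_mesh(views)
```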
Uses a 3D cost volume to aggregate features from multi-view images for precise geometric localization.
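For illustration, the following is a rough sketch of how a cost volume can aggregate per-view features at 3D sample points, assuming pinhole cameras, nearest-neighbour sampling, and variance-based aggregation. The function name, shapes, and simplifications are illustrative and not taken from the project's implementation.

```python
import numpy as np


def build_cost_volume(feature_maps, intrinsics, extrinsics, voxel_centers):
    """Aggregate per-view 2D features at 3D voxel centers via projection.

    feature_maps : (V, H, W, C) per-view feature maps
    intrinsics   : (V, 3, 3)    camera intrinsic matrices
    extrinsics   : (V, 3, 4)    world-to-camera [R|t] matrices
    voxel_centers: (N, 3)       3D sample points of the volume
    returns      : (N, C)       per-voxel feature variance across views
    """
    V, H, W, C = feature_maps.shape
    N = voxel_centers.shape[0]
    sampled = np.zeros((V, N, C))
    homo = np.concatenate([voxel_centers, np.ones((N, 1))], axis=1)  # (N, 4) homogeneous points

    for v in range(V):
        cam = extrinsics[v] @ homo.T                  # (3, N) points in camera frame
        pix = intrinsics[v] @ cam                     # (3, N) projective pixel coordinates
        col = np.clip((pix[0] / pix[2]).round().astype(int), 0, W - 1)
        row = np.clip((pix[1] / pix[2]).round().astype(int), 0, H - 1)
        sampled[v] = feature_maps[v, row, col]        # nearest-neighbour feature lookup
        # NOTE: depth/visibility checks are omitted for brevity.

    # Low variance across views indicates multi-view consistency at that voxel.
    return sampled.var(axis=0)
```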
High-Fidelity Shading-Guided 3D Asset Generation from Sparse 2D Inputs
High-Quality Single Image to 3D Generation using 2D and 3D Diffusion Priors
Edit 3D scenes and NeRFs with natural language instructions while maintaining multi-view consistency.
A direct mapping from images to 3D geometry without iterative per-object optimization.
Ensures that the generated multi-view images are geometrically consistent in 3D space before mesh extraction.
Utilizes Signed Distance Functions to ensure the resulting mesh is watertight and manifold.
Deep integration with viewpoint-conditioned diffusion models for enhanced texture realism.
Internal pre-processing pipeline to isolate the subject for clean 3D extraction.
Post-processing step that optimizes vertex count while preserving visual detail.
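As an illustration of the SDF and vertex-optimization features listed above, here is a minimal sketch that extracts a watertight mesh from a signed distance volume using scikit-image's marching cubes. The helper sdf_to_mesh is hypothetical, and the project's actual internals may differ.

```python
import numpy as np
from skimage import measure


def sdf_to_mesh(sdf: np.ndarray, voxel_size: float = 1.0):
    """Extract the zero level set of a signed distance field as a triangle mesh.

    sdf: (D, H, W) array, negative inside the surface, positive outside.
    """
    verts, faces, normals, _ = measure.marching_cubes(
        sdf, level=0.0, spacing=(voxel_size,) * 3
    )
    # A further decimation pass (e.g. quadric error simplification) would reduce
    # the vertex count while preserving visual detail, as described above.
    return verts, faces, normals


# Example: a sphere of radius 20 voxels centred in a 64^3 grid.
grid = np.linspace(-32, 32, 64)
x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
sphere_sdf = np.sqrt(x**2 + y**2 + z**2) - 20.0
verts, faces, _ = sdf_to_mesh(sphere_sdf)
```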
Creating 3D placeholders manually takes hours for environment artists.
Retailers need thousands of 3D models of products for 'view in room' features.
Converting 2D historical diagrams into 3D tactile objects for the visually impaired.