
# TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

By thu-ml (via Hacker News: Front Page)

This repository provides the official implementation of **TurboDiffusion**, a video generation acceleration framework that speeds up end-to-end diffusion generation by 100–200×. TurboDiffusion primarily uses SageAttention and SLA (Sparse-Linear Attention) for attention acceleration, and rCM for timestep distillation.

Paper: *TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times*

> Note: the checkpoints and paper are not finalized, and will be updated later to improve quality.

- Original, E2E time: 184s
- TurboDiffusion, E2E time: 1.9s (184s / 1.9s, roughly a 97× end-to-end speedup on this example)

5-second video generated by Wan-2.1-T2V-1.3B-480P on a single RTX 5090.

## Model Zoo

| Model Name | Checkpoint Link | Best Resolution |
|---|---|---|
| TurboWan2.2-I2V-A14B-720P | Huggingface Model | 720P |
| TurboWan2.1-T2V-1.3B-480P | Huggingface Model | 480P |
| TurboWan2.1-T2V-14B-480P | Huggingface Model | 480P |
| TurboWan2.1-T2V-14B-720P | Huggingface Model | 720P |

Note: All checkpoints support generating videos at 480p or 720p. The "Best Resolution" column indicates the resolution at which the model provides the best video quality.

## Installation

Base environment: `python>=3.9`, `torch>=2.7.0`. `torch==2.8.0` is recommended, as higher versions may cause OOM.

Install TurboDiffusion by pip:

```bash
conda create -n turbodiffusion python=3.12
conda activate turbodiffusion
pip install turbodiffusion --no-build-isolation
```

Or compile from source:

```bash
git clone https://github.com/thu-ml/TurboDiffusion.git
cd TurboDiffusion
git submodule update --init --recursive
pip install -e . --no-build-isolation
```

To enable SageSLA, a fast SLA forward pass based on SageAttention, install SpargeAttn first:

```bash
pip install git+https://github.com/thu-ml/SpargeAttn.git --no-build-isolation
```

For GPUs with more than 40GB of memory (e.g., H100), use the unquantized checkpoints (without `-quant`) and remove `--quant_linear` from the command. For RTX 5090, RTX 4090, or similar GPUs, use the quantized checkpoints (with `-quant`) and add `--quant_linear` to the command.
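If you are unsure which category your GPU falls into, you can query its total memory with standard `nvidia-smi` (this check is a suggestion, not part of TurboDiffusion; any recent NVIDIA driver provides the tool):

```bash
# Print each GPU's name and total memory.
# > 40 GB (e.g., H100): use the unquantized checkpoints, no --quant_linear.
# <= 40 GB (e.g., RTX 5090/4090): use the -quant checkpoints with --quant_linear.
nvidia-smi --query-gpu=name,memory.total --format=csv
```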
## Model Checkpoints

Download the VAE (applicable for both Wan2.1 and Wan2.2) and umT5 text encoder checkpoints:

```bash
mkdir checkpoints
cd checkpoints
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B/resolve/main/Wan2.1_VAE.pth
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
```

Download our quantized model checkpoints (for RTX 5090 or similar GPUs):

```bash
# For Wan2.1-T2V-1.3B
wget https://huggingface.co/TurboDiffusion/TurboWan2.1-T2V-1.3B-480P/resolve/main/TurboWan2.1-T2V-1.3B-480P-quant.pth

# For Wan2.2-I2V-14B
wget https://huggingface.co/TurboDiffusion/TurboWan2.2-I2V-A14B-720P/resolve/main/TurboWan2.2-I2V-A14B-high-720P-quant.pth
wget https://huggingface.co/TurboDiffusion/TurboWan2.2-I2V-A14B-720P/resolve/main/TurboWan2.2-I2V-A14B-low-720P-quant.pth
```

Or download our unquantized model checkpoints (for H100 or similar GPUs):

```bash
# For Wan2.1-T2V-1.3B
wget https://huggingface.co/TurboDiffusion/TurboWan2.1-T2V-1.3B-480P/resolve/main/TurboWan2.1-T2V-1.3B-480P.pth

# For Wan2.2-I2V-14B
wget https://huggingface.co/TurboDiffusion/TurboWan2.2-I2V-A14B-720P/resolve/main/TurboWan2.2-I2V-A14B-high-720P.pth
wget https://huggingface.co/TurboDiffusion/TurboWan2.2-I2V-A14B-720P/resolve/main/TurboWan2.2-I2V-A14B-low-720P.pth
```

## Inference

Use the inference script for the T2V models:

```bash
export PYTHONPATH=turbodiffusion

# Arguments:
#   --dit_path           Path to the finetuned TurboDiffusion checkpoint
#   --model              Model to use: Wan2.1-1.3B or Wan2.1-14B (default: Wan2.1-1.3B)
#   --num_samples        Number of videos to generate (default: 1)
#   --num_steps          Sampling steps, 1-4 (default: 4)
#   --sigma_max          Initial sigma for rCM (default: 80); larger choices (e.g., 1600) reduce diversity but may enhance quality
#   --vae_path           Path to Wan2.1 VAE (default: checkpoints/Wan2.1_VAE.pth)
#   --text_encoder_path  Path to umT5 text encoder (default: checkpoints/models_t5_umt5-xxl-enc-bf16.pth)
#   --num_frames         Number of frames to generate (default: 81)
#   --prompt             Text prompt for video generation
#   --resolution         Output resolution: "480p" or "720p" (default: 480p)
#   --aspect_ratio       Aspect ratio in W:H format (default: 16:9)
#   --seed               Random seed for reproducibility (default: 0)
#   --save_path          Output file path including extension (default: output/generated_video.mp4)
#   --attention_type     Attention module to use: original, sla, or sagesla (default: sagesla)
#   --sla_topk           Top-k ratio for SLA/SageSLA attention (default: 0.1); we recommend 0.15 for better video quality
#   --quant_linear       Enable quantization for linear layers; pass this if using a quantized checkpoint
#   --default_norm       Use the original LayerNorm and RMSNorm of Wan models
python turbodiffusion/inference/wan2.1_t2v_infer.py \
  --model Wan2.1-1.3B...
```
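For orientation, here is one way a full command might be assembled from the argument list above, as a sketch rather than a command taken verbatim from the repository: it assumes the quantized 1.3B checkpoint downloaded earlier (hence `--quant_linear`), uses the recommended `--sla_topk` of 0.15, and the prompt is an arbitrary example:

```bash
# Illustrative invocation only: flag names and defaults come from the argument
# list above; the checkpoint path and prompt are example choices.
export PYTHONPATH=turbodiffusion
python turbodiffusion/inference/wan2.1_t2v_infer.py \
  --model Wan2.1-1.3B \
  --dit_path checkpoints/TurboWan2.1-T2V-1.3B-480P-quant.pth \
  --quant_linear \
  --attention_type sagesla \
  --sla_topk 0.15 \
  --num_steps 4 \
  --resolution 480p \
  --aspect_ratio 16:9 \
  --seed 0 \
  --prompt "A corgi running along a beach at sunset" \
  --save_path output/generated_video.mp4
```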
