Quick Start: Wan Model Training#

This section walks through the Wan pre-training pipeline end to end.


Wan2.2 I2V-A14B Training Pipeline#

0. Resource Preparation#

Before starting, download the required model weights, tokenizer, and datasets. All downloads come from Hugging Face. Install the CLI first:

pip install "huggingface_hub[cli]"

0.1 Download Model Weights#

hf download Wan-AI/Wan2.2-I2V-A14B --local-dir ./Wan-AI/Wan2.2-I2V-A14B

Note: This model requires approximately 126 GB of disk space (high-noise model ~57 GB + low-noise model ~57 GB + T5 encoder ~11.4 GB + VAE ~0.5 GB). Download may take a while depending on your network.
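Before starting a download of this size, it can help to confirm that the target filesystem has room. The sketch below sums the component sizes quoted in the note and compares the total with the free space reported by the standard library. The helper names are illustrative, and the comparison assumes decimal gigabytes (1 GB = 10^9 bytes), which the note does not specify.

```python
import shutil

# Component sizes in GB, copied from the note above.
COMPONENT_SIZES_GB = {
    "high-noise model": 57.0,
    "low-noise model": 57.0,
    "T5 encoder": 11.4,
    "VAE": 0.5,
}


def required_gb(sizes=COMPONENT_SIZES_GB):
    """Total download size in GB."""
    return sum(sizes.values())


def has_enough_space(path=".", needed_gb=None):
    """Return True if the filesystem holding `path` has at least `needed_gb` free."""
    if needed_gb is None:
        needed_gb = required_gb()
    # Assumption: decimal gigabytes (1 GB = 1e9 bytes).
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb


if __name__ == "__main__":
    print(f"Need ~{required_gb():.1f} GB; enough space here: {has_enough_space('.')}")
```

Leaving some headroom beyond the total is sensible, since the preprocessed data and converted checkpoints in later sections also need disk space.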

0.2 Download Tokenizer#

The UMT5 tokenizer is included in the model weights downloaded above (./Wan-AI/Wan2.2-I2V-A14B/google/umt5-xxl/).
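A quick way to confirm the download is complete is to check for the files that later steps reference. The sketch below only looks for the three paths used by the preprocessing command in Section 1. The function name is illustrative, and the list is not a full manifest of the repository.

```python
from pathlib import Path

# Paths (relative to the model directory) used by the preprocessing command in Section 1.
EXPECTED = [
    "models_t5_umt5-xxl-enc-bf16.pth",
    "Wan2.1_VAE.pth",
    "google/umt5-xxl",
]


def missing_files(model_base, expected=EXPECTED):
    """Return the expected relative paths that do not exist under `model_base`."""
    base = Path(model_base)
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    # Should print an empty list once the download has finished.
    print(missing_files("./Wan-AI/Wan2.2-I2V-A14B"))
```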

0.3 Prepare Dataset#

There is no standard public video dataset for this quick start. Prepare your own video data in the metadata.csv format described in Section 1. Below is a minimal example for testing:

mkdir -p ./data/dataset/train
# Place your .mp4 files in ./data/dataset/train/

cat > ./data/dataset/metadata.csv << 'EOF'
video,prompt
train/sample.mp4,"A sample video description"
EOF
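Since a missing video file would otherwise only surface during preprocessing, a small pre-flight check can be useful. The sketch below assumes video paths in metadata.csv are relative to the dataset directory, as in the example above. The function name is illustrative, and treating an empty prompt as a problem is an assumption, not a documented requirement.

```python
import csv
from pathlib import Path


def validate_metadata(dataset_dir):
    """Return a list of problems found in <dataset_dir>/metadata.csv."""
    dataset_dir = Path(dataset_dir)
    problems = []
    with open(dataset_dir / "metadata.csv", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ["video", "prompt"]:
            return [f"unexpected header: {reader.fieldnames}"]
        for index, row in enumerate(reader, start=1):
            video = (row.get("video") or "").strip()
            prompt = (row.get("prompt") or "").strip()
            if not video:
                problems.append(f"row {index}: empty video path")
            elif not (dataset_dir / video).is_file():
                problems.append(f"row {index}: video not found: {video}")
            # Assumption: every sample should carry a non-empty prompt.
            if not prompt:
                problems.append(f"row {index}: empty prompt")
    return problems
```

For the example above, `validate_metadata("./data/dataset")` returns an empty list once `train/sample.mp4` is in place.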

1. Preprocess Training Data#

Expected dataset example#

dataset
├── metadata.csv
└── train
    ├── EGO_1.mp4
    ├── EGO_2.mp4
    └── EGO_3.mp4

metadata.csv example

video,prompt
train/EGO_1.mp4,"places the bag of clothes on the floor\nPlan:\n pick up the bag of clothes. Put the bag of clothes on the floor.\nactions :\n1. pick up(bag of clothes)\n2. put on(bag of clothes, floor)"
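Prompts like the one above contain commas, so they must be quoted to stay in a single CSV field. When generating metadata.csv from a script, the standard `csv` module handles this quoting automatically. The sketch below is one way to do it, and the helper name is illustrative.

```python
import csv
import io


def write_metadata(rows, fileobj):
    """Write (video, prompt) pairs as CSV; fields containing commas or quotes get quoted."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["video", "prompt"])
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    write_metadata([("train/EGO_1.mp4", "put on(bag of clothes, floor)")], buf)
    print(buf.getvalue())
    # video,prompt
    # train/EGO_1.mp4,"put on(bag of clothes, floor)"
```

To write a real file, open it with `newline=""` and pass the file object instead of the `StringIO` buffer.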

Steps#

Step-1 Install dependencies (model weights were already downloaded in Section 0.1)

pip install diffsynth==1.1.8

Step-2 Process the input

MODEL_BASE=./Wan-AI/Wan2.2-I2V-A14B  # should match --local-dir in Section 0.1
MODEL_T5=${MODEL_BASE}/models_t5_umt5-xxl-enc-bf16.pth
MODEL_VAE=${MODEL_BASE}/Wan2.1_VAE.pth
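# Script location: examples/wan/wan_preprocess.py in the LoongForge repo (run from that directory or adjust the path).
# Replace <your_dataset> with your dataset directory, e.g. ./data/dataset from Section 0.3.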
accelerate launch wan_preprocess.py \
  --dataset_base_path <your_dataset> \
  --dataset_metadata_path <your_dataset>/metadata.csv \
  --height 480 --width 832 --num_frames 49 \
  --model_paths "${MODEL_T5},${MODEL_VAE}" \
  --tokenizer_local_path "${MODEL_BASE}/google/umt5-xxl" \
  --output_path ./data/preprocessed \
  --max_timestep_boundary 0.358 --min_timestep_boundary 0
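To get a feel for what the chosen resolution and frame count produce, the sketch below estimates the video latent shape for the settings in the command above. It assumes the Wan VAE compresses time by 4x (keeping the first frame) and space by 8x into 16 latent channels. These factors are not stated in this guide, so treat them as assumptions and verify against the VAE you are using.

```python
def latent_shape(height, width, num_frames, channels=16, t_stride=4, s_stride=8):
    """Estimate (C, T, H, W) of the VAE latent under the assumed compression factors."""
    if (num_frames - 1) % t_stride != 0:
        raise ValueError(f"num_frames should be {t_stride}*k + 1, got {num_frames}")
    if height % s_stride != 0 or width % s_stride != 0:
        raise ValueError(f"height and width should be multiples of {s_stride}")
    return (
        channels,
        (num_frames - 1) // t_stride + 1,
        height // s_stride,
        width // s_stride,
    )


if __name__ == "__main__":
    print(latent_shape(480, 832, 49))  # (16, 13, 60, 104)
```

Under these assumptions, 49 frames satisfy the 4k + 1 pattern (k = 12), which is one reason to prefer frame counts such as 49 or 81 over round numbers.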

Output#

Each .pth file contains the following three keys:

  • input_latents – VAE latent of the whole video

  • y – first-frame VAE latent concatenated with a visibility mask

  • context – text encoder embedding

(High-/low-noise tensors are NOT separated; LoongForge adds noise online later.)
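A quick sanity check on the preprocessing output is to load one file and confirm the three keys are present. The sketch below assumes each .pth file deserializes to a dictionary-like object via `torch.load`. The function names are illustrative, and `torch` is imported lazily so the key check itself has no dependencies.

```python
REQUIRED_KEYS = ("input_latents", "y", "context")


def missing_keys(sample):
    """Return the required keys that are absent from a loaded sample."""
    return [key for key in REQUIRED_KEYS if key not in sample]


def check_file(path):
    """Load one preprocessed .pth file on CPU and report any missing keys."""
    import torch  # lazy import: only needed when a real file is loaded

    # Assumption: the file holds a dict of tensors.
    sample = torch.load(path, map_location="cpu")
    return missing_keys(sample)
```

Calling `check_file` on a file from ./data/preprocessed should return an empty list if preprocessing completed normally.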


2. Convert Checkpoints (HF → Megatron)#

Inside the LoongForge repo:

Step-1 Generate random Megatron checkpoints with the correct PP split (needed as a scaffold for the conversion).

  • Pick an empty folder, e.g. <base>/wan2.2/hg2mcore_pp4/high_noise/Megatron_Random

  • In examples/wan/pretrain_wan2.2_i2v_a14b.sh set

    • HIGH_NOISE_CHECKPOINT_PATH → above folder

    • LOW_NOISE_CHECKPOINT_PATH → analogous

    • --train-iters 5

    • --save-interval 2

  • Run once – you will obtain iter_0000002 folders.
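To confirm that the scaffold run wrote its checkpoints, you can list the iteration folders under each checkpoint path. The sketch below assumes folders follow the `iter_` plus seven-digit pattern seen in `iter_0000002`. The function name is illustrative, and it simply reports whatever folders exist.

```python
import re
from pathlib import Path

# Assumption: checkpoint folders are named iter_NNNNNNN (seven digits).
ITER_DIR = re.compile(r"^iter_(\d{7})$")


def list_iterations(checkpoint_root):
    """Return the sorted iteration numbers of iter_NNNNNNN subfolders."""
    found = []
    for entry in Path(checkpoint_root).iterdir():
        match = ITER_DIR.match(entry.name)
        if entry.is_dir() and match:
            found.append(int(match.group(1)))
    return sorted(found)
```

If the returned list does not contain 2 for both the high-noise and low-noise folders, rerun Step-1 before converting.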

Step-2 Convert HF weights into Megatron format
Edit examples/wan/convert_wan2.2.sh (section hg2mcore):

  • --load_path → iter_0000002 folder produced in Step-1

  • --save_path → final release folder, e.g. <base>/high_noise/Megatron_Release/

  • --checkpoint_path → original HF .safetensors directory

  • --pp 4 (or 8; use the same PP split as in Step-1)

Run

bash examples/wan/convert_wan2.2.sh hg2mcore

Repeat for the low-noise model.


3. Launch Training#

Recommended single-node split: PP=4, CP=2
Multi-node – scale by data parallelism:

dp = (NNODES × GPUS_PER_NODE) / (pp × cp)

Symbol    Meaning
dp        Data Parallel degree
pp        Pipeline Parallel degree
cp        Context Parallel degree
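The formula above can be checked with a few lines of Python. The sketch below computes dp and rejects configurations where the total GPU count is not divisible by pp × cp. The function name and the choice of 8 GPUs per node in the examples are assumptions for illustration.

```python
def data_parallel_degree(nnodes, gpus_per_node, pp, cp):
    """dp = (NNODES x GPUS_PER_NODE) / (pp x cp); the division must be exact."""
    world_size = nnodes * gpus_per_node
    model_parallel = pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"{world_size} GPUs cannot be split evenly into groups of {model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # Assumption: 8 GPUs per node.
    print(data_parallel_degree(1, 8, pp=4, cp=2))  # 1
    print(data_parallel_degree(4, 8, pp=4, cp=2))  # 4
```

With the recommended PP=4 and CP=2, each model replica occupies 8 GPUs, so every additional 8 GPUs adds one data-parallel replica.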

Step-1 Tune examples/wan/pretrain_wan2.2_i2v_a14b.sh

  • HIGH_NOISE_CHECKPOINT_PATH → path to high-noise Megatron checkpoint (from Section 2)

  • LOW_NOISE_CHECKPOINT_PATH → path to low-noise Megatron checkpoint (from Section 2)

  • DATASET_PATH → output path from Section 1 (e.g. ./data/preprocessed)

  • --pipeline-model-parallel-size 4

  • --context-parallel-size 2

  • --context-parallel-ulysses-degree 2

Step-2 Start

  • Single-node:

    bash examples/wan/pretrain_wan2.2_i2v_a14b.sh
    
  • Multi-node: execute the same script on every node – cluster env-vars (MASTER_ADDR, NODE_RANK …) are picked up automatically.


4. Export Checkpoints (Megatron → HF)#

Edit examples/wan/convert_wan2.2.sh (section mcore2hg):

  • --load_path → Megatron checkpoint after training

  • --save_path → target HF folder

  • --checkpoint_path → original HF checkpoint directory (used for reading model structure only)

  • --pp 4 (the PP size used for training in Section 3)

Run

bash examples/wan/convert_wan2.2.sh mcore2hg