LoongForge

LLM Advanced Features

LoongForge provides the following advanced features for large language model (LLM) training, each covered in its own guide:

  • FP8 Training
  • Adaptive FP8 Training (Selective FP8)
  • MoE All-to-All Overlap
  • Optimizer Support
  • Fused Linear Cross Entropy
  • Mcore-Bridge: Online HF Checkpoint Loading & Saving
  • LoRA Feature Usage Guide

By LoongForge

© Copyright 2026, LoongForge.