# Navformer

**Repository Path**: tj1652045/Navformer

## Basic Information

- **Project Name**: Navformer
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-28
- **Last Updated**: 2026-04-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Navformer

Navformer is the end-to-end model training and evaluation component of [WorldEngine](https://github.com/OpenDriveLab/WorldEngine), built on MMDetection3D, the nuPlan / OpenScene datasets, and NAVSIM.

It supports a full training loop: **train → open-loop evaluation → rare case extraction → RL fine-tuning**, with **VADv2** and **HydraMDP** as the supported model architectures.

---

## Table of Contents

- [System Requirements](#system-requirements)
- [Installation](#installation)
- [Environment Variables](#environment-variables)
- [Data](#data)
- [Quick Reference](#quick-reference)
- [Training](#training)
- [Evaluation](#evaluation)
- [Rare Case Extraction](#rare-case-extraction)
- [Configuration](#configuration)
- [Model Architectures](#model-architectures)
- [Advanced Training](#advanced-training)
- [Troubleshooting](#troubleshooting)
- [Performance Optimization](#performance-optimization)

---

## System Requirements

**Minimum:**

- GPU: NVIDIA GPU with 8 GB VRAM (e.g., RTX 2080)
- RAM: 32 GB
- Storage: 500 GB SSD
- CPU: 8 cores

**Recommended:**

- GPU: NVIDIA GPU with 24 GB+ VRAM (e.g., RTX 3090, A100)
- RAM: 64 GB+
- Storage: 5 TB+ SSD
- CPU: 16+ cores

**Software:**

- OS: Linux (Ubuntu 20.04 / 22.04)
- CUDA: 11.8
- Conda / Miniconda

---

## Installation

### 1. Create Conda Environment

```bash
conda create --name navformer python=3.9 -y
conda activate navformer
```

### 2. Install PyTorch

```bash
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 \
    --index-url https://download.pytorch.org/whl/cu118
```

Verify:

```bash
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"
# Expected: PyTorch: 2.0.1+cu118, CUDA: True
```

### 3. Install MMCV (build from source)

MMCV must be built from source to include custom CUDA operators:

```bash
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout v1.6.2

# Build with custom ops (takes 10–15 minutes)
# Downgrade setuptools to ~75.1.0 if you encounter build errors
MMCV_WITH_OPS=1 pip install -v -e .

python .dev_scripts/check_installation.py
cd ..
```

Verify:

```bash
python -c "import mmcv; print(f'MMCV: {mmcv.__version__}')"
# Expected: MMCV: 1.6.2
```

### 4. Install OpenMMLab Ecosystem

```bash
pip install mmcls==0.25.0
pip install mmdet==2.25.3
pip install mmdet3d==1.0.0rc6
pip install mmsegmentation==0.29.1
```

### 5. Install Navformer Dependencies

```bash
pip install -r requirements.txt
pip install shapely==2.0.4
```

### 6. Verify Installation

```bash
python -c "
import torch, mmcv, mmdet, mmdet3d, numpy, hydra
print('All Navformer dependencies OK')
print(f'PyTorch {torch.__version__}')
print(f'MMCV {mmcv.__version__}')
print(f'MMDetection3D {mmdet3d.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
"
```

---

## Environment Variables

Navformer relies on the [NAVSIM devkit v1.1](https://github.com/autonomousvision/navsim):

```bash
git clone -b v1.1 https://github.com/autonomousvision/navsim.git
```

Add the following to `~/.bashrc` or `~/.zshrc`:

```bash
export NAVSIM_DEVKIT_ROOT="/path/to/navsim"
export NAVFORMER_ROOT="/path/to/Navformer"
export NUPLAN_MAPS_ROOT="/path/to/nuplan/maps"
# export is needed so that child processes (python, training scripts) see the updated path
export PYTHONPATH=$NAVFORMER_ROOT:$NAVSIM_DEVKIT_ROOT:$PYTHONPATH
```

Apply:

```bash
source ~/.bashrc   # or source ~/.zshrc
```

---

## Data

### Directory Layout

```
Navformer/
├── data/
│   ├── raw/                # nuPlan and OpenScene datasets
│   └── alg_engine/         # Navformer-specific data
└── experiments/            # Experiment outputs (auto-created)
```

### Download

Navformer reuses the **[OpenDriveLab/WorldEngine](https://huggingface.co/datasets/OpenDriveLab/WorldEngine)** dataset on Hugging Face, which contains merged annotation PKLs, PDM caches, model checkpoints, and K-means vocab files.
- **Hugging Face**:

  ```bash
  curl -LsSf https://hf.co/cli/install.sh | bash
  hf download OpenDriveLab/WorldEngine --repo-type dataset --local-dir /path/to/Navformer
  ```

- **ModelScope** (recommended for users in China):

  ```bash
  pip install modelscope
  modelscope download --dataset OpenDriveLab/WorldEngine
  ```

### Raw Data (`data/raw/`)

```
data/raw/
├── nuplan/
│   └── dataset/
│       ├── maps/                           # HD maps (required)
│       │   ├── nuplan-maps-v1.0.json
│       │   ├── us-nv-las-vegas-strip/
│       │   ├── us-ma-boston/
│       │   ├── us-pa-pittsburgh-hazelwood/
│       │   └── sg-one-north/
│       └── nuplan-v1.1/
│           ├── sensor_blobs/               # Camera images and LiDAR
│           └── splits/
│
└── openscene-v1.1/
    ├── sensor_blobs/
    │   ├── trainval/
    │   └── test/
    └── meta_datas/
        ├── trainval/
        └── test/
```

Use symlinks to point at your existing downloads:

```bash
cd data/raw
ln -s /path/to/nuplan nuplan
ln -s /path/to/openscene-v1.1 openscene-v1.1
```

### Navformer Data (`data/alg_engine/`)

```
data/alg_engine/
├── ckpts/                                  # Pre-trained model checkpoints
├── merged_infos_navformer/
│   ├── nuplan_openscene_navtrain.pkl
│   └── nuplan_openscene_navtest.pkl
├── pdms_cache/                             # Pre-computed PDM metrics cache
│   ├── pdm_8192_gt_cache_navtrain.pkl
│   └── pdm_8192_gt_cache_navtest.pkl
└── test_8192_kmeans.npy                    # K-means clustering for PDM vocab
```

---

## Quick Reference

```bash
conda activate navformer

# Training (8 GPUs)
./scripts/e2e_dist_train.sh [resume_checkpoint]

# Open-loop navtest evaluation
./scripts/e2e_dist_eval.sh

# Full train set evaluation
bash scripts/e2e_dist_eval_navtrain.sh

# Rare case extraction
python scripts/rare_case_sampling_by_pdms.py \
    --pdm-result \
    --base-split \
    --output-dir
```

---

## Training

### Training from Scratch

```bash
conda activate navformer

# Train VADv2 (8 GPUs)
./scripts/e2e_dist_train.sh configs/navformer/e2e_vadv2.py 8
```

**Arguments (positional, in order):**

1. Configuration file path
2. Number of GPUs
3. `[resume_checkpoint]` (optional): checkpoint to resume from

### Resume Training

```bash
./scripts/e2e_dist_train.sh \
    configs/navformer/e2e_vadv2.py \
    8 \
    experiments/navformer/e2e_vadv2/latest.pth
```

If `latest.pth` exists in `experiments/navformer/e2e_vadv2/`, training auto-resumes when you omit the third argument.

### Monitor Training

```bash
# Watch training log
tail -f experiments/navformer/e2e_vadv2/logs/train.*

# TensorBoard
tensorboard --logdir experiments/navformer/e2e_vadv2/tf_logs
```

**Key metrics:**

- `loss`: total training loss (should decrease)
- `loss_planning`: planning loss
- `loss_track`: tracking loss
- `ade_4s`: average displacement error at 4 s
- `fde_4s`: final displacement error at 4 s

### Training Output

```
experiments/navformer/e2e_vadv2/
├── e2e_vadv2.py            # config backup
├── logs/
│   └── train.*
├── epoch_1.pth
├── ...
├── epoch_20.pth
└── latest.pth              # symlink to latest checkpoint
```

---

## Evaluation

### Open-Loop Evaluation

#### Full Test Set

```bash
conda activate navformer

./scripts/e2e_dist_eval.sh \
    configs/navformer/e2e_vadv2.py \
    experiments/navformer/e2e_vadv2/epoch_20.pth \
    8
```

Output: `experiments/navformer/e2e_vadv2/navtest.csv`

#### Rare Navtest Cases Only

```bash
./scripts/e2e_dist_eval_navtest_failures.sh \
    configs/navformer/e2e_vadv2.py \
    experiments/navformer/e2e_vadv2/epoch_20.pth \
    8
```

Output: `experiments/navformer/e2e_vadv2/navtest_failures.csv`

#### Full Train Set

Required before [Rare Case Extraction](#rare-case-extraction). This evaluates the model on the full navtrain split:

```bash
bash scripts/e2e_dist_eval_navtrain.sh \
    configs/navformer/e2e_vadv2.py \
    experiments/navformer/e2e_vadv2/epoch_20.pth \
    8
```

Output: `experiments/navformer/e2e_vadv2/navtrain.csv`

#### Evaluation Metrics

```csv
token,ade_4s,fde_4s,no_at_fault_collisions,drivable_area_compliance,ego_progress,comfort,score
```

| Metric | Description | Direction |
|--------|-------------|-----------|
| `ade_4s` | Average trajectory error over 4 s (m) | lower |
| `fde_4s` | Final position error at 4 s (m) | lower |
| `no_at_fault_collisions` | Collision avoidance rate (0–1) | higher |
| `drivable_area_compliance` | Stay in drivable area (0–1) | higher |
| `ego_progress` | Route completion (0–1) | higher |
| `comfort` | Comfort metric (0–1) | higher |
| `score` | Overall PDM score (0–1) | higher |

---

## Rare Case Extraction

Extract failure scenarios from the training-set evaluation results for targeted fine-tuning.

**Prerequisite:** complete a [Full Train Set Evaluation](#full-train-set) first.
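
To make "failure scenario" concrete in terms of the evaluation metrics, the sketch below flags rows of an evaluation CSV that uses the documented header. It is only an illustration and does not reproduce the logic of `scripts/rare_case_sampling_by_pdms.py`: the sample rows are made up, the names `load_rows`, `flag_rare_cases`, and `ep_fraction` are hypothetical, and the criteria (any at-fault collision, any drivable-area violation, lowest ego progress) are assumptions.

```python
import csv
import io

# Made-up rows for illustration only; the header matches the documented CSV columns.
SAMPLE_CSV = """token,ade_4s,fde_4s,no_at_fault_collisions,drivable_area_compliance,ego_progress,comfort,score
a1,0.8,1.9,1.0,1.0,0.92,1.0,0.91
b2,2.4,5.1,0.0,1.0,0.40,1.0,0.00
c3,1.1,2.7,1.0,0.0,0.75,1.0,0.00
d4,0.9,2.0,1.0,1.0,0.10,1.0,0.35
"""


def load_rows(text):
    """Parse CSV text into dicts, converting every metric column to float."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"token": raw["token"]}
        for key, value in raw.items():
            if key != "token":
                row[key] = float(value)
        rows.append(row)
    return rows


def flag_rare_cases(rows, ep_fraction=0.25):
    """Group scenario tokens by failure type (assumed criteria, see lead-in)."""
    collisions = [r["token"] for r in rows if r["no_at_fault_collisions"] < 1.0]
    off_road = [r["token"] for r in rows if r["drivable_area_compliance"] < 1.0]
    # Roughly the bottom `ep_fraction` of rows by ego progress, but at least one row.
    by_progress = sorted(rows, key=lambda r: r["ego_progress"])
    n_low = max(1, int(len(rows) * ep_fraction))
    low_progress = [r["token"] for r in by_progress[:n_low]]
    return {
        "collision": collisions,
        "off_road": off_road,
        "low_ego_progress": low_progress,
    }


rare = flag_rare_cases(load_rows(SAMPLE_CSV))
for kind, tokens in rare.items():
    print(f"{kind}: {tokens}")
```

On a real run you would read `navtrain.csv` from the experiment directory instead of the inline sample; a scenario can land in more than one group.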
### Basic Extraction

```bash
conda activate navformer

python scripts/rare_case_sampling_by_pdms.py \
    --pdm-result experiments/navformer/e2e_vadv2/navtrain.csv \
    --base-split configs/navsim_splits/navtrain_split/navtrain_50pct.yaml \
    --output-dir configs/navsim_splits/navtrain_split/e2e_vadv2_rare
```

**Output:**

```
configs/navsim_splits/navtrain_split/e2e_vadv2_rare/
├── navtrain_50pct_collision.yaml   # collision scenarios
├── navtrain_50pct_off_road.yaml    # off-road scenarios
└── navtrain_50pct_ep_1pct.yaml     # low ego-progress (bottom 1%)
```

### Custom Thresholds

Edit `scripts/rare_case_sampling_by_pdms.py`:

```python
# Change collision threshold
collision_scenarios = df[df['no_at_fault_collisions'] < 0.95]  # default 1.0

# Change ego-progress percentile
ep_threshold = df['ego_progress'].quantile(0.05)  # default 0.01 (1% → 5%)
```

---

## Configuration

Configs follow the MMDetection3D hierarchical pattern:

```
configs/
├── _base_/
│   └── default_runtime.py
├── navformer/
│   ├── e2e_vadv2.py
│   ├── e2e_hydramdp.py
│   └── track_map_nuplan_r50_navtrain.py
└── navsim_splits/
    ├── navtrain_split/
    │   ├── navtrain.yaml
    │   ├── navtrain_50pct.yaml
    │   └── e2e_vadv2_rare/
    │       ├── navtrain_50pct_collision.yaml
    │       ├── navtrain_50pct_off_road.yaml
    │       └── navtrain_50pct_ep_1pct.yaml
    └── navtest_split/
        ├── navtest.yaml
        └── navtest_failures.yaml
```

### Key Config Parameters

```python
model = dict(
    type='VADv2',               # or 'HydraMDP'
    num_query=900,
    planning_steps=8,
)

bev_h_, bev_w_ = 200, 200
patch_size = [102.4, 102.4]     # physical range in meters

input_modality = dict(
    use_lidar=False,
    use_camera=True,            # 8 cameras
    use_radar=False,
    use_external=True,          # CAN bus
)

total_epochs = 20
optimizer = dict(type='AdamW', lr=2e-4, weight_decay=0.01)

data = dict(
    samples_per_gpu=1,
    workers_per_gpu=4,
    train=dict(
        ann_file='merged_infos_navformer/nuplan_openscene_navtrain.pkl',
        scenario_filter='configs/navsim_splits/navtrain_split/navtrain_50pct.yaml',
    ),
    val=dict(
        ann_file='merged_infos_navformer/nuplan_openscene_navtest.pkl',
        scenario_filter='configs/navsim_splits/navtest_split/navtest.yaml',
    ),
)
```

### Runtime Overrides

```bash
./scripts/e2e_dist_train.sh configs/navformer/e2e_vadv2.py 8 \
    --cfg-options optimizer.lr=1e-4 total_epochs=30 data.samples_per_gpu=2
```

---

## Model Architectures

| Architecture | Config | Strengths |
|---|---|---|
| **VADv2** (default) | `configs/navformer/e2e_vadv2.py` | Fast inference, general driving |
| **HydraMDP** | `configs/navformer/e2e_hydramdp.py` | Multi-modal planning, safety-critical |

---

## Advanced Training

### Multi-Node Training

```bash
# Node 0 (master)
export MASTER_ADDR=192.168.1.100
export MASTER_PORT=28567
export WORLD_SIZE=16
export RANK=0
./scripts/e2e_dist_train.sh configs/navformer/e2e_vadv2.py 8

# Node 1 (worker)
export MASTER_ADDR=192.168.1.100
export MASTER_PORT=28567
export WORLD_SIZE=16
export RANK=8
./scripts/e2e_dist_train.sh configs/navformer/e2e_vadv2.py 8
```

### Mixed Precision

```python
# in config
fp16 = dict(loss_scale='dynamic')
```

### Gradient Accumulation

```python
# effective batch = samples_per_gpu * num_gpus * gradient_accumulation_steps
runner = dict(max_epochs=20, gradient_accumulation_steps=4)
```

---

## Troubleshooting

**CUDA out of memory:**

```bash
# Reduce batch size: data.samples_per_gpu = 1
# Lower BEV resolution: bev_h_, bev_w_ = 150, 150
# Enable gradient checkpointing: model.img_backbone.with_cp = True
```

**Training loss not decreasing:**

```bash
grep "load checkpoint" experiments/navformer/*/logs/train.*
./scripts/e2e_dist_train.sh ... --cfg-options optimizer.lr=1e-4
```

**Evaluation hangs:**

```bash
ps aux | grep python
pkill -f "test.py"
./scripts/e2e_dist_eval.sh ... 4   # try fewer GPUs
```

**`ModuleNotFoundError: No module named mmdet3d`:**

```bash
conda activate navformer
python -c "import mmcv; print(mmcv.__version__)"
pip uninstall mmdet3d -y && pip install mmdet3d==1.0.0rc6
```

**Corrupted checkpoint:**

```bash
# Use a previous epoch
./scripts/e2e_dist_train.sh ... experiments/navformer/e2e_vadv2/epoch_18.pth
```

---

## Performance Optimization

**Training speed:**

- `data.workers_per_gpu = 8` (if CPU/RAM allows)
- Store data on NVMe SSD
- `fp16 = dict(loss_scale='dynamic')`
- `data.persistent_workers = True`

**Memory:**

- `data.samples_per_gpu = 1`
- `bev_h_, bev_w_ = 150, 150`
- `model.img_backbone.with_cp = True`

**Multi-node:**

- Use homogeneous GPU types across nodes
- InfiniBand for inter-node communication
- Shared NFS/Lustre for data loading
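
When you combine the memory and multi-node settings above, it can help to check the resulting effective batch size, using the formula from the Gradient Accumulation section (`samples_per_gpu * num_gpus * gradient_accumulation_steps`). The helper below is a small sketch of that arithmetic; `effective_batch_size` is a hypothetical name and not part of Navformer.

```python
def effective_batch_size(samples_per_gpu, num_gpus, accumulation_steps=1):
    """Return samples_per_gpu * num_gpus * accumulation_steps after basic validation."""
    settings = {
        "samples_per_gpu": samples_per_gpu,
        "num_gpus": num_gpus,
        "accumulation_steps": accumulation_steps,
    }
    for name, value in settings.items():
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    return samples_per_gpu * num_gpus * accumulation_steps


# Single node: 8 GPUs, one sample per GPU, no accumulation.
print(effective_batch_size(1, 8))       # 8

# Two nodes (16 GPUs in total), one sample per GPU, 4 accumulation steps.
print(effective_batch_size(1, 16, 4))   # 64
```

Dropping `samples_per_gpu` to save memory shrinks the effective batch by the same factor unless the accumulation steps are raised to compensate.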