# CrossViT **Repository Path**: mirrors_ibm/CrossViT ## Basic Information - **Project Name**: CrossViT - **Description**: Official implementation of CrossViT. https://arxiv.org/abs/2103.14899 - **Primary Language**: Unknown - **License**: Apache-2.0 - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 1 - **Created**: 2021-08-01 - **Last Updated**: 2026-03-08 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # CrossViT This repository is the official implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. [ArXiv](https://arxiv.org/abs/2103.14899) If you use the codes and models from this repo, please cite our work. Thanks! ``` @inproceedings{ chen2021crossvit, title={{CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification}}, author={Chun-Fu (Richard) Chen and Quanfu Fan and Rameswar Panda}, booktitle={International Conference on Computer Vision (ICCV)}, year={2021} } ``` ## Installation To install requirements: ```setup pip install -r requirements.txt ``` With conda: ``` conda create -n crossvit python=3.8 conda activate crossvit conda install pytorch=1.7.1 torchvision cudatoolkit=11.0 -c pytorch -c nvidia pip install -r requirements.txt ``` ## Data preparation Download and extract ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for the torchvision [`datasets.ImageFolder`](https://pytorch.org/docs/stable/torchvision/datasets.html#imagefolder), and the training and validation data is expected to be in the `train/` folder and `val` folder respectively: ``` /path/to/imagenet/ train/ class1/ img1.jpeg class2/ img2.jpeg val/ class1/ img3.jpeg class/2 img4.jpeg ``` ## Pretrained models We provide models trained on ImageNet1K. You can find models [here](https://github.com/IBM/CrossViT/releases/tag/weights-0.1). And you can load pretrained weights into models by add `--pretrained` flag. ## Training To train `crossvit_9_dagger_224` on ImageNet on a single node with 8 gpus for 300 epochs run: ```shell script python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model crossvit_9_dagger_224 --batch-size 256 --data-path /path/to/imagenet ``` Other model names can be found at [models/crossvit.py](models/crossvit.py). ## Multinode training Distributed training is available via Slurm and `submitit`: To train a `crossvit_9_dagger_224` model on ImageNet on 4 nodes with 8 gpus each for 300 epochs: ``` python run_with_submitit.py --nodes 4 --model crossvit_9_dagger_224 --data-path /path/to/imagenet --batch-size 128 --warmup-epochs 30 ``` Or you can start process on each machine maunally. E.g. 2 nodes, each with 8 gpus. Machine 0: ```shell script python -m torch.distributed.launch --nproc_per_node=8 --master_addr=MACHINE_0_IP --master_port=AVAILABLE_PORT --nnodes=2 --node_rank=0 main.py --model crossvit_9_dagger_224 --batch-size 256 --data-path /path/to/imagenet ``` Machine 1: ```shell script python -m torch.distributed.launch --nproc_per_node=8 --master_addr=MACHINE_0_IP --master_port=AVAILABLE_PORT --nnodes=2 --node_rank=1 main.py --model crossvit_9_dagger_224 --batch-size 256 --data-path /path/to/imagenet ``` Note that: some slurm configurations might need to be changed based on your cluster. ## Evaluation To evaluate a pretrained model on `crossvit_9_dagger_224`: ``` python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model crossvit_9_dagger_224 --batch-size 128 --data-path /path/to/imagenet --eval --pretrained ```