# VSLA-CLIP

**Repository Path**: teslatasy/VSLA-CLIP

## Basic Information

- **Project Name**: VSLA-CLIP
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-09-06
- **Last Updated**: 2024-09-06

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

## Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach ([PDF]())

### Installation

```
conda create -n vslaclip python=3.8
conda activate vslaclip
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
pip install yacs
pip install timm
pip install scikit-image
pip install tqdm
pip install ftfy
pip install regex
```

### Training

For example, to train on LS-VID, modify the config file as follows:

```
DATASETS:
  NAMES: ('lsvid')
  ROOT_DIR: ('your_dataset_dir')
OUTPUT_DIR: 'your_output_dir'
```

To initialize the model with [ViFi-CLIP](https://github.com/muzairkhattak/ViFi-CLIP) weights, download them from the [ViFi-CLIP repository](https://github.com/muzairkhattak/ViFi-CLIP) and set the following in the config file:

```
MODEL:
  VIFI_WEIGHT: 'your_dataset_dir/vifi_weight.pth'
  USE_VIFI_WEIGHT: True
```

To run FT-CLIP (fine-tuning the image encoder):

```
CUDA_VISIBLE_DEVICES=0 python train_fine_tune.py --config_file configs/ft/vit_ft.yml
```

To run VSLA-CLIP:

```
CUDA_VISIBLE_DEVICES=0 python train_reidadapter.py --config_file configs/adapter/vit_adapter.yml
```

### Evaluation

For example, to test VSLA-CLIP on LS-VID:

```
CUDA_VISIBLE_DEVICES=0 python test.py --config_file 'your_config_file' TEST.WEIGHT 'your_trained_checkpoints_path/ViT-B-16_120.pth'
```

### Weights

| Dataset    | LS-VID | MARS | iLIDS | G2A |
|------------|--------|------|-------|-----|
| VSLA-CLIP‡ | [model](https://drive.google.com/drive/folders/1Wh4AJ9g59lZO_6trKEIloaLqKdU_j6ps?usp=sharing) | [model](https://drive.google.com/drive/folders/1Wh4AJ9g59lZO_6trKEIloaLqKdU_j6ps?usp=sharing) | [model](https://drive.google.com/drive/folders/1Wh4AJ9g59lZO_6trKEIloaLqKdU_j6ps?usp=sharing) | [model](https://drive.google.com/drive/folders/1Wh4AJ9g59lZO_6trKEIloaLqKdU_j6ps?usp=sharing) |

### Citation

```
@inproceedings{vsla-clip,
  author    = {S. Zhang and W. Luo and D. Cheng and Q. Yang and L. Ran and Y. Xing and Y. Zhang},
  title     = {Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach},
  booktitle = {ECCV},
  year      = {2024}
}
```

### Acknowledgement

Codebase from [CLIP-ReID](https://github.com/Syliz517/CLIP-ReID), [TransReID](https://github.com/damo-cv/TransReID), [CLIP](https://github.com/openai/CLIP), and [CoOp](https://github.com/KaiyangZhou/CoOp).
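
### Example config

Putting the two Training snippets together, a combined config might look like the sketch below. This is only an illustration, not a file shipped with the repository: the key names (`DATASETS.NAMES`, `DATASETS.ROOT_DIR`, `OUTPUT_DIR`, `MODEL.VIFI_WEIGHT`, `MODEL.USE_VIFI_WEIGHT`) are taken from the snippets above, all paths and the file name are placeholders, and any further fields required by the shipped `configs/adapter/vit_adapter.yml` are omitted.

```
# Hypothetical combined config, e.g. saved as my_lsvid_adapter.yml (name is a placeholder).
MODEL:
  VIFI_WEIGHT: '/path/to/vifi_weight.pth'   # downloaded ViFi-CLIP checkpoint
  USE_VIFI_WEIGHT: True                     # set to False to train without ViFi-CLIP initialization

DATASETS:
  NAMES: ('lsvid')                          # dataset name; 'lsvid' is the example used above
  ROOT_DIR: ('/path/to/datasets')           # directory containing the LS-VID data

OUTPUT_DIR: '/path/to/output'               # where checkpoints and logs are written
```

Training would then point at this file, e.g. `CUDA_VISIBLE_DEVICES=0 python train_reidadapter.py --config_file my_lsvid_adapter.yml`.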