# Content-Adaptive Downsampling in Convolutional Neural Networks

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Framework](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?&logo=PyTorch&logoColor=white)](https://pytorch.org/)

This is the official repository accompanying the CVPR Workshop paper:

[R. Hesse](https://robinhesse.github.io/), [S. Schaub-Meyer](https://schaubsi.github.io/), and [S. Roth](https://www.visinf.tu-darmstadt.de/visual_inference/people_vi/stefan_roth.en.jsp). **Content-Adaptive Downsampling in Convolutional Neural Networks**. _CVPRW, The 6th Efficient Deep Learning for Computer Vision (ECV) Workshop_, 2023.

[Paper](https://openaccess.thecvf.com/content/CVPR2023W/ECV/papers/Hesse_Content-Adaptive_Downsampling_in_Convolutional_Neural_Networks_CVPRW_2023_paper.pdf) | [Preprint (arXiv)](https://arxiv.org/abs/2305.09504) | [Video](https://www.youtube.com/watch?v=E4iJPpWaJso) | [Poster](https://github.com/visinf/cad/blob/main/poster.jpeg) | [Supplemental](https://openaccess.thecvf.com/content/CVPR2023W/ECV/supplemental/Hesse_Content-Adaptive_Downsampling_in_CVPRW_2023_supplemental.pdf)

![Poster](https://github.com/visinf/cad/blob/main/poster.jpeg)

## Semantic Segmentation (Sec. 4.2)

### Pretrained Models

| Model | mIoU Cityscapes | Download |
| :--- | :---: | :--- |
| ResNet101+DeepLabv3 (OS=16) | 0.762 | [best_deeplabv3_resnet101_cityscapes_os16_seed1.pth](https://download.visinf.tu-darmstadt.de/data/2023-cvpr-hesse-cad/best_deeplabv3_resnet101_cityscapes_os16_seed1.pth) |
| ResNet101+DeepLabv3 (OS=8) | 0.776 | [best_deeplabv3_resnet101_cityscapes_os8_seed1.pth](https://download.visinf.tu-darmstadt.de/data/2023-cvpr-hesse-cad/best_deeplabv3_resnet101_cityscapes_os8_seed1.pth) |
| ResNet101+DeepLabv3 edge (OS=8->16) | 0.773 | [best_deeplabv3_batch_ap_resnet101_cityscapes_os8_modeedges_os16till8_seed2_trimapwidth11_threshold0.15.pth](https://download.visinf.tu-darmstadt.de/data/2023-cvpr-hesse-cad/best_deeplabv3_batch_ap_resnet101_cityscapes_os8_modeedges_os16till8_seed2_trimapwidth11_threshold0.15.pth) |
| ResNet101+DeepLabv3 learned (OS=8->16) | 0.775 | [best_deeplabv3_ad_resnet101_cityscapes_modeend2end_seed0_default_tau1.0_lowresactive0.5_w_downsample_shared_andbatchnorm_shared.pth](https://download.visinf.tu-darmstadt.de/data/2023-cvpr-hesse-cad/best_deeplabv3_ad_resnet101_cityscapes_modeend2end_seed0_default_tau1.0_lowresactive0.5_w_downsample_shared_andbatchnorm_shared.pth) |
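The checkpoints can also be fetched from the command line; a minimal sketch using `wget` (the `checkpoints/` directory name is our choice here, and any URL from the table above works the same way):

```bash
# Fetch one of the released checkpoints (here: the OS=16 baseline)
# into a local checkpoints/ directory.
mkdir -p checkpoints
wget https://download.visinf.tu-darmstadt.de/data/2023-cvpr-hesse-cad/best_deeplabv3_resnet101_cityscapes_os16_seed1.pth \
    -O checkpoints/best_deeplabv3_resnet101_cityscapes_os16_seed1.pth
```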
### Available architectures

Specify the model architecture with `--model ARCH_NAME` and set the output stride with `--output_stride OUTPUT_STRIDE`. Below we show example runs for **ResNet101+DeepLabv3**.

### Reproduce

#### 1. Install the required packages

```bash
conda create --name adaptive_downsampling --file requirements.txt
conda activate adaptive_downsampling
```

#### 2. Download Cityscapes and extract it to `datasets/cityscapes`

```
/datasets
    /cityscapes
        /gtFine
        /leftImg8bit
```

#### 3. Train your models on Cityscapes

**For the baseline models in Sec. 4.2:**

```bash
python main.py --model deeplabv3_resnet101 --dataset cityscapes --gpu_id 0 --lr 0.1 --crop_size 768 --batch_size 8 --output_stride 16 --data_root /datasets/cityscapes --random_seed 0
python main.py --model deeplabv3_resnet101 --dataset cityscapes --gpu_id 0 --lr 0.1 --crop_size 768 --batch_size 8 --output_stride 8 --data_root /datasets/cityscapes --random_seed 0
```

**For the content-adaptive downsampling models in Sec. 4.2:**

Adaptive downsampling with edge mask (brackets denote the available choices; pick one value):

```bash
python main.py --model deeplabv3_batch_ap_resnet101 --dataset cityscapes --gpu_id 0 --lr 0.1 --crop_size 768 --batch_size 8 --output_stride 8 --data_root /datasets/cityscapes --trimap_width 11 --pooling_mask_mode edges_os16till8 --pooling_mask_edge_detection_treshold [0.15, 0.35, 0.95] --random_seed 0 --exp_name trimapwidth11_threshold[0.15, 0.35, 0.95]
```

Adaptive downsampling with learned mask:

```bash
python main_e2e_train.py --model deeplabv3_ad_resnet101 --dataset cityscapes --gpu_id 0 --lr 0.1 --crop_size 768 --batch_size 8 --data_root /datasets/cityscapes --random_seed 0 --exp_name default_tau1.0_lowresactive0.5_w_downsample_shared_andbatchnorm_shared --val_interval 100 --tau 1 --low_res_active 0.5
```

For evaluation:

```bash
python main_e2e_eval.py --model deeplabv3_ad_resnet101 --dataset cityscapes --gpu_id 0 --crop_size 768 --data_root /datasets/cityscapes --random_seed 0 --tau 1 --ckpt ./best_deeplabv3_ad_resnet101_cityscapes_modeend2end_seed0_default_tau1.0_lowresactive0.5_w_downsample_shared_andbatchnorm_shared.pth
```

#### 4. Evaluate your models

To evaluate your models, run the respective training call (`main.py`) with the additional parameters `--test_only` and `--ckpt`, as shown in the sketch below.
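For example, evaluating the OS=16 baseline could look as follows; this is a sketch, assuming the released checkpoint was downloaded to the repository root (adjust `--ckpt` to your path):

```bash
# Same call as for training, but with --test_only and a checkpoint to load.
python main.py --model deeplabv3_resnet101 --dataset cityscapes --gpu_id 0 --crop_size 768 --batch_size 8 --output_stride 16 --data_root /datasets/cityscapes --random_seed 0 --test_only --ckpt ./best_deeplabv3_resnet101_cityscapes_os16_seed1.pth
```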
#### 5. Get number of multiply-adds

Regular downsampling:

```bash
python main_flops.py --model deeplabv3_resnet101 --dataset cityscapes --gpu_id 0 --output_stride [8,16] --data_root /datasets/cityscapes
```

Adaptive downsampling with edge mask:

```bash
python main_flops.py --model deeplabv3_ap_resnet101 --dataset cityscapes --gpu_id 0 --output_stride 8 --output_stride_from_trained 8 --data_root /datasets/cityscapes --pooling_mask_mode edges_os16till8 --trimap_width 11 --pooling_mask_edge_detection_treshold [0.15, 0.35, 0.95]
```

Adaptive downsampling with learned mask:

```bash
python main_e2e_flops.py --model deeplabv3_ad_resnet101 --dataset cityscapes --gpu_id 0 --crop_size 768 --data_root /datasets/cityscapes --random_seed 0 --ckpt ./best_deeplabv3_ad_resnet101_cityscapes_modeend2end_seed0_default_tau1.0_lowresactive0.5_w_downsample_shared_andbatchnorm_shared.pth
```

## Keypoints (Sec. 4.3)

This code is built on top of the official implementation of the following paper:

```text
"D2-Net: A Trainable CNN for Joint Detection and Description of Local Features".
M. Dusmanu, I. Rocco, T. Pajdla, M. Pollefeys, J. Sivic, A. Torii, and T. Sattler. CVPR 2019.
```

[Paper on arXiv](https://arxiv.org/abs/1905.03561), [Project page](https://dsmn.ml/publications/d2-net.html)

### Downloading the models and datasets

For instructions on downloading the dataset, please see the `hpatches_sequences` folder.

The model weights can be downloaded by running:

```bash
mkdir models
wget https://dusmanu.com/files/d2-net/d2_tf.pth -O models/d2_tf.pth
```

### Install the required packages

See `../segmentation`; additionally, install OpenCV:

```bash
pip install opencv-python
```

### Feature extraction

`extract_features.py` can be used to extract D2 features for a given list of images (brackets denote the available choices; pick one value).

Regular downsampling:

```bash
python extract_features.py --gpu_id 0 --image_list_file image_list_hpatches_sequences.txt --model_file models/d2_tf.pth --output_extension .sift_d2net_os[1,2,4,8]_512kpts --output_stride [1,2,4,8] --nr_keypoints 512
```

Adaptive downsampling (example for dilations 25 51 51):

```bash
python extract_features.py --gpu_id 0 --image_list_file image_list_hpatches_sequences.txt --model_file models/d2_tf.pth --output_extension .sift_apd2net_os1_512kpts_dils_25_51_51 --output_stride 1 --nr_keypoints 512 --des APD2Net --dilations 25 51 51
```

Adaptive downsampling (example for dilations 0 0 31):

```bash
python extract_features.py --gpu_id 0 --image_list_file image_list_hpatches_sequences.txt --model_file models/d2_tf.pth --output_extension .sift_apd2net_os4_512kpts_dils_0_0_31 --output_stride 4 --nr_keypoints 512 --des APD2Net --dilations 0 0 31
```

After extracting features, they can be evaluated by running `hpatches_sequences/HPatches-Sequences-Matching-Benchmark.ipynb` (add the methods that you want to evaluate).

### Estimate multiply-adds

Regular downsampling:

```bash
python eval_flops.py --gpu_id 0 --image_list_file image_list_hpatches_sequences.txt --output_stride [1,2,4,8] --nr_keypoints 512
```

Adaptive downsampling (example for dilations 25 51 51):

```bash
python eval_flops.py --gpu_id 0 --image_list_file image_list_hpatches_sequences.txt --output_stride 1 --nr_keypoints 512 --des APD2Net --dilations 25 51 51
```

Adaptive downsampling (example for dilations 0 0 31):

```bash
python eval_flops.py --gpu_id 0 --image_list_file image_list_hpatches_sequences.txt --output_stride 4 --nr_keypoints 512 --des APD2Net --dilations 0 0 31
```

## Acknowledgments

We would like to thank the authors of the following repositories, parts of whose publicly available code we used:

- https://github.com/VainF/DeepLabV3Plus-Pytorch
- https://github.com/mihaidusmanu/d2-net

## Citation

If you find our work helpful, please consider citing:

```
@inproceedings{Hesse:2023:CAD,
  title     = {Content-Adaptive Downsampling in Convolutional Neural Networks},
  author    = {Hesse, Robin and Schaub-Meyer, Simone and Roth, Stefan},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), The 6$^\text{th}$ Efficient Deep Learning for Computer Vision (ECV) Workshop},
  year      = {2023}
}
```