# semantic-segmentation-pytorch
**Repository Path**: kongmo/semantic-segmentation-pytorch
## Basic Information
- **Project Name**: semantic-segmentation-pytorch
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2018-07-27
- **Last Updated**: 2020-12-17
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Semantic Segmentation on MIT ADE20K dataset in PyTorch
This is a PyTorch implementation of semantic segmentation models on MIT ADE20K scene parsing dataset.
ADE20K is the largest open source dataset for semantic segmentation and scene parsing, released by MIT Computer Vision team. Follow the link below to find the repository for our dataset and implementations on Caffe and Torch7:
https://github.com/CSAILVision/sceneparsing
Pretrained models can be found at:
http://sceneparsing.csail.mit.edu/model/
[From left to right: Test Image, Ground Truth, Predicted Result]
## Highlights [NEW!]
### Synchronized Batch Normalization on PyTorch
This module differs from the built-in PyTorch BatchNorm in that the mean and standard deviation are reduced across all devices during training. The importance of synchronized batch normalization in object detection has recently been demonstrated with an extensive analysis in the paper [MegDet: A Large Mini-Batch Object Detector](https://arxiv.org/abs/1711.07240), and we empirically find that it is also important for segmentation.
The implementation has the following advantages:
- It is implemented in pure Python, with no extra C++ extension libraries.
- It is easy to use.
- It is completely compatible with PyTorch's implementation. Specifically, it uses unbiased variance to update the moving average, and uses `sqrt(max(var, eps))` instead of `sqrt(var + eps)`.
***To the best of our knowledge, it is the first pure-Python implementation of synchronized BN on PyTorch, and also the first that is completely compatible with PyTorch. It is also efficient, only 20% to 30% slower than unsynchronized BN.*** We especially thank [Jiayuan Mao](http://vccy.xyz/) for his kind contributions. For more details about the implementation and usage, refer to [Synchronized-BatchNorm-PyTorch](https://github.com/vacancy/Synchronized-BatchNorm-PyTorch).
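To illustrate why the reduction across devices matters, here is a minimal sketch (not the library's actual code) of how per-GPU batch statistics combine into the global statistics that a synchronized BN layer normalizes with; the function name and setup are hypothetical:

```python
import numpy as np

def global_batch_stats(per_device_batches, eps=1e-5):
    """Combine per-device batches into the global mean/std that a
    synchronized BN layer would normalize with."""
    total = sum(b.size for b in per_device_batches)
    s = sum(b.sum() for b in per_device_batches)           # all-reduced sum
    ss = sum((b ** 2).sum() for b in per_device_batches)   # all-reduced sum of squares
    mean = s / total
    var = ss / total - mean ** 2       # biased variance, as used for normalization
    # PyTorch-compatible detail: sqrt(max(var, eps)) instead of sqrt(var + eps)
    std = np.sqrt(np.maximum(var, eps))
    return mean, std

# The combined statistics match those of the concatenated global batch:
a, b = np.ones((4, 3)), np.zeros((4, 3))
mean, std = global_batch_stats([a, b])   # mean is 0.5, as over the full batch
```

Unsynchronized BN would instead normalize each device's batch with its own local mean and variance (here, 1.0 and 0.0 respectively), which is exactly the discrepancy the synchronization removes.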
### Dynamic scales of input for training with multiple GPUs
Unlike the image classification task, where input images are resized to a fixed scale such as 224x224, it is better to keep the original aspect ratios of input images for semantic segmentation and object detection networks.
So we re-implemented the `DataParallel` module to support distributing data to multiple GPUs as Python dicts. The dataloader also operates differently: *the batch size of a dataloader now always equals the number of GPUs*, and each element is sent to one GPU. It is also compatible with multi-processing. Note that the file index for the multi-processing dataloader is stored on the master process, which contradicts our goal that each worker maintain its own file list. So we use a trick: although the master process still passes an index to the dataloader's `__getitem__` function, we simply ignore it and return a random batch dict. Also, *the multiple workers forked by the dataloader all share the same seed*, so if we used this trick directly, the workers would yield exactly the same data. We therefore add one line of code that sets the default seed for `numpy.random` before activating multiple workers in the dataloader.
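The seeding issue above can be demonstrated in isolation. The following sketch (hypothetical names; it simulates workers with plain function calls rather than forked processes) shows why identically seeded workers yield identical "random" data, and how a per-worker seed fixes it:

```python
import numpy as np

def worker_samples(worker_id, reseed):
    """Simulate one dataloader worker drawing random sample indices."""
    base_seed = 1234  # hypothetical seed inherited from the parent process
    if reseed:
        # the one-line fix: give each forked worker its own numpy seed
        np.random.seed(base_seed + worker_id)
    else:
        np.random.seed(base_seed)  # forked workers all share the parent's state
    return np.random.randint(0, 10000, size=8)

# Without reseeding, two "workers" yield identical batches;
# with per-worker seeds, they diverge.
same = [worker_samples(w, reseed=False) for w in (0, 1)]
diff = [worker_samples(w, reseed=True) for w in (0, 1)]
```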
### An Efficient and Effective Framework: UPerNet
UPerNet is based on the Feature Pyramid Network (FPN) and the Pyramid Pooling Module (PPM), with down-sampling rates of 4, 8 and 16. It does not need dilated convolution, an operator that is time- and memory-consuming. *Without bells and whistles*, it is comparable to or even better than PSPNet, while requiring much less training time and GPU memory. For example, you cannot train a PSPNet-101 on TITAN Xp GPUs with only 12GB of memory, but you can train a UPerNet-101 on such GPUs.
Thanks to its efficient network design, we will soon open-source stronger UPerNet models based on ResNeXt that are able to run on normal GPUs.
## Supported models
We split our models into encoder and decoder, where encoders are usually modified directly from classification networks, and decoders consist of final convolutions and upsampling.
Encoder (naming convention: resnetXX_dilatedYY is a customized resnetXX with dilated convolutions, whose output feature map is 1/YY of the input size):
- ResNet50: resnet50_dilated16, resnet50_dilated8
- ResNet101: resnet101_dilated16, resnet101_dilated8
***Coming soon***:
- ResNeXt101: resnext101_dilated16, resnext101_dilated8
Decoder:
- c1_bilinear (1 conv + bilinear upsample)
- c1_bilinear_deepsup (c1_bilinear + deep supervision trick)
- ppm_bilinear (pyramid pooling + bilinear upsample, see [PSPNet](https://hszhao.github.io/projects/pspnet) paper for details)
- ppm_bilinear_deepsup (ppm_bilinear + deep supervision trick)
- upernet (pyramid pooling + FPN head)
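The encoder naming convention above directly encodes the output stride. As a worked example, this small helper (hypothetical, not part of the repo's API) parses a name like `resnet50_dilated8` into the downsampling factor and the resulting feature-map size:

```python
def output_stride(encoder_name):
    """Downsampling factor implied by an encoder name
    (hypothetical helper, not part of the repo's API)."""
    if "_dilated" in encoder_name:
        # e.g. resnet50_dilated8 -> feature map is 1/8 of the input size
        return int(encoder_name.split("_dilated")[-1])
    return 32  # an undilated ResNet downsamples by 32 overall

def feature_map_size(encoder_name, height, width):
    """Spatial size of the encoder's output feature map."""
    s = output_stride(encoder_name)
    return height // s, width // s
```

For instance, a 512x512 crop passed through `resnet50_dilated8` produces a 64x64 feature map, which the decoder then upsamples back to the input resolution.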
## Performance
IMPORTANT: We use our own base model, self-trained on ImageNet. The model takes input in BGR channel order (consistent with OpenCV) instead of the RGB order used by PyTorch's default implementations. The base model will be automatically downloaded when needed.
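Because of this convention, an image loaded in RGB order must have its channels reversed before being fed to the network. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def rgb_to_bgr(image):
    """Reverse the channel axis of an HxWx3 image (RGB <-> BGR)."""
    return image[..., ::-1]

rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[..., 0] = 255        # pure red in RGB order
bgr = rgb_to_bgr(rgb)    # red now sits in the last channel, as OpenCV expects
```

Images read with OpenCV's `cv2.imread` are already in BGR order and need no conversion.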
| Architecture | MS Test | Mean IoU | Pixel Accuracy | Overall Score | Training Time |
|---|---|---|---|---|---|
| ResNet-50_dilated8 + c1_bilinear_deepsup | No | 34.88 | 76.54 | 55.71 | 1.38 * 20 = 27.6 hours |
| ResNet-50_dilated8 + ppm_bilinear_deepsup | No | 41.26 | 79.73 | 60.50 | 1.67 * 20 = 33.4 hours |
| | Yes | 42.04 | 80.23 | 61.14 | |
| ResNet-101_dilated8 + ppm_bilinear_deepsup | No | 42.19 | 80.59 | 61.39 | 3.82 * 25 = 95.5 hours |
| | Yes | 42.53 | 80.91 | 61.72 | |
| UPerNet-50 | No | 40.44 | 79.80 | 60.12 | 1.75 * 20 = 35.0 hours |
| | Yes | 41.55 | 80.23 | 60.89 | |
| UPerNet-101 | No | 41.98 | 80.63 | 61.34 | 2.5 * 25 = 62.5 hours |
| | Yes | 42.66 | 81.01 | 61.84 | |
| UPerNet-ResNeXt101 (coming soon!) | - | - | - | - | - |