# acl_ocrnet

**Repository Path**: yukming_law/acl_ocrnet

## Basic Information

- **Project Name**: acl_ocrnet
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2021-03-19
- **Last Updated**: 2023-12-07

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# OCRNet Demo

Paper: https://arxiv.org/pdf/1909.11065.pdf

Official source code: https://github.com/HRNet/HRNet-Semantic-Segmentation/tree/HRNet-OCR

### 0. Download the source code

```shell
git clone https://github.com/HRNet/HRNet-Semantic-Segmentation.git
cd ./HRNet-Semantic-Segmentation
git checkout HRNet-OCR
```

Dependencies:

- pytorch-cpu 1.6
- onnx

### 1. Modify the source code

#### Modify lib/seg_hrnet_ocr.py

The original authors use their own BatchNorm2d operator; replace it with PyTorch's built-in `nn.BatchNorm2d`:

```python
# step 1: comment out line 22
# from .bn_helper import BatchNorm2d, BatchNorm2d_class, relu_inplace

# step 2: replace every occurrence of BatchNorm2d with nn.BatchNorm2d

# step 3: line 658
# elif isinstance(m, BatchNorm2d_class):
elif isinstance(m, nn.BatchNorm2d):

# step 4: drop the relu_inplace argument
# nn.ReLU(inplace=relu_inplace)  -->  nn.ReLU()
```

When exporting to ONNX, PyTorch's upsample operator is converted into a Resize operator. The Resize operator in the current ONNX opset only supports the scale parameter, not the size parameter, so the code that specifies an explicit upsample size is rewritten to use the equivalent scale factor:

```python
# line 243
y = y + F.interpolate(
    self.fuse_layers[i][j](x[j]),
    # size=[height_output, width_output],
    scale_factor=x[i].shape[-1] // x[j].shape[-1],
    mode='bilinear')

# line 457
# x1 = F.interpolate(x[1], size=(x0_h, x0_w), mode='bilinear', align_corners=ALIGN_CORNERS)
# x2 = F.interpolate(x[2], size=(x0_h, x0_w), mode='bilinear', align_corners=ALIGN_CORNERS)
# x3 = F.interpolate(x[3], size=(x0_h, x0_w), mode='bilinear', align_corners=ALIGN_CORNERS)
x1 = F.interpolate(x[1], scale_factor=2, mode='bilinear', align_corners=ALIGN_CORNERS)
x2 = F.interpolate(x[2], scale_factor=4, mode='bilinear', align_corners=ALIGN_CORNERS)
x3 = F.interpolate(x[3], scale_factor=8, mode='bilinear',
                   align_corners=ALIGN_CORNERS)
```

The OCR module contains MatMul operators whose inputs have shape (N, D1, D2). The ATC conversion tool currently supports only 2-D inputs for the MatMul operator in ONNX models, so the MatMul inputs in the source code must be reshaped to 2-D. This introduces a constraint: the model batch size is fixed at 1. If the batch size were N (N > 1), the data would have to be split along axis 0 with a Slice operator and fed as 2-D inputs to N separate MatMul operators; otherwise ATC conversion fails.

```python
# line 59
def forward(self, feats, probs):
    batch_size, c, h, w = probs.size(0), probs.size(1), probs.size(2), probs.size(3)
    probs = probs.view(batch_size, c, -1)
    feats = feats.view(batch_size, feats.size(1), -1)
    feats = feats.permute(0, 2, 1)  # batch x hw x c
    probs = F.softmax(self.scale * probs, dim=2)  # batch x k x hw
    probs = probs.view(probs.size(1), probs.size(2))  # reshape to 2D data
    feats = feats.view(feats.size(1), feats.size(2))  # reshape to 2D data
    ocr_context = torch.matmul(probs, feats)
    # reshape to 3D data
    ocr_context = ocr_context.view(batch_size, ocr_context.size(0), ocr_context.size(1))\
        .permute(0, 2, 1).unsqueeze(3)  # batch x k x c
    return ocr_context

# line 123
def forward(self, x, proxy):
    batch_size, h, w = x.size(0), x.size(2), x.size(3)
    if self.scale > 1:
        x = self.pool(x)

    query = self.f_pixel(x).view(batch_size, self.key_channels, -1)
    query = query.permute(0, 2, 1)
    key = self.f_object(proxy).view(batch_size, self.key_channels, -1)
    value = self.f_down(proxy).view(batch_size, self.key_channels, -1)
    value = value.permute(0, 2, 1)

    query = query.view(query.size(1), query.size(2))  # reshape to 2D data
    key = key.view(key.size(1), key.size(2))  # reshape to 2D data
    sim_map = torch.matmul(query, key)
    sim_map = (self.key_channels ** -.5) * sim_map
    sim_map = F.softmax(sim_map, dim=-1)

    # add bg context ...
    value = value.view(value.size(1), value.size(2))  # reshape to 2D data
    context = torch.matmul(sim_map, value)
    context = context.view(batch_size, context.size(0), context.size(1))  # reshape to 3D data
    context = context.permute(0, 2, 1).contiguous()
    context = context.view(batch_size, self.key_channels, *x.size()[2:])
    context = self.f_up(context)
    if self.scale > 1:
        context = F.interpolate(input=context, size=(h, w), mode='bilinear',
                                align_corners=ALIGN_CORNERS)

    return context
```

Remove the statement that loads the pretrained weights:

```python
def get_seg_model(cfg, **kwargs):
    model = HighResolutionNet(cfg, **kwargs)
    # model.init_weights(cfg.MODEL.PRETRAINED)
```

### 2. Export the ONNX model

#### Modify tools/test.py

Since the CPU build of PyTorch is used, remove the GPU-related statements in test.py:

```python
# import torch.backends.cudnn as cudnn

# cudnn.benchmark = config.CUDNN.BENCHMARK
# cudnn.deterministic = config.CUDNN.DETERMINISTIC
# cudnn.enabled = config.CUDNN.ENABLED

# logger.info(get_model_summary(model.cuda(), dump_input.cuda()))
```

When loading the model weights, map them to the CPU:

```python
# pretrained_dict = torch.load(model_state_file)
pretrained_dict = torch.load(model_state_file, map_location=torch.device('cpu'))
```

After the weights are loaded, export the model to ONNX:

```python
model_dict.update(pretrained_dict)
model.load_state_dict(model_dict)

x = torch.randn(1, 3, 480, 480).cpu()  # use 480x480 here; other input sizes may fail because of the modifications above
torch.onnx.export(model, x, 'OCRNet.onnx', verbose=True, opset_version=11,
                  input_names=["input"], output_names=["aux_output", "ocr_output"])
import onnx
onnx.save(onnx.shape_inference.infer_shapes(onnx.load('OCRNet.onnx')), 'OCRNet.onnx')
exit()
```

Download hrnet_ocr_pascal_ctx_5618_torch11.pth:

```shell
wget https://github.com/hsfzxjy/models.storage/releases/download/HRNet-OCR/hrnet_ocr_pascal_ctx_5618_torch11.pth
```

Run tools/test.py:

```shell
python tools/test.py --cfg experiments/pascal_ctx/seg_hrnet_ocr_w48_cls59_520x520_sgd_lr1e-3_wd1e-4_bs_16_epoch200.yaml \
    DATASET.TEST_SET testval \
    TEST.MODEL_FILE hrnet_ocr_pascal_ctx_5618_torch11.pth \
    TEST.SCALE_LIST \
    0.5,0.75,1.0,1.25,1.5,1.75,2.0 \
    TEST.FLIP_TEST True
```

### 3. Convert to an om model

A. Image preprocessing

Follow the preprocessing logic in the source code:

```python
# lib/dataset/pascal_ctx.py: PASCALContext.__init__
def __init__(self,
             root,
             list_path,
             num_samples=None,
             num_classes=59,
             multi_scale=True,
             flip=True,
             ignore_label=-1,
             base_size=520,
             crop_size=(480, 480),
             downsample_rate=1,
             scale_factor=16,
             mean=[0.485, 0.456, 0.406],
             std=[0.229, 0.224, 0.225]):

# lib/dataset/base_dataset.py: BaseDataset.input_transform
def input_transform(self, image):
    image = image.astype(np.float32)[:, :, ::-1]
    image = image / 255.0
    image -= self.mean
    image /= self.std
    return image
```

which gives the identity

```
[(value / 255.0) - mean] / std == (value - mean * 255) * [1 / (std * 255)]
```

so the AIPP configuration file is set to

```
min_chn_0 : 123.675        # 0.485 * 255
min_chn_1 : 116.28         # 0.456 * 255
min_chn_2 : 103.53         # 0.406 * 255
var_reci_chn_0 : 0.01712   # 1 / (0.229 * 255)
var_reci_chn_1 : 0.01751   # 1 / (0.224 * 255)
var_reci_chn_2 : 0.01743   # 1 / (0.225 * 255)
```

B. If the DVPP JPEGD image-decoding module built into the Ascend 310 is used, the model input must be in YUV format; AIPP's color-space conversion is used here to convert YUV to RGB:

```shell
export ASCEND_SLOG_PRINT_TO_STDOUT=1
atc --model=OCRNet.onnx --framework=5 --output=OCRNet --soc_version=Ascend310 \
    --input_shape="input:1,3,xxx,xxx" --insert_op_conf=aipp_yuv.cfg --log=error
```

C. If OpenCV is used for image decoding, the model input is RGB; only AIPP's mean subtraction and multiplication by the variance reciprocal are needed:

```shell
export ASCEND_SLOG_PRINT_TO_STDOUT=1
atc --model=OCRNet.onnx --framework=5 --output=OCRNet --soc_version=Ascend310 \
    --input_shape="input:1,3,xxx,xxx" --insert_op_conf=aipp_rgb.cfg --log=error
```

D. Model optimization (optional)

The ATC tool can enable GA and RL tuning via the `--auto_tune_mode="GA,RL"` option.
For details see: https://support.huawei.com/enterprise/zh/doc/EDOC1100180777/53b7d89a

### 4. Run the demo

Set the environment variables:

```shell
export ACL_PATH=/usr/local/Ascend/ascend-toolkit/latest
export ACL_NNRT=/usr/local/Ascend/nnrt/latest
export PYTHONPATH=${ACL_PATH}/pyACL/python/site-packages/acl:$PYTHONPATH
export LD_LIBRARY_PATH=${ACL_PATH}/acllib/lib64:/usr/local/Ascend/driver/lib64:$LD_LIBRARY_PATH
```

Select the image-decoding module in acl_demo.py:

```python
# decoder = 'opencv'
decoder = 'dvpp'
```

Run acl_demo.py:

```shell
python3.7.5 acl_demo.py
```
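
The reshape-to-2D MatMul patch in section 1 relies on the fact that, for batch size 1, a batched matrix multiply on (1, D1, D2) tensors is equivalent to a plain 2-D multiply on the squeezed tensors. A minimal NumPy sketch of that equivalence (the shapes below are illustrative only, not the model's real feature-map sizes):

```python
import numpy as np

# For batch size 1, matmul on (1, D1, D2) x (1, D2, D3) equals a plain 2-D
# matmul on the squeezed (D1, D2) and (D2, D3) views -- the property the
# reshape-to-2D patch depends on.
rng = np.random.default_rng(0)
probs = rng.standard_normal((1, 4, 16))   # batch x k x hw (illustrative)
feats = rng.standard_normal((1, 16, 8))   # batch x hw x c (illustrative)

batched = np.matmul(probs, feats)                             # shape (1, 4, 8)
flat = np.matmul(probs.reshape(4, 16), feats.reshape(16, 8))  # shape (4, 8)

assert np.allclose(batched[0], flat)
```

For batch size N > 1 this equivalence no longer holds after a single reshape, which is why the note above says the data would have to be sliced along axis 0 into N separate 2-D MatMul operations.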