# acl_ocrnet

**Repository Path**: yukming_law/acl_ocrnet

## Basic Information

- **Project Name**: acl_ocrnet
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2021-03-19
- **Last Updated**: 2023-12-07

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# OCRNet Demo

Paper: https://arxiv.org/pdf/1909.11065.pdf

Official source code: https://github.com/HRNet/HRNet-Semantic-Segmentation/tree/HRNet-OCR

### 0. Download the source code

```shell
git clone https://github.com/HRNet/HRNet-Semantic-Segmentation.git
cd ./HRNet-Semantic-Segmentation
git checkout HRNet-OCR
```

Dependencies:

- pytorch-cpu 1.6
- onnx

### 1. Modify the source code

#### Modify lib/seg_hrnet_ocr.py

The original authors use their own BatchNorm2d operator; replace it with PyTorch's built-in `nn.BatchNorm2d`:

```python
# step 1: comment out line 22
# from .bn_helper import BatchNorm2d, BatchNorm2d_class, relu_inplace

# step 2: replace every occurrence of BatchNorm2d with nn.BatchNorm2d

# step 3: line 658
# elif isinstance(m, BatchNorm2d_class):
elif isinstance(m, nn.BatchNorm2d):

# step 4: drop the relu_inplace argument
# nn.ReLU(inplace=relu_inplace)  -->  nn.ReLU()
```

When exporting to ONNX, PyTorch's upsample operator is converted into a Resize operator. The Resize operator in the current ONNX opset only supports the scale parameter, not the size parameter, so the code that specifies an explicit upsample size is rewritten to use the equivalent scale factor:

```python
# line 243
y = y + F.interpolate(
    self.fuse_layers[i][j](x[j]),
    # size=[height_output, width_output],
    scale_factor=x[i].shape[-1] // x[j].shape[-1],
    mode='bilinear')

# line 457
# x1 = F.interpolate(x[1], size=(x0_h, x0_w), mode='bilinear', align_corners=ALIGN_CORNERS)
# x2 = F.interpolate(x[2], size=(x0_h, x0_w), mode='bilinear', align_corners=ALIGN_CORNERS)
# x3 = F.interpolate(x[3], size=(x0_h, x0_w), mode='bilinear', align_corners=ALIGN_CORNERS)
x1 = F.interpolate(x[1], scale_factor=2, mode='bilinear', align_corners=ALIGN_CORNERS)
x2 = F.interpolate(x[2], scale_factor=4, mode='bilinear', align_corners=ALIGN_CORNERS)
x3 = F.interpolate(x[3], scale_factor=8, mode='bilinear',
                   align_corners=ALIGN_CORNERS)
```

The OCR module contains MatMul operators whose inputs have shape (N, D1, D2). The ATC conversion tool currently supports only 2-D inputs for the MatMul operator in ONNX models, so the MatMul inputs in the source code must be reshaped to 2-D. This introduces a constraint: the model batch size is fixed at 1. If the batch size were N (N > 1), the data would have to be split along axis 0 with a Slice operator and fed as 2-D inputs to N separate MatMul operators; otherwise ATC conversion fails.

```python
# line 59
def forward(self, feats, probs):
    batch_size, c, h, w = probs.size(0), probs.size(1), probs.size(2), probs.size(3)
    probs = probs.view(batch_size, c, -1)
    feats = feats.view(batch_size, feats.size(1), -1)
    feats = feats.permute(0, 2, 1)  # batch x hw x c
    probs = F.softmax(self.scale * probs, dim=2)  # batch x k x hw
    probs = probs.view(probs.size(1), probs.size(2))  # reshape to 2D data
    feats = feats.view(feats.size(1), feats.size(2))  # reshape to 2D data
    ocr_context = torch.matmul(probs, feats)
    # reshape to 3D data
    ocr_context = ocr_context.view(batch_size, ocr_context.size(0), ocr_context.size(1))\
        .permute(0, 2, 1).unsqueeze(3)  # batch x k x c
    return ocr_context

# line 123
def forward(self, x, proxy):
    batch_size, h, w = x.size(0), x.size(2), x.size(3)
    if self.scale > 1:
        x = self.pool(x)

    query = self.f_pixel(x).view(batch_size, self.key_channels, -1)
    query = query.permute(0, 2, 1)
    key = self.f_object(proxy).view(batch_size, self.key_channels, -1)
    value = self.f_down(proxy).view(batch_size, self.key_channels, -1)
    value = value.permute(0, 2, 1)

    query = query.view(query.size(1), query.size(2))  # reshape to 2D data
    key = key.view(key.size(1), key.size(2))  # reshape to 2D data
    sim_map = torch.matmul(query, key)
    sim_map = (self.key_channels ** -.5) * sim_map
    sim_map = F.softmax(sim_map, dim=-1)

    # add bg context ...
    value = value.view(value.size(1), value.size(2))  # reshape to 2D data
    context = torch.matmul(sim_map, value)
    context = context.view(batch_size, context.size(0), context.size(1))  # reshape to 3D data
    context = context.permute(0, 2, 1).contiguous()
    context = context.view(batch_size, self.key_channels, *x.size()[2:])
    context = self.f_up(context)
    if self.scale > 1:
        context = F.interpolate(input=context, size=(h, w), mode='bilinear',
                                align_corners=ALIGN_CORNERS)

    return context
```

Remove the statement that loads the pretrained weights:

```python
def get_seg_model(cfg, **kwargs):
    model = HighResolutionNet(cfg, **kwargs)
    # model.init_weights(cfg.MODEL.PRETRAINED)
```

### 2. Export the ONNX model

#### Modify tools/test.py

Since the CPU build of PyTorch is used, remove the GPU-related statements in test.py:

```python
# import torch.backends.cudnn as cudnn

# cudnn.benchmark = config.CUDNN.BENCHMARK
# cudnn.deterministic = config.CUDNN.DETERMINISTIC
# cudnn.enabled = config.CUDNN.ENABLED

# logger.info(get_model_summary(model.cuda(), dump_input.cuda()))
```

When loading the model weights, map them to the CPU:

```python
# pretrained_dict = torch.load(model_state_file)
pretrained_dict = torch.load(model_state_file, map_location=torch.device('cpu'))
```

After the weights are loaded, export the model to ONNX:

```python
model_dict.update(pretrained_dict)
model.load_state_dict(model_dict)

x = torch.randn(1, 3, 480, 480).cpu()  # use 480x480 here; other input sizes may fail because of the modifications above
torch.onnx.export(model, x, 'OCRNet.onnx', verbose=True, opset_version=11,
                  input_names=["input"], output_names=["aux_output", "ocr_output"])
import onnx
onnx.save(onnx.shape_inference.infer_shapes(onnx.load('OCRNet.onnx')), 'OCRNet.onnx')
exit()
```

Download hrnet_ocr_pascal_ctx_5618_torch11.pth:

```shell
wget https://github.com/hsfzxjy/models.storage/releases/download/HRNet-OCR/hrnet_ocr_pascal_ctx_5618_torch11.pth
```

Run tools/test.py:

```shell
python tools/test.py --cfg experiments/pascal_ctx/seg_hrnet_ocr_w48_cls59_520x520_sgd_lr1e-3_wd1e-4_bs_16_epoch200.yaml \
    DATASET.TEST_SET testval \
    TEST.MODEL_FILE hrnet_ocr_pascal_ctx_5618_torch11.pth \
    TEST.SCALE_LIST \
    0.5,0.75,1.0,1.25,1.5,1.75,2.0 \
    TEST.FLIP_TEST True
```

### 3. Convert to an om model

A. Image preprocessing

Follow the preprocessing logic in the source code:

```python
# lib/dataset/pascal_ctx.py: PASCALContext.__init__
def __init__(self,
             root,
             list_path,
             num_samples=None,
             num_classes=59,
             multi_scale=True,
             flip=True,
             ignore_label=-1,
             base_size=520,
             crop_size=(480, 480),
             downsample_rate=1,
             scale_factor=16,
             mean=[0.485, 0.456, 0.406],
             std=[0.229, 0.224, 0.225]):

# lib/dataset/base_dataset.py: BaseDataset.input_transform
def input_transform(self, image):
    image = image.astype(np.float32)[:, :, ::-1]
    image = image / 255.0
    image -= self.mean
    image /= self.std
    return image
```

which gives the identity

```
[(value / 255.0) - mean] / std == (value - mean * 255) * [1 / (std * 255)]
```

so the AIPP configuration file is set to

```
min_chn_0 : 123.675        # 0.485 * 255
min_chn_1 : 116.28         # 0.456 * 255
min_chn_2 : 103.53         # 0.406 * 255
var_reci_chn_0 : 0.01712   # 1 / (0.229 * 255)
var_reci_chn_1 : 0.01751   # 1 / (0.224 * 255)
var_reci_chn_2 : 0.01743   # 1 / (0.225 * 255)
```

B. If the DVPP JPEGD image-decoding module built into the Ascend 310 is used, the model input must be in YUV format; AIPP's color-space conversion is used here to convert YUV to RGB:

```shell
export ASCEND_SLOG_PRINT_TO_STDOUT=1
atc --model=OCRNet.onnx --framework=5 --output=OCRNet --soc_version=Ascend310 \
    --input_shape="input:1,3,xxx,xxx" --insert_op_conf=aipp_yuv.cfg --log=error
```

C. If OpenCV is used for image decoding, the model input is RGB; only AIPP's mean subtraction and multiplication by the variance reciprocal are needed:

```shell
export ASCEND_SLOG_PRINT_TO_STDOUT=1
atc --model=OCRNet.onnx --framework=5 --output=OCRNet --soc_version=Ascend310 \
    --input_shape="input:1,3,xxx,xxx" --insert_op_conf=aipp_rgb.cfg --log=error
```

D. Model optimization (optional)

The ATC tool can enable GA and RL tuning via the `--auto_tune_mode="GA,RL"` option.
For details see: https://support.huawei.com/enterprise/zh/doc/EDOC1100180777/53b7d89a

### 4. Run the demo

Set the environment variables:

```shell
export ACL_PATH=/usr/local/Ascend/ascend-toolkit/latest
export ACL_NNRT=/usr/local/Ascend/nnrt/latest
export PYTHONPATH=${ACL_PATH}/pyACL/python/site-packages/acl:$PYTHONPATH
export LD_LIBRARY_PATH=${ACL_PATH}/acllib/lib64:/usr/local/Ascend/driver/lib64:$LD_LIBRARY_PATH
```

Select the image-decoding module in acl_demo.py:

```python
# decoder = 'opencv'
decoder = 'dvpp'
```

Run acl_demo.py:

```shell
python3.7.5 acl_demo.py
```
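
The reshape-to-2D MatMul patch in section 1 relies on the fact that, for batch size 1, a batched matrix multiply on (1, D1, D2) tensors is equivalent to a plain 2-D multiply on the squeezed tensors. A minimal NumPy sketch of that equivalence (the shapes below are illustrative only, not the model's real feature-map sizes):

```python
import numpy as np

# For batch size 1, matmul on (1, D1, D2) x (1, D2, D3) equals a plain 2-D
# matmul on the squeezed (D1, D2) and (D2, D3) views -- the property the
# reshape-to-2D patch depends on.
rng = np.random.default_rng(0)
probs = rng.standard_normal((1, 4, 16))   # batch x k x hw (illustrative)
feats = rng.standard_normal((1, 16, 8))   # batch x hw x c (illustrative)

batched = np.matmul(probs, feats)                             # shape (1, 4, 8)
flat = np.matmul(probs.reshape(4, 16), feats.reshape(16, 8))  # shape (4, 8)

assert np.allclose(batched[0], flat)
```

For batch size N > 1 this equivalence no longer holds after a single reshape, which is why the note above says the data would have to be sliced along axis 0 into N separate 2-D MatMul operations.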