# dlcl

**Repository Path**: ihyc/dlcl

## Basic Information

- **Project Name**: dlcl
- **Description**: No description available
- **Primary Language**: Python
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-08-23
- **Last Updated**: 2024-06-02

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Learning Deep Transformer Models for Machine Translation on Fairseq

The implementation of [Learning Deep Transformer Models for Machine Translation [ACL 2019]](https://arxiv.org/abs/1906.01787) (**Qiang Wang**, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao).

> This code is based on [Fairseq v0.5.0](https://github.com/pytorch/fairseq/tree/v0.5.0)

## Installation

1. `pip install -r requirements.txt`
2. `python setup.py develop`
3. `python setup.py install`

NOTE: tested with `torch==0.4.1`

## Prepare Training Data

1. Download the preprocessed [WMT'16 En-De dataset](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) provided by Google to the project root directory.
2. Generate the binary dataset at `data-bin/wmt16_en_de_google`:

> `bash runs/prepare-wmt-en2de.sh`

## Train

### Train the deep pre-norm baseline (20-layer encoder)

> `bash runs/train-wmt-en2de-deep-prenorm-baseline.sh`

### Train the deep post-norm DLCL model (25-layer encoder)

> `bash runs/train-wmt-en2de-deep-postnorm-dlcl.sh`

### Train the deep pre-norm DLCL model (30-layer encoder)

> `bash runs/train-wmt-en2de-deep-prenorm-dlcl.sh`

NOTE: BLEU is calculated automatically when training finishes.

## Results

Model | #Param. | Epoch* | BLEU
:--|:--:|:--:|:--:
[Transformer](https://arxiv.org/abs/1706.03762) (base) | 65M | 20 | 27.3
[Transparent Attention](https://arxiv.org/abs/1808.07561) (base, `16L`) | 137M | - | 28.0
[Transformer](https://arxiv.org/abs/1706.03762) (big) | 213M | 60 | 28.4
[RNMT+](https://arxiv.org/abs/1804.09849) (big) | 379M | 25 | 28.5
[Layer-wise Coordination](https://papers.nips.cc/paper/8019-layer-wise-coordination-between-encoder-and-decoder-for-neural-machine-translation.pdf) (big) | 210M* | - | 29.0
[Relative Position Representations](https://arxiv.org/abs/1803.02155) (big) | 210M | 60 | 29.2
[Deep Representation](https://arxiv.org/abs/1810.10181) (big) | 356M | - | 29.2
[Scaling NMT](https://arxiv.org/abs/1806.00187) (big) | 210M | 70 | 29.3
Our deep pre-norm Transformer (base, `20L`) | 106M | 20 | 28.9
Our deep post-norm DLCL (base, `25L`) | 121M | 20 | 29.2
Our deep pre-norm DLCL (base, `30L`) | 137M | 20 | 29.3

NOTE: `*` denotes approximate values.
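
## DLCL at a glance

The core idea of DLCL in the paper is that each encoder layer reads a learned linear combination of the (layer-normalized) outputs of all preceding layers, instead of only the immediately previous one. The snippet below is a minimal, self-contained sketch of that aggregation step only; the class and method names (`DynamicLinearCombination`, `push`, `pop`) are illustrative assumptions, not this repository's API, and attention, feed-forward sublayers, and residual connections are omitted.

```python
# Sketch of the DLCL aggregation idea (illustrative names, not the repo's API).
import torch
import torch.nn as nn


class DynamicLinearCombination(nn.Module):
    """Combines stored outputs y_0 .. y_l with learnable scalar weights."""

    def __init__(self, num_layers, dim):
        super().__init__()
        # Lower-triangular-style weight matrix over (num_layers + 1) outputs
        # (embedding output plus one output per layer); identity init is a
        # simple choice for this sketch.
        self.weights = nn.Parameter(torch.eye(num_layers + 1))
        self.norms = nn.ModuleList(
            [nn.LayerNorm(dim) for _ in range(num_layers + 1)]
        )
        # History of normalized layer outputs; call sites should reset it
        # between forward passes (omitted here for brevity).
        self.history = []

    def push(self, y):
        # Store the layer-normalized output of the current layer.
        self.history.append(self.norms[len(self.history)](y))

    def pop(self):
        # Input to the next layer: weighted sum over all stored outputs.
        l = len(self.history) - 1
        w = self.weights[l, : l + 1]
        stacked = torch.stack(self.history, dim=0)  # (l+1, batch, seq, dim)
        w = w.view(-1, *([1] * (stacked.dim() - 1)))
        return (w * stacked).sum(dim=0)


# Usage sketch: interleave push/pop around each encoder layer.
if __name__ == "__main__":
    dim, num_layers = 512, 4
    # Plain linear layers stand in for real Transformer encoder layers.
    layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])
    dlcl = DynamicLinearCombination(num_layers, dim)

    x = torch.randn(2, 7, dim)   # (batch, seq, dim), e.g. embedding output
    dlcl.push(x)                 # store y_0
    for layer in layers:
        x = layer(dlcl.pop())    # layer input = combination of all history
        dlcl.push(x)             # store this layer's output
    out = dlcl.pop()
    print(out.shape)             # torch.Size([2, 7, 512])
```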