# fairseq

# Installation

## Step 1: create conda environment

```bash
conda create -n fairseq python=3.7
conda activate fairseq
conda install pytorch==1.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
```

## Step 2: install apex for faster training

```bash
scl enable devtoolset-7 -- bash
git clone https://github.com/NVIDIA/apex.git
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
    --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
    --global-option="--fast_multihead_attn" ./
```

Note that the first command requires devtoolset-7. If it is not installed, install it as **root**:

```bash
sudo yum install devtoolset-7
```

## Step 3: install fairseq

```bash
cd fairseq
pip install -e .
```
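Before preparing data, it is worth confirming that the GPU build of PyTorch and the fairseq command-line tools are actually visible in the new environment. A minimal sanity check (exact output will vary with your setup):

```bash
# Confirm PyTorch sees CUDA (should print 1.9.0 and True)
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# Confirm the fairseq entry points installed by `pip install -e .` are on PATH
fairseq-train --help | head -n 5
```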
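The training commands below read binarized data from `data-bin/iwslt14.tokenized.de-en`, which the installation steps do not create. A sketch of the standard IWSLT14 De-En preparation, assuming this fork ships upstream fairseq's `examples/translation/prepare-iwslt14.sh` script; `--joined-dictionary` is added because `--share-all-embeddings` in the commands below requires a shared source/target vocabulary:

```bash
# Download, tokenize, and apply BPE to IWSLT14 German-English
cd examples/translation/
bash prepare-iwslt14.sh
cd ../..

# Binarize into the directory the training commands expect
fairseq-preprocess --source-lang de --target-lang en --joined-dictionary \
    --trainpref examples/translation/iwslt14.tokenized.de-en/train \
    --validpref examples/translation/iwslt14.tokenized.de-en/valid \
    --testpref examples/translation/iwslt14.tokenized.de-en/test \
    --destdir data-bin/iwslt14.tokenized.de-en \
    --workers 8
```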
# Training of Models

## RPR Attention with Cutoff

```bash
CUDA_VISIBLE_DEVICES=0 nohup fairseq-train data-bin/iwslt14.tokenized.de-en \
    --arch transformer_iwslt_de_en --share-all-embeddings --max-relative-length 8 --k-only \
    --encoder-normalize-before --decoder-normalize-before \
    --augmentation --augmentation-schema cut_off --augmentation-masking-schema word \
    --augmentation-masking-probability 0.05 --augmentation-replacing-schema mask \
    --max-tokens 4096 --max-epoch 15 --num-workers 2 --keep-last-epochs 5 \
    --optimizer adam --adam-betas "(0.9, 0.98)" --clip-norm 0.0 \
    --lr 0.0015 --lr-scheduler inverse_sqrt --stop-min-lr 1e-9 \
    --warmup-updates 8000 --warmup-init-lr 1e-7 --weight-decay 0.0001 \
    --dropout 0.3 --attention-dropout 0.1 --activation-dropout 0.1 \
    --criterion label_smoothed_cross_entropy_with_regularization --label-smoothing 0.1 \
    --save-dir checkpoints/rpr-augmented \
    --log-interval -1 --fp16 > logs/rpr-augmented.log 2>&1 &
```

## RPR Attention

```bash
CUDA_VISIBLE_DEVICES=0 nohup fairseq-train data-bin/iwslt14.tokenized.de-en \
    --arch transformer_iwslt_de_en --share-all-embeddings --max-relative-length 8 --k-only \
    --encoder-normalize-before --decoder-normalize-before \
    --max-tokens 4096 --max-epoch 5 --num-workers 0 --keep-last-epochs 5 \
    --optimizer adam --adam-betas "(0.9, 0.98)" --clip-norm 0.0 \
    --lr 0.0015 --lr-scheduler inverse_sqrt --stop-min-lr 1e-9 \
    --warmup-updates 8000 --warmup-init-lr 1e-7 --weight-decay 0.0001 \
    --dropout 0.3 --attention-dropout 0.1 --activation-dropout 0.1 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --save-dir checkpoints/rpr \
    --log-interval 100 --fp16 > logs/rpr.log 2>&1 &
tail -f logs/rpr.log
```

## RPR Attention + DLCL

```bash
CUDA_VISIBLE_DEVICES=1 nohup fairseq-train data-bin/iwslt14.tokenized.de-en \
    --arch transformer_iwslt_de_en --share-all-embeddings --max-relative-length 8 --k-only \
    --enc-dlcl --dec-dlcl \
    --encoder-normalize-before --decoder-normalize-before \
    --max-tokens 4096 --max-epoch 10 --num-workers 0 --keep-last-epochs 10 \
    --optimizer adam --adam-betas "(0.9, 0.98)" --clip-norm 0.0 \
    --lr 0.0015 --lr-scheduler inverse_sqrt --stop-min-lr 1e-9 \
    --warmup-updates 8000 --warmup-init-lr 1e-7 --weight-decay 0.0001 \
    --dropout 0.3 --attention-dropout 0.1 --activation-dropout 0.1 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --save-dir checkpoints/dlcl \
    --log-interval 100 --fp16 > logs/dlcl.log 2>&1 &
tail -f logs/dlcl.log
```

The corresponding DLCL training run with the NiuTrans.NMT toolkit:

```bash
nohup ./bin/NiuTrans.NMT -train train.data -valid valid.data -shareencdec 1 -sharedec 1 \
    -model model.dlcl -encprenorm 1 -decprenorm 1 -maxrp 8 -enchistory 1 -dechistory 1 \
    -nepoch 10 -dev 3 > dlcl.log 2>&1 &
```
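Once a run finishes, translation quality can be checked with the stock fairseq tools. A minimal sketch for the plain RPR model, assuming the checkpoint layout produced by the settings above; `checkpoint_avg.pt` is a name chosen here for illustration:

```bash
# Average the last 5 epoch checkpoints (often slightly better than any single one)
python scripts/average_checkpoints.py \
    --inputs checkpoints/rpr --num-epoch-checkpoints 5 \
    --output checkpoints/rpr/checkpoint_avg.pt

# Translate the test set and report BLEU
fairseq-generate data-bin/iwslt14.tokenized.de-en \
    --path checkpoints/rpr/checkpoint_avg.pt \
    --batch-size 128 --beam 5 --remove-bpe
```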