# LexSubGen

**Repository Path**: mirrors_Samsung/LexSubGen

## Basic Information

- **Project Name**: LexSubGen
- **Description**: Lexical Substitution Framework
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Created**: 2020-11-10
- **Last Updated**: 2026-05-17

## README

# LexSubGen

Lexical Substitution Framework

This repository contains the code to reproduce the results from the paper:

Arefyev Nikolay, Sheludko Boris, Podolskiy Alexander, Panchenko Alexander, ["Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution"](https://www.aclweb.org/anthology/2020.coling-main.107/), Proceedings of the 28th International Conference on Computational Linguistics, 2020

## Installation

Clone the LexSubGen repository from github.com:

```shell script
git clone https://github.com/Samsung/LexSubGen
cd LexSubGen
```

### Setup anaconda environment

1. Download and install [conda](https://conda.io/docs/user-guide/install/download.html).
2. Create a new conda environment:
   ```shell script
   conda create -n lexsubgen python=3.7.4
   ```
3. Activate the conda environment:
   ```shell script
   conda activate lexsubgen
   ```
4. Install the requirements:
   ```shell script
   pip install -r requirements.txt
   ```
5. Download the spaCy resources and install context2vec and word_forms from their GitHub repositories:
   ```shell script
   ./init.sh
   ```

### Setup Web Application

**If you do not plan to use the Web Application, skip this section and [go to the next](#install-lexsubgen-library)!**

1. Download and install [NodeJS and npm](https://www.npmjs.com/get-npm).
2. Run the script that installs the dependencies and creates the build files:
   ```shell script
   bash web_app_setup.sh
   ```

### Install lexsubgen library

```shell script
python setup.py install
```

## Results

Results of the lexical substitution task are presented in the following table. To reproduce them, follow the instructions above to install the correct dependencies.
| Model | SemEval GAP | SemEval P@1 | SemEval P@3 | SemEval R@10 | CoInCo GAP | CoInCo P@1 | CoInCo P@3 | CoInCo R@10 |
|---|---|---|---|---|---|---|---|---|
| OOC | 44.65 | 16.82 | 12.83 | 18.36 | 46.3 | 19.58 | 15.03 | 12.99 |
| C2V | 55.82 | 7.79 | 5.92 | 11.03 | 48.32 | 8.01 | 6.63 | 7.54 |
| C2V+embs | 53.39 | 28.01 | 21.72 | 33.52 | 50.73 | 29.64 | 24.0 | 21.97 |
| ELMo | 53.66 | 11.58 | 8.55 | 13.88 | 49.47 | 13.58 | 10.86 | 11.35 |
| ELMo+embs | 54.16 | 32.0 | 22.2 | 31.82 | 52.22 | 35.96 | 26.62 | 23.8 |
| BERT | 54.42 | 38.39 | 27.73 | 39.57 | 50.5 | 42.56 | 32.64 | 28.73 |
| BERT+embs | 53.87 | 41.64 | 30.59 | 43.88 | 50.85 | 46.05 | 35.63 | 31.67 |
| RoBERTa | 56.74 | 32.25 | 24.26 | 36.65 | 50.82 | 35.12 | 27.35 | 25.41 |
| RoBERTa+embs | 58.74 | 43.19 | 31.19 | 44.61 | 54.6 | 46.54 | 36.17 | 32.1 |
| XLNet | 59.12 | 31.75 | 22.83 | 34.95 | 53.39 | 38.16 | 28.58 | 26.47 |
| XLNet+embs | 59.62 | 49.53 | 34.9 | 47.51 | 55.63 | 51.5 | 39.92 | 35.12 |
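
Each XLNet row of this table corresponds to one substitute generator config (`xlnet.jsonnet` or `xlnet_embs.jsonnet`) paired with one dataset reader (`semeval_all.jsonnet` or `coinco.jsonnet`), and the commands listed under Results reproduction derive the run name and run directory from that pair. The following is a minimal sketch, assuming bash and the LexSubGen directory as the working directory. The helpers `lexsub_run` and `lexsub_run_all_xlnet` are hypothetical (not part of LexSubGen) and rebuild those four commands. By default they only print the commands; set `DRY_RUN=0` to execute them.

```shell
# Hypothetical helpers, not part of LexSubGen: rebuild the XLNet reproduction
# commands from this README. DRY_RUN=1 (the default) only prints each command;
# DRY_RUN=0 executes it.

# lexsub_run GENERATOR DATASET
#   GENERATOR: config name under configs/subst_generators/lexsub (e.g. xlnet)
#   DATASET:   config name under configs/dataset_readers/lexsub (e.g. coinco)
lexsub_run() {
    local generator="$1" dataset="$2"
    local run_name="${dataset}_${generator}"
    # Replace this function's positional parameters with the full command line.
    set -- python lexsubgen/evaluations/lexsub.py solve \
        --substgen-config-path "configs/subst_generators/lexsub/${generator}.jsonnet" \
        --dataset-config-path "configs/dataset_readers/lexsub/${dataset}.jsonnet" \
        --run-dir="debug/lexsub-all-models/${run_name}" \
        --force \
        --experiment-name=lexsub-all-models \
        --run-name="${run_name}"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print (or run) all four XLNet rows: {xlnet, xlnet_embs} x {semeval_all, coinco}.
# Stops at the first failing run and returns its exit status.
lexsub_run_all_xlnet() {
    local generator dataset
    for generator in xlnet xlnet_embs; do
        for dataset in semeval_all coinco; do
            lexsub_run "$generator" "$dataset" || return
        done
    done
}
```

For example, `lexsub_run_all_xlnet` prints the four commands for review, and `DRY_RUN=0 lexsub_run_all_xlnet` runs them one after another. Defaulting to a dry run keeps an accidental invocation from starting four long evaluations.
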
### Results reproduction

Here we list the XLNet reproduction commands that correspond to the results presented in the table above. Reproduction commands for all models can be found in `scripts/lexsub-all-models.sh`.

Besides being saved to the run directory (`--run-dir`), all results are logged with mlflow. To check them, run `mlflow ui` in the LexSubGen directory and then open the mlflow web page in a browser.

You can also use pytest to check reproducibility, but it may take a long time:

```shell script
pytest tests/results_reproduction
```

* #### XLNet:

XLNet SemEval07:

```shell script
python lexsubgen/evaluations/lexsub.py solve \
    --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet \
    --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet \
    --run-dir='debug/lexsub-all-models/semeval_all_xlnet' \
    --force \
    --experiment-name='lexsub-all-models' \
    --run-name='semeval_all_xlnet'
```

XLNet CoInCo:

```shell script
python lexsubgen/evaluations/lexsub.py solve \
    --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet \
    --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet \
    --run-dir='debug/lexsub-all-models/coinco_xlnet' \
    --force \
    --experiment-name='lexsub-all-models' \
    --run-name='coinco_xlnet'
```

XLNet with embeddings similarity SemEval07:

```shell script
python lexsubgen/evaluations/lexsub.py solve \
    --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet \
    --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet \
    --run-dir='debug/lexsub-all-models/semeval_all_xlnet_embs' \
    --force \
    --experiment-name='lexsub-all-models' \
    --run-name='semeval_all_xlnet_embs'
```

XLNet with embeddings similarity CoInCo:

```shell script
python lexsubgen/evaluations/lexsub.py solve \
    --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet \
    --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet \
    --run-dir='debug/lexsub-all-models/coinco_xlnet_embs' \
    --force \
    --experiment-name='lexsub-all-models' \
    --run-name='coinco_xlnet_embs'
```

## Word Sense Induction Results
| Model | SemEval 2013 (AVG) | SemEval 2010 (AVG) |
|---|---|---|
| XLNet | 33.4 | 52.1 |
| XLNet+embs | 37.3 | 54.1 |
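
Reproducing the WSI scores in this table requires transformers 2.3.0. The following is a minimal sketch of a version guard, assuming bash and that the `python` on `PATH` belongs to the lexsubgen environment. The `run_wsi` function is hypothetical (not part of LexSubGen) and refuses to launch `scripts/wsi.sh` unless the installed transformers version is exactly 2.3.0.

```shell
# Hypothetical helper, not part of LexSubGen: launch the WSI reproduction
# script only if the installed transformers version is the one this README
# asks for (2.3.0).
REQUIRED_TRANSFORMERS_VERSION="2.3.0"

# Print the installed transformers version; prints nothing if the package
# cannot be imported by the `python` found on PATH.
installed_transformers_version() {
    python -c 'import transformers; print(transformers.__version__)' 2>/dev/null
}

run_wsi() {
    local installed
    installed="$(installed_transformers_version)" || installed=""
    if [ "$installed" != "$REQUIRED_TRANSFORMERS_VERSION" ]; then
        echo "run_wsi: need transformers ${REQUIRED_TRANSFORMERS_VERSION}, found '${installed:-none}'" >&2
        echo "run_wsi: try: pip install transformers==${REQUIRED_TRANSFORMERS_VERSION}" >&2
        return 1
    fi
    bash scripts/wsi.sh
}
```

Call `run_wsi` from the LexSubGen directory with the lexsubgen environment activated. Checking the version first avoids a long run that silently uses a different transformers release.
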
To reproduce these results, use version 2.3.0 of transformers and run the following command:

```shell script
bash scripts/wsi.sh
```

### Web application

You can run the Web application from the command line:

```shell script
# Run main server
lexsubgen-app run --host HOST --port PORT [--model-configs CONFIGS] [--start-ids START-IDS] [--start-all] [--restore-session]
```

**Example:**

```shell script
# Run the server and serve the BERT and XLNet models.
# For BERT, create the model server and the substitute generator immediately (resources are loaded into memory).
# For XLNet, create only the server.
lexsubgen-app run --host '0.0.0.0' --port 5000 \
    --model-configs '["my_cool_configs/bert.jsonnet", "my_awesome_configs/xlnet.jsonnet"]' \
    --start-ids '[0]'

# After the server shuts down, the session is dumped to the JSON file '~/.cache/lexsubgen/app_session.json'.
# The content of this file looks like:
# [
#     "my_cool_configs/bert.jsonnet",
#     "my_awesome_configs/xlnet.jsonnet"
# ]

# You can restore the session with the '--restore-session' flag:
lexsubgen-app run --host '0.0.0.0' --port 5000 --restore-session
# BERT and XLNet are now restored
```

##### Arguments:

|Argument |Default|Description |
|-------------------|-------|----------------------------------------------------------------------------------------------|
|`--help` | |Show this help message and exit |
|`--host` | |IP address of the host running the server |
|`--port` |`5000` |Port for starting the server |
|`--model-configs` |`[]` |List of file paths to the model configs |
|`--start-ids` |`[]` |Zero-based indices of served models for which substitute generators will be created |
|`--start-all` |`False`|Whether to create substitute generators for all served models |
|`--restore-session`|`False`|Whether to restore the session from the previous Web application run |

### FAQ

1. How to use a GPU?
   - Set the environment variable `CUDA_VISIBLE_DEVICES` to choose the GPU used for inference: either run `export CUDA_VISIBLE_DEVICES='1'` or prefix your command with `CUDA_VISIBLE_DEVICES='1'`.
1. How to run tests?
   - Use pytest: `pytest tests`
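
The GPU answer above can be wrapped in a small helper. The following is a minimal sketch, assuming bash, of a hypothetical `run_on_gpu` function (not part of LexSubGen) that prefixes a single command with `CUDA_VISIBLE_DEVICES` instead of exporting it for the whole session. It accepts only comma-separated numeric GPU indices; CUDA also accepts GPU UUIDs, which this sketch rejects.

```shell
# Hypothetical helper, not part of LexSubGen: run one command with only the
# given GPU(s) visible. For external commands (python, pytest, ...) the
# variable is set for that command only, so the rest of the shell session
# keeps its current CUDA_VISIBLE_DEVICES value.
# This sketch accepts comma-separated numeric indices only (e.g. 1 or 0,1).
run_on_gpu() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_on_gpu GPU_IDS COMMAND [ARGS...]" >&2
        return 2
    fi
    local gpus="$1"
    shift
    case "$gpus" in
        "" | *[!0-9,]*)
            echo "run_on_gpu: invalid GPU list '$gpus'" >&2
            return 2
            ;;
    esac
    CUDA_VISIBLE_DEVICES="$gpus" "$@"
}
```

For example, `run_on_gpu 1 pytest tests` runs the test suite with only GPU 1 visible. Using a per-command prefix rather than `export` keeps other commands in the same terminal on their previous GPU setting.
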