# LexSubGen
Lexical Substitution Framework

This repository contains the code to reproduce the results from the paper:

Arefyev Nikolay, Sheludko Boris, Podolskiy Alexander, Panchenko Alexander, ["Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution"](https://www.aclweb.org/anthology/2020.coling-main.107/), Proceedings of the 28th International Conference on Computational Linguistics, 2020

## Installation
Clone the LexSubGen repository from github.com.
```shell script
git clone https://github.com/Samsung/LexSubGen
cd LexSubGen
```

### Setup anaconda environment
1. Download and install [conda](https://conda.io/docs/user-guide/install/download.html)
2. Create a new conda environment
    ```shell script
    conda create -n lexsubgen python=3.7.4
    ```
3. Activate the conda environment
    ```shell script
    conda activate lexsubgen
    ```
4. Install requirements
    ```shell script
    pip install -r requirements.txt
    ```
5. Download spacy resources and install context2vec and word_forms from their GitHub repositories
    ```shell script
    ./init.sh
    ```

### Setup Web Application
**If you do not plan to use the Web Application, skip this section and [go to the next](#install-lexsubgen-library)!**
1. Download and install [NodeJS and npm](https://www.npmjs.com/get-npm).
2. Run the script that installs dependencies and creates the build files.
    ```shell script
    bash web_app_setup.sh
    ```

### Install lexsubgen library
```shell script
python setup.py install
```

## Results
Results of the lexical substitution task are presented in the following table. To reproduce them, follow the instructions above to install the correct dependencies.

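| Model | SemEval GAP | SemEval P@1 | SemEval P@3 | SemEval R@10 | COINCO GAP | COINCO P@1 | COINCO P@3 | COINCO R@10 |
|--------------|-------|-------|-------|-------|-------|-------|-------|-------|
| OOC          | 44.65 | 16.82 | 12.83 | 18.36 | 46.3  | 19.58 | 15.03 | 12.99 |
| C2V          | 55.82 | 7.79  | 5.92  | 11.03 | 48.32 | 8.01  | 6.63  | 7.54  |
| C2V+embs     | 53.39 | 28.01 | 21.72 | 33.52 | 50.73 | 29.64 | 24.0  | 21.97 |
| ELMo         | 53.66 | 11.58 | 8.55  | 13.88 | 49.47 | 13.58 | 10.86 | 11.35 |
| ELMo+embs    | 54.16 | 32.0  | 22.2  | 31.82 | 52.22 | 35.96 | 26.62 | 23.8  |
| BERT         | 54.42 | 38.39 | 27.73 | 39.57 | 50.5  | 42.56 | 32.64 | 28.73 |
| BERT+embs    | 53.87 | 41.64 | 30.59 | 43.88 | 50.85 | 46.05 | 35.63 | 31.67 |
| RoBERTa      | 56.74 | 32.25 | 24.26 | 36.65 | 50.82 | 35.12 | 27.35 | 25.41 |
| RoBERTa+embs | 58.74 | 43.19 | 31.19 | 44.61 | 54.6  | 46.54 | 36.17 | 32.1  |
| XLNet        | 59.12 | 31.75 | 22.83 | 34.95 | 53.39 | 38.16 | 28.58 | 26.47 |
| XLNet+embs   | 59.62 | 49.53 | 34.9  | 47.51 | 55.63 | 51.5  | 39.92 | 35.12 |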
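
For readers unfamiliar with the column names, the following is a minimal sketch of how P@k and R@k are commonly computed for a single target word. It assumes the usual definitions (hits among the top-k predictions divided by k, or by the number of gold substitutes); the evaluation code in this repository may differ in details such as lemmatization or the handling of multiword substitutes. The function names and the example words are hypothetical, and GAP is not covered here.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Share of the top-k predicted substitutes that are gold substitutes.

    Assumes `ranked` contains no duplicates and is sorted best-first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for word in ranked[:k] if word in gold)
    return hits / k


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Share of the gold substitutes that appear among the top-k predictions."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not gold:
        return 0.0
    hits = sum(1 for word in ranked[:k] if word in gold)
    return hits / len(gold)


# Hypothetical example: model predictions and annotator substitutes for one target.
ranked = ["bright", "smart", "shiny", "clever", "dull"]
gold = {"smart", "clever", "intelligent"}

print(precision_at_k(ranked, gold, 1))
print(precision_at_k(ranked, gold, 3))
print(recall_at_k(ranked, gold, 10))
```

The numbers in the table are averages of such per-instance scores over a dataset, reported as percentages.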

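The XLNet commands listed under "Results reproduction" differ only in the generator config, the dataset config, and the run name derived from them. The sketch below rebuilds those command lines from that pattern. The helper `lexsub_command` is a hypothetical name introduced for illustration, not part of lexsubgen, and the pattern is inferred from the four XLNet commands only; `scripts/lexsub-all-models.sh` remains the authoritative list.

```python
import shlex
from typing import List


def lexsub_command(generator: str, dataset: str,
                   experiment: str = "lexsub-all-models") -> List[str]:
    """Build the argument list for one evaluation run (hypothetical helper)."""
    run_name = f"{dataset}_{generator}"
    return [
        "python", "lexsubgen/evaluations/lexsub.py", "solve",
        "--substgen-config-path",
        f"configs/subst_generators/lexsub/{generator}.jsonnet",
        "--dataset-config-path",
        f"configs/dataset_readers/lexsub/{dataset}.jsonnet",
        f"--run-dir=debug/{experiment}/{run_name}",
        "--force",
        f"--experiment-name={experiment}",
        f"--run-name={run_name}",
    ]


# Print the four XLNet commands (plain and with embedding similarity).
for generator in ("xlnet", "xlnet_embs"):
    for dataset in ("semeval_all", "coinco"):
        args = lexsub_command(generator, dataset)
        # shlex.quote keeps the output safe to paste into a shell.
        print(" ".join(shlex.quote(arg) for arg in args))
```

Passing the returned list to `subprocess.run(args, check=True)` from the LexSubGen directory would start the corresponding run.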
### Results reproduction
Here we list the XLNet reproduction commands that correspond to the results presented in the table above. Reproduction commands for all models can be found in `scripts/lexsub-all-models.sh`.

Besides being saved to the directory given by `--run-dir`, all results are also logged with mlflow. To inspect them, run `mlflow ui` in the LexSubGen directory and then open the web page in a browser.

You can also use pytest to check reproducibility, but it may take a long time:
```shell script
pytest tests/results_reproduction
```

* #### XLNet:

XLNet Semeval07:
```shell script
python lexsubgen/evaluations/lexsub.py solve \
    --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet \
    --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet \
    --run-dir='debug/lexsub-all-models/semeval_all_xlnet' \
    --force \
    --experiment-name='lexsub-all-models' \
    --run-name='semeval_all_xlnet'
```

XLNet CoInCo:
```shell script
python lexsubgen/evaluations/lexsub.py solve \
    --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet \
    --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet \
    --run-dir='debug/lexsub-all-models/coinco_xlnet' \
    --force \
    --experiment-name='lexsub-all-models' \
    --run-name='coinco_xlnet'
```

XLNet with embeddings similarity Semeval07:
```shell script
python lexsubgen/evaluations/lexsub.py solve \
    --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet \
    --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet \
    --run-dir='debug/lexsub-all-models/semeval_all_xlnet_embs' \
    --force \
    --experiment-name='lexsub-all-models' \
    --run-name='semeval_all_xlnet_embs'
```

XLNet with embeddings similarity CoInCo:
```shell script
python lexsubgen/evaluations/lexsub.py solve \
    --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet \
    --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet \
    --run-dir='debug/lexsub-all-models/coinco_xlnet_embs' \
    --force \
    --experiment-name='lexsub-all-models' \
    --run-name='coinco_xlnet_embs'
```

## Word Sense Induction Results

| Model      | SemEval 2013 (AVG) | SemEval 2010 (AVG) |
|------------|--------------------|--------------------|
| XLNet      | 33.4               | 52.1               |
| XLNet+embs | 37.3               | 54.1               |

To reproduce these results, use version 2.3.0 of transformers and the following command:
```shell script
bash scripts/wsi.sh
```

### Web application
You can use the command line interface to run the Web application.
```shell script
# Run main server
lexsubgen-app run --host HOST --port PORT [--model-configs CONFIGS] [--start-ids START-IDS] [--start-all] [--restore-session]
```

**Example:**
```shell script
# Run the server and serve the BERT and XLNet models.
# For BERT, create the server and the substitute generator immediately (resources are loaded into memory).
# For XLNet, create only the server.
lexsubgen-app run --host '0.0.0.0' --port 5000 --model-configs '["my_cool_configs/bert.jsonnet", "my_awesome_configs/xlnet.jsonnet"]' --start-ids '[0]'

# After the server shuts down, a JSON file with the session dump is stored in '~/.cache/lexsubgen/app_session.json'.
# The content of this file looks like:
# [
#     'my_cool_configs/bert.jsonnet',
#     'my_awesome_configs/xlnet.jsonnet',
# ]
# You can restore the session with the flag 'restore-session'
lexsubgen-app run --host '0.0.0.0' --port 5000 --restore-session
# BERT and XLNet are restored now
```

##### Arguments:
|Argument           |Default|Description                                                                          |
|-------------------|-------|-------------------------------------------------------------------------------------|
|`--help`           |       |Show this help message and exit                                                      |
|`--host`           |       |IP address of running server host                                                    |
|`--port`           |`5000` |Port for starting the server                                                         |
|`--model-configs`  |`[]`   |List of file paths to the model configs.                                             |
|`--start-ids`      |`[]`   |Zero-based indices of served models for which substitute generators will be created |
|`--start-all`      |`False`|Whether to create substitute generators for all served models                        |
|`--restore-session`|`False`|Whether to restore session from previous Web application run                         |

### FAQ
1. How to use gpu?
    - You can use the environment variable CUDA_VISIBLE_DEVICES to select the GPU used for inference: run `export CUDA_VISIBLE_DEVICES='1'` beforehand, or put `CUDA_VISIBLE_DEVICES='1'` before your command.
1. How to run tests?
    - You can use pytest: `pytest tests`
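
When the Web application is launched from a Python script rather than a shell, the same flags and the GPU selection described above can be assembled programmatically. The following is a minimal sketch: `app_command` and `gpu_env` are hypothetical helper names, not part of lexsubgen, and the assumption that `--model-configs` and `--start-ids` take JSON-encoded lists is inferred from the example above.

```python
import json
import os
from typing import Dict, List, Mapping, Optional, Sequence


def app_command(host: str, port: int = 5000,
                model_configs: Sequence[str] = (),
                start_ids: Sequence[int] = (),
                restore_session: bool = False) -> List[str]:
    """Build an argument list for `lexsubgen-app run` (hypothetical helper)."""
    cmd = ["lexsubgen-app", "run", "--host", host, "--port", str(port)]
    if model_configs:
        # Lists are passed as JSON strings, as in the README example.
        cmd += ["--model-configs", json.dumps(list(model_configs))]
    if start_ids:
        cmd += ["--start-ids", json.dumps(list(start_ids))]
    if restore_session:
        cmd.append("--restore-session")
    return cmd


def gpu_env(device: str,
            base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and restrict CUDA to the given device id(s)."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = device
    return env


cmd = app_command(
    "0.0.0.0",
    5000,
    model_configs=["my_cool_configs/bert.jsonnet",
                   "my_awesome_configs/xlnet.jsonnet"],
    start_ids=[0],
)
env = gpu_env("1")
print(cmd)
# To actually start the server (requires lexsubgen to be installed):
# import subprocess
# subprocess.run(cmd, env=env, check=True)
```

Setting the variable in the child process environment has the same effect as prefixing the shell command with `CUDA_VISIBLE_DEVICES='1'`, and leaves the parent process untouched.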