# LexSubGen
- **License**: Apache-2.0
Lexical Substitution Framework
This repository contains the code to reproduce the results from the paper:
Arefyev Nikolay, Sheludko Boris, Podolskiy Alexander, Panchenko Alexander,
["Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution"](https://www.aclweb.org/anthology/2020.coling-main.107/),
Proceedings of the 28th International Conference on Computational Linguistics, 2020
## Installation
Clone the LexSubGen repository from GitHub.
```shell script
git clone https://github.com/Samsung/LexSubGen
cd LexSubGen
```
### Setup anaconda environment
1. Download and install [conda](https://conda.io/docs/user-guide/install/download.html)
2. Create new conda environment
```shell script
conda create -n lexsubgen python=3.7.4
```
3. Activate conda environment
```shell script
conda activate lexsubgen
```
4. Install requirements
```shell script
pip install -r requirements.txt
```
5. Download spaCy resources and install context2vec and word_forms from their GitHub repositories
```shell script
./init.sh
```
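After these steps it can be useful to confirm that the right interpreter and environment are active. The snippet below is a minimal sketch, not part of LexSubGen: the helper name `environment_problems` is hypothetical, and it assumes that conda exposes the active environment name through the `CONDA_DEFAULT_ENV` variable.

```python
import os
import sys

EXPECTED_PYTHON = (3, 7)    # from `conda create -n lexsubgen python=3.7.4`
EXPECTED_ENV = "lexsubgen"  # environment name used in the steps above


def environment_problems(version_info, conda_env):
    """Return a list of problems; an empty list means the setup looks right.

    Hypothetical helper for illustration only, not a LexSubGen API.
    """
    problems = []
    found = tuple(version_info[:2])
    if found != EXPECTED_PYTHON:
        problems.append(
            "expected Python %d.%d, found %d.%d" % (EXPECTED_PYTHON + found)
        )
    if conda_env != EXPECTED_ENV:
        problems.append(
            "active conda environment is %r, expected %r" % (conda_env, EXPECTED_ENV)
        )
    return problems


if __name__ == "__main__":
    # Assumption: conda sets CONDA_DEFAULT_ENV to the active environment name.
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    for problem in environment_problems(sys.version_info, active_env):
        print("warning:", problem)
```

Running it inside the activated `lexsubgen` environment should print nothing.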
### Setup Web Application
**If you do not plan to use the Web Application, skip this section and [go to the next](#install-lexsubgen-library)!**
1. Download and install [NodeJS and npm](https://www.npmjs.com/get-npm).
2. Run the script that installs the dependencies and creates the build files.
```shell script
bash web_app_setup.sh
```
### Install lexsubgen library
```shell script
python setup.py install
```
## Results
Results of the lexical substitution task are presented in the following table. To reproduce them, follow the instructions above to install the correct dependencies.
| Model | SemEval GAP | SemEval P@1 | SemEval P@3 | SemEval R@10 | COINCO GAP | COINCO P@1 | COINCO P@3 | COINCO R@10 |
|-------|-------------|-------------|-------------|--------------|------------|------------|------------|-------------|
| OOC | 44.65 | 16.82 | 12.83 | 18.36 | 46.3 | 19.58 | 15.03 | 12.99 |
| C2V | 55.82 | 7.79 | 5.92 | 11.03 | 48.32 | 8.01 | 6.63 | 7.54 |
| C2V+embs | 53.39 | 28.01 | 21.72 | 33.52 | 50.73 | 29.64 | 24.0 | 21.97 |
| ELMo | 53.66 | 11.58 | 8.55 | 13.88 | 49.47 | 13.58 | 10.86 | 11.35 |
| ELMo+embs | 54.16 | 32.0 | 22.2 | 31.82 | 52.22 | 35.96 | 26.62 | 23.8 |
| BERT | 54.42 | 38.39 | 27.73 | 39.57 | 50.5 | 42.56 | 32.64 | 28.73 |
| BERT+embs | 53.87 | 41.64 | 30.59 | 43.88 | 50.85 | 46.05 | 35.63 | 31.67 |
| RoBERTa | 56.74 | 32.25 | 24.26 | 36.65 | 50.82 | 35.12 | 27.35 | 25.41 |
| RoBERTa+embs | 58.74 | 43.19 | 31.19 | 44.61 | 54.6 | 46.54 | 36.17 | 32.1 |
| XLNet | 59.12 | 31.75 | 22.83 | 34.95 | 53.39 | 38.16 | 28.58 | 26.47 |
| XLNet+embs | 59.62 | 49.53 | 34.9 | 47.51 | 55.63 | 51.5 | 39.92 | 35.12 |
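The P@1, P@3 and R@10 columns are precision and recall computed over the top-ranked substitutes. The snippet below is a minimal sketch of these two metrics for a single target word, assuming the usual definitions; it is not the evaluation code of this repository, the function names are hypothetical, and GAP (a weighted ranking metric) is not covered.

```python
def precision_at_k(ranked, gold, k):
    """Fraction of the top-k generated substitutes that are gold substitutes."""
    top_k = ranked[:k]
    return sum(1 for word in top_k if word in gold) / k


def recall_at_k(ranked, gold, k):
    """Fraction of the gold substitutes found among the top-k generated ones."""
    if not gold:
        return 0.0
    return len(set(ranked[:k]) & set(gold)) / len(gold)


if __name__ == "__main__":
    # Toy data made up for illustration, not taken from SemEval or CoInCo.
    ranked = ["smart", "clever", "bright", "shiny"]
    gold = {"clever", "intelligent", "bright"}
    print("P@1 :", precision_at_k(ranked, gold, 1))
    print("P@3 :", precision_at_k(ranked, gold, 3))
    print("R@10:", recall_at_k(ranked, gold, 10))
```

A dataset-level score would then be the average of these per-instance values, reported as a percentage.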
### Results reproduction
The commands below reproduce the XLNet results presented in the table above.
Reproduction commands for all models are in ```scripts/lexsub-all-models.sh```.
Besides being saved to the run directory, all results are logged with mlflow.
To inspect them, run ```mlflow ui``` in the LexSubGen directory and open the web page in a browser.
You can also use pytest to check reproducibility, but it may take a long time:
```shell script
pytest tests/results_reproduction
```
* #### XLNet:
XLNet Semeval07:
```shell script
python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet --run-dir='debug/lexsub-all-models/semeval_all_xlnet' --force --experiment-name='lexsub-all-models' --run-name='semeval_all_xlnet'
```
XLNet CoInCo:
```shell script
python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet --run-dir='debug/lexsub-all-models/coinco_xlnet' --force --experiment-name='lexsub-all-models' --run-name='coinco_xlnet'
```
XLNet with embeddings similarity Semeval07:
```shell script
python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet --run-dir='debug/lexsub-all-models/semeval_all_xlnet_embs' --force --experiment-name='lexsub-all-models' --run-name='semeval_all_xlnet_embs'
```
XLNet with embeddings similarity CoInCo:
```shell script
python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet --run-dir='debug/lexsub-all-models/coinco_xlnet_embs' --force --experiment-name='lexsub-all-models' --run-name='coinco_xlnet_embs'
```
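The four commands above follow one naming pattern for the config paths, the run directory and the run name. The sketch below builds the same argument lists from a model name and a dataset name. The helper `build_lexsub_command` is hypothetical; only `xlnet`, `xlnet_embs`, `semeval_all` and `coinco` appear in this README, so any other name is an assumption that should be checked against the `configs` directory.

```python
import shlex


def build_lexsub_command(model, dataset, experiment="lexsub-all-models"):
    """Build the argument list for one evaluation run (illustrative helper)."""
    run_name = "%s_%s" % (dataset, model)
    return [
        "python", "lexsubgen/evaluations/lexsub.py", "solve",
        "--substgen-config-path",
        "configs/subst_generators/lexsub/%s.jsonnet" % model,
        "--dataset-config-path",
        "configs/dataset_readers/lexsub/%s.jsonnet" % dataset,
        "--run-dir=debug/%s/%s" % (experiment, run_name),
        "--force",
        "--experiment-name=%s" % experiment,
        "--run-name=%s" % run_name,
    ]


if __name__ == "__main__":
    # Print the four XLNet commands listed above, one per line.
    for model in ("xlnet", "xlnet_embs"):
        for dataset in ("semeval_all", "coinco"):
            args = build_lexsub_command(model, dataset)
            print(" ".join(shlex.quote(arg) for arg in args))
```

The returned list can be passed directly to `subprocess.run` from the LexSubGen directory.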
## Word Sense Induction Results
| Model | SemEval 2013 AVG | SemEval 2010 AVG |
|-------|------------------|------------------|
| XLNet | 33.4 | 52.1 |
| XLNet+embs | 37.3 | 54.1 |
To reproduce these results, use version 2.3.0 of transformers and run the following command:
```shell script
bash scripts/wsi.sh
```
### Web application
You can use the command-line interface to run the Web application.
```shell script
# Run the main server
lexsubgen-app run --host HOST \
                  --port PORT \
                  [--model-configs CONFIGS] \
                  [--start-ids START-IDS] \
                  [--start-all] \
                  [--restore-session]
```
**Example:**
```shell script
# Run the server and serve the BERT and XLNet models.
# For BERT, create the server and load the substitute generator immediately (resources are loaded into memory).
# For XLNet, create only the server.
lexsubgen-app run --host '0.0.0.0' \
                  --port 5000 \
                  --model-configs '["my_cool_configs/bert.jsonnet", "my_awesome_configs/xlnet.jsonnet"]' \
                  --start-ids '[0]'
# After the server shuts down, the session is dumped to the JSON file '~/.cache/lexsubgen/app_session.json'.
# The content of this file looks like:
# [
#     'my_cool_configs/bert.jsonnet',
#     'my_awesome_configs/xlnet.jsonnet',
# ]
# You can restore it with the '--restore-session' flag:
lexsubgen-app run --host '0.0.0.0' \
                  --port 5000 \
                  --restore-session
# BERT and XLNet are now restored
```
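In the example above, `--model-configs` and `--start-ids` are passed as JSON-style lists inside a single quoted string. When the server is launched from a script, those strings can be produced with `json.dumps` instead of being written by hand. This is a minimal sketch; `build_app_command` is a hypothetical helper, not part of the `lexsubgen-app` CLI.

```python
import json


def build_app_command(host, port, model_configs=(), start_ids=(),
                      restore_session=False):
    """Build the argument list for `lexsubgen-app run` (illustrative helper)."""
    cmd = ["lexsubgen-app", "run", "--host", host, "--port", str(port)]
    if model_configs:
        # json.dumps renders the list as '["a.jsonnet", "b.jsonnet"]'.
        cmd += ["--model-configs", json.dumps(list(model_configs))]
    if start_ids:
        cmd += ["--start-ids", json.dumps(list(start_ids))]
    if restore_session:
        cmd.append("--restore-session")
    return cmd


if __name__ == "__main__":
    print(build_app_command(
        "0.0.0.0",
        5000,
        model_configs=["my_cool_configs/bert.jsonnet",
                       "my_awesome_configs/xlnet.jsonnet"],
        start_ids=[0],
    ))
```

Passing the result to `subprocess.run` as a list avoids shell quoting problems with the embedded double quotes.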
##### Arguments:
|Argument |Default|Description |
|-------------------|-------|----------------------------------------------------------------------------------------------|
|`--help` | |Show this help message and exit |
|`--host` | |IP address of running server host |
|`--port` |`5000` |Port for starting the server |
|`--model-configs` |`[]` |List of file paths to the model configs. |
|`--start-ids` |`[]` |Zero-based indices of served models for which substitute generators will be created |
|`--start-all` |`False`|Whether to create substitute generators for all served models |
|`--restore-session`|`False`|Whether to restore session from previous Web application run |
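The interaction between `--model-configs`, `--start-ids` and `--start-all` can be sketched as follows. This only illustrates the semantics described in the table (zero-based indices into the list of served models); it is an assumption about the behaviour, not the application's actual code, and `configs_to_start` is a hypothetical name.

```python
def configs_to_start(model_configs, start_ids=(), start_all=False):
    """Return the configs whose substitute generators would be created at startup."""
    if start_all:
        return list(model_configs)
    for index in start_ids:
        # Indices are zero-based positions in the --model-configs list.
        if not 0 <= index < len(model_configs):
            raise IndexError(
                "start id %d is out of range for %d model configs"
                % (index, len(model_configs))
            )
    return [model_configs[index] for index in start_ids]


if __name__ == "__main__":
    configs = ["my_cool_configs/bert.jsonnet", "my_awesome_configs/xlnet.jsonnet"]
    # With --start-ids '[0]' only the first served model gets a generator.
    print(configs_to_start(configs, start_ids=[0]))
```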
### FAQ
1. How do I use a GPU? - Set the environment variable CUDA_VISIBLE_DEVICES to select the GPU used for inference:
run ```export CUDA_VISIBLE_DEVICES='1'``` beforehand, or prefix your command with ```CUDA_VISIBLE_DEVICES='1'```.
1. How do I run the tests? - Use pytest: ```pytest tests```
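For the GPU question, the same variable can be set from Python when a LexSubGen command is launched as a child process. This is a minimal sketch using only the standard library; `gpu_env` is a hypothetical helper, and the device id `'1'` is just the value used in the answer above.

```python
import os
import subprocess


def gpu_env(gpu_ids):
    """Copy the current environment and set CUDA_VISIBLE_DEVICES in the copy."""
    return dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_ids))


def run_on_gpu(command, gpu_ids="1"):
    """Run `command` (a list of arguments) with only the chosen GPUs visible."""
    return subprocess.run(command, env=gpu_env(gpu_ids), check=True)
```

For example, `run_on_gpu(["pytest", "tests"], gpu_ids="1")` corresponds to prefixing `pytest tests` with `CUDA_VISIBLE_DEVICES='1'`. The parent process environment is left unchanged.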