# MORE2

**Repository Path**: mario1316/MORE2

## Basic Information

- **Project Name**: MORE2
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-01-13
- **Last Updated**: 2021-01-13

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# **MORE** : a **M**etric learning based framework for **O**pen-domain **R**elation **E**xtraction

This repository contains the source code for the paper **MORE: A METRIC LEARNING BASED FRAMEWORK FOR OPEN-DOMAIN RELATION EXTRACTION**. Follow the steps below to use it.

## 1. Configure the Environment

To run the code, you need:

~~~text
torch>=1.3.0
tensorflow>=1.9.0
keras>=2.2.5
transformers==3.2.0
~~~

You can run the following commands to set up a new Anaconda environment:

```bash
conda create -n more python=3.6
conda activate more
pip install -r ./requirements.txt
```

We recommend using the same environment as ours to avoid compatibility problems.
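To check an installed package against the minimum versions listed above, a minimal sketch could compare version strings with GNU `sort -V` (version sort). The `version_ge` helper is hypothetical and not part of this repository. Note that `transformers` is pinned to an exact version, so it needs an equality check rather than a minimum check.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository):
# succeeds if version $1 >= version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
  # After version-sorting both strings, the smaller one comes first.
  # If that smaller one is the required minimum ($2), then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: compare an installed version against a minimum requirement.
# In practice the installed version could come from, e.g.:
#   python -c "import torch; print(torch.__version__)"
if version_ge "1.4.0" "1.3.0"; then
  echo "torch requirement satisfied"
fi
```

Version sort is used instead of plain string comparison so that, for example, `1.10.0` is correctly treated as newer than `1.9.0`.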

## 2. Prepare the Datasets

We use two real-world relation extraction (RE) datasets:

- **FewRel**: We follow [RSNs](https://github.com/thunlp/RSN). The processed dataset is already included in `./data/datasets/fewrel_ori/`.

- **NYT+FB-sup**: We process the original NYT+FB dataset into NYT+FB-sup. The full dataset is not publicly available, but you can get a [sample](https://github.com/diegma/relation-autoencoder/blob/master/data-sample.txt) if needed.

  To process **nyt_ori.txt** (assuming you already have it and have placed it in `./data/datasets/nyt_su/`), run the following commands:

  ```bash
  python ./data/datasets/nyt_su/process2json.py
  python ./data/datasets/nyt_su/nyt_divide_supervision.py
  ```

  This converts the original **.txt** file into **.json** format and splits it into train/dev/test sets at a 6:2:2 ratio.
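To illustrate what a 6:2:2 split means in terms of line counts, here is a minimal sketch. It assumes a hypothetical line-oriented file with one example per line (e.g. JSON Lines) and splits in file order; the repository's own `nyt_divide_supervision.py` may shuffle, group by relation, or use a different file layout, so this is not a replacement for it. The `split_622` function and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the repository's script): split a file with one
# example per line into train/dev/test at 6:2:2, preserving file order.
split_622() {
  local input="$1" prefix="$2"
  local n train dev
  n=$(wc -l < "$input")
  train=$(( n * 6 / 10 ))   # first 60% of lines
  dev=$(( n * 2 / 10 ))     # next 20% of lines
  head -n "$train" "$input" > "${prefix}_train.jsonl"
  tail -n +"$((train + 1))" "$input" | head -n "$dev" > "${prefix}_dev.jsonl"
  # Everything after train + dev goes to test (absorbs any rounding remainder).
  tail -n +"$((train + dev + 1))" "$input" > "${prefix}_test.jsonl"
}

# Usage (hypothetical file names):
#   split_622 nyt_processed.jsonl nyt
```

Integer division rounds the train and dev sizes down, so any leftover lines end up in the test split.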

## 3. Run the Code

In our experiments, we use either a CNN or BERT as the extractor. The CNN architecture is the same as the one used in [RSNs](https://github.com/thunlp/RSN), and the pre-trained BERT model is taken from [Hugging Face Transformers](https://huggingface.co/transformers/model_doc/bert.html).

- **On FewRel**:

  - CNN

  ```bash
  python main_cmd.py --dataset fewrel 
  ```

  - CNN+VAT

  ```bash
  python main_cmd.py --dataset fewrel --VAT 1 --epoch_num 4 --warm_up 3 --power_iterations 1 --p_mult 0.03 --lambda_V 1 
  ```

  - BERT

  ```bash
  python main_cmd.py --dataset fewrel --learning_rate 0.00001 --batch_num 1000 --BERT 1 
  ```

- **On NYT+FB-sup**:

  - CNN

  ```bash
  python main_cmd.py --dataset nyt
  ```

  - CNN+VAT

  ```bash
  python main_cmd.py --dataset nyt --VAT 1 --epoch_num 6 --warm_up 4 --power_iterations 1 --p_mult 0.5 --lambda_V 1.5
  ```

  - BERT

  ```bash
  python main_cmd.py --dataset nyt --learning_rate 0.00001 --batch_num 1000 --BERT 1 
  ```
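For the two datasets, the CNN+VAT commands above differ only in their VAT-related hyperparameters. The following is a hypothetical wrapper (not part of this repository) that looks these values up by dataset name and prints the corresponding command so it can be reviewed before running. The flag values are copied from the commands in this README.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this repository): print the MORE(CNN)+VAT
# command for a dataset, using the hyperparameters listed in this README.
vat_cmd() {
  local dataset="$1" epochs warm_up p_mult lambda_v
  case "$dataset" in
    fewrel) epochs=4; warm_up=3; p_mult=0.03; lambda_v=1 ;;
    nyt)    epochs=6; warm_up=4; p_mult=0.5;  lambda_v=1.5 ;;
    *) echo "unknown dataset: $dataset" >&2; return 1 ;;
  esac
  echo "python main_cmd.py --dataset $dataset --VAT 1 --epoch_num $epochs" \
       "--warm_up $warm_up --power_iterations 1 --p_mult $p_mult --lambda_V $lambda_v"
}

# Print the command for NYT+FB-sup; pipe it to bash to actually run it:
#   vat_cmd nyt | bash
vat_cmd nyt
```

Printing instead of executing directly keeps the wrapper side-effect free, so you can check the flags first.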

*Note: if you have enough computing resources, you can also try **MORE(BERT)+VAT**. We did not report this result in the paper due to GPU memory limitations:*

```bash
python main_cmd.py --dataset fewrel --VAT 1 --epoch_num 4 --warm_up 0 --power_iterations 1 --p_mult 0.03 --lambda_V 1 --learning_rate 0.00001 --batch_num 1000 --BERT 1
python main_cmd.py --dataset nyt --VAT 1 --epoch_num 4 --warm_up 0 --power_iterations 1 --p_mult 0.5 --lambda_V 1.5 --learning_rate 0.00001 --batch_num 1000 --BERT 1
```

## 4. Future Work

- Improve virtual adversarial training.
- Apply virtual adversarial training to MORE(BERT).