# CSEM

**Repository Path**: wangclnlp/CSEM

## Basic Information

- **Project Name**: CSEM
- **Description**: Code for "Learning Evaluation Models from Large Language Models for Sequence Generation"
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-06-10
- **Last Updated**: 2025-06-10

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Customized Sequence Evaluation Metric (CSEM)

This repository contains the code and released models for our paper [Learning Evaluation Models from Large Language Models for Sequence Generation 📝](https://arxiv.org/abs/2308.04386).

We propose **CSEM (Customized Sequence Evaluation Metric)**, a three-stage training framework that leverages large language models to automatically generate labeled data for training evaluation metrics, thus eliminating reliance on human annotations. CSEM supports diverse evaluation settings, including single-aspect, multi-aspect, reference-based, and reference-free, enabling flexible and effective assessment of sequence generation across varied scenarios.

## Installation Guide

The code in this repository is modified from [Unbabel/COMET](https://github.com/Unbabel/COMET) 🌹🌹🌹. If you encounter installation issues (e.g., related to PyTorch or CUDA), we recommend first checking the COMET [issues](https://github.com/Unbabel/COMET/issues) for potential solutions. If the problem persists, please feel free to submit an issue in this repository.

```bash
git clone https://gitee.com/wangclnlp/CSEM
cd CSEM
pip install poetry
poetry install
```

## Preparing Datasets

### Training a Generative Language Model

You can train the model with [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq) or any other framework for training large language models.

### Sampling from the Generative Language Model and Labeling the Data

Prepare queries, corresponding answers, and responses from generative models, then label the responses with a specified template (taking "Single-aspect Evaluation for Machine Translation" as an example):

```text
Based on the human reference, score the following translation from [Source Language] to [Target Language] with respect to [Aspect] with one to five stars, where one star means [Description of the Worst Translation on a Single Aspect] and five stars mean [Description of the Perfect Translation on a Single Aspect]. Note that [Definition of the Used Single Evaluation Aspect].
[Source Language] source: [Source]
[Target Language] human reference: [Reference]
[Target Language] translation: [Translation]
Stars:
```

### Post-processing Data

The data should be in CSV format, with different columns depending on whether a reference is available.

#### Data w/ Reference

The columns are `src`, `mt`, `ref`, and `score`. Example:
| src | mt | ref | score |
| --- | --- | --- | --- |
| und wieder ins haus zurück bringen. | then they had to bring them in. | putting back in the house. | 2 |
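For reference, a CSV file with this layout can be produced with Python's standard `csv` module. The column names follow the format described above; the file name and the example row are illustrative:

```python
import csv

# Illustrative row; in practice the texts and scores come from your
# LLM-labeled data. Column names match the expected CSV format.
rows = [
    {"src": "und wieder ins haus zurück bringen.",
     "mt": "then they had to bring them in.",
     "ref": "putting back in the house.",
     "score": 2},
]

with open("train_with_ref.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["src", "mt", "ref", "score"])
    writer.writeheader()
    writer.writerows(rows)
```

For the reference-free setting, simply drop the `ref` field from `fieldnames` and from each row.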
#### Data w/o Reference

The columns are `src`, `mt`, and `score`. Example:

| src | mt | score |
| --- | --- | --- |
| das ist sehr praktisch und extrem toll. | this is very practical and extremely awesome. | 4 |
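As a rough sketch of the labeling step described above, the single-aspect template can be filled programmatically and the star rating parsed from the model's reply. The helper names, the abbreviated template text, and the parsing regex are illustrative, not part of the released code:

```python
import re

# Abbreviated version of the README template; extend with the full wording.
TEMPLATE = (
    "Based on the human reference, score the following translation from "
    "{src_lang} to {tgt_lang} with respect to {aspect} with one to five stars.\n"
    "{src_lang} source: {source}\n"
    "{tgt_lang} human reference: {reference}\n"
    "{tgt_lang} translation: {translation}\n"
    "Stars:"
)

def build_prompt(source, reference, translation,
                 src_lang="German", tgt_lang="English", aspect="adequacy"):
    # Fill the evaluation template for one (source, reference, translation) triple.
    return TEMPLATE.format(src_lang=src_lang, tgt_lang=tgt_lang, aspect=aspect,
                           source=source, reference=reference,
                           translation=translation)

def parse_score(reply):
    # Extract the first star count (1-5) from the LLM's reply, e.g. "2 stars".
    m = re.search(r"[1-5]", reply)
    return int(m.group()) if m else None
```

The prompt returned by `build_prompt` would be sent to the labeling LLM, and `parse_score` applied to its reply yields the `score` column for the CSV files above.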
## Training Scripts

Training arguments are managed as YAML files in the `configs/` subdirectory. After configuring the arguments in a config file, you can train the model-based metric with the following command:

```bash
python comet/cli/train.py --cfg /path/to/config/file
```

- **Training w/ Reference**: An example config file for training with a reference is located at [`configs/completeness_diff_train_size/reference_model.yaml`](configs/completeness_diff_train_size/reference_model.yaml).
- **Training w/o Reference**: An example config file for training without a reference is located at [`configs/coherence_diff_train_size/referenceless_model.yaml`](configs/coherence_diff_train_size/referenceless_model.yaml).

## Citation

```bibtex
@misc{learning2025wang,
  title={Learning Evaluation Models from Large Language Models for Sequence Generation},
  author={Chenglong Wang and Hang Zhou and Kaiyan Chang and Tongran Liu and Chunliang Zhang and Quan Du and Tong Xiao and Yue Zhang and Jingbo Zhu},
  year={2025},
  eprint={2308.04386},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2308.04386},
}
```