# CSEM
**Repository Path**: wangclnlp/CSEM
## Basic Information
- **Project Name**: CSEM
- **Description**: Code for "Learning Evaluation Models from Large Language Models for Sequence Generation"
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-06-10
- **Last Updated**: 2025-06-10
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Customized Sequence Evaluation Metric (CSEM)
This repository contains the code and released models for our paper [Learning Evaluation Models from Large Language Models for Sequence Generation 📝](https://arxiv.org/abs/2308.04386). We propose **CSEM (Customized Sequence Evaluation Metric)**, a three-stage training framework that leverages large language models to automatically generate labeled data for training evaluation metrics, thus eliminating reliance on human annotations. CSEM supports diverse evaluation settings, including single-aspect, multi-aspect, reference-based, and reference-free, enabling flexible and effective assessment of sequence generation across varied scenarios.
## Installation Guide
The code of this repo is modified from [Unbabel/COMET](https://github.com/Unbabel/COMET) 🌹🌹🌹. If you encounter installation issues (e.g., related to PyTorch or CUDA), we recommend first checking the COMET [issues](https://github.com/Unbabel/COMET/issues) for potential solutions. If the problem persists, please feel free to submit an issue in this repository.
```bash
git clone https://gitee.com/wangclnlp/CSEM
cd CSEM
pip install poetry
poetry install
```
## Preparing Datasets
### Training a Generative Language Model
You can train the generative language model with [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq) or any other project for training large language models.
### Sampling from the Generative Language Model and Labeling the Data
Prepare queries, corresponding answers, and responses from the generative model, then label the responses with a specified template. Take "Single-aspect Evaluation for Machine Translation" as an example:
```text
Based on the human reference, score the following translation from [Source Language] to [Target Language] with respect to [Aspect] with one to five stars, where one star means [Description of the Worst Translation on a Single Aspect] and five stars mean [Description of the Perfect Translation on a Single Aspect].
Note that [Definition of the Used Single Evaluation Aspect].
[Source Language] source: [Source]
[Target Language] human reference: [Reference]
[Target Language] translation: [Translation]
Stars:
```
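The template above can be filled programmatically before querying the labeling LLM. A minimal sketch, assuming the placeholder names below (they mirror the bracketed fields in the template; the example values for adequacy are illustrative only):

```python
# Sketch: filling the single-aspect MT evaluation template.
# Field names mirror the bracketed placeholders in the template above;
# the aspect descriptions are illustrative assumptions, not from the paper.
TEMPLATE = (
    "Based on the human reference, score the following translation from "
    "{src_lang} to {tgt_lang} with respect to {aspect} with one to five stars, "
    "where one star means {worst_desc} and five stars mean {best_desc}.\n"
    "Note that {aspect_def}.\n"
    "{src_lang} source: {source}\n"
    "{tgt_lang} human reference: {reference}\n"
    "{tgt_lang} translation: {translation}\n"
    "Stars:"
)

prompt = TEMPLATE.format(
    src_lang="German",
    tgt_lang="English",
    aspect="adequacy",
    worst_desc="the translation omits or distorts most of the source content",
    best_desc="the translation fully preserves the source content",
    aspect_def="adequacy measures how much of the source meaning is preserved",
    source="und wieder ins haus zurück bringen.",
    reference="putting back in the house.",
    translation="then they had to bring them in.",
)
print(prompt)
```

The LLM's completion after `Stars:` is then parsed into the 1-5 score used as the training label.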
### Post-processing Data
The data should be in CSV format; the columns differ depending on whether a reference is available.
#### Data w/ Reference
The columns are `src`, `mt`, `ref`, and `score`.
Example:
| src | mt | ref | score |
|-----|----|-----|-------|
| und wieder ins haus zurück bringen. | then they had to bring them in. | putting back in the house. | 2 |
#### Data w/o Reference
The columns are `src`, `mt`, and `score`.
Example:
| src | mt | score |
|-----|----|-------|
| das ist sehr praktisch und extrem toll. | this is very practical and extremely awesome. | 4 |
## Training Scripts
Training arguments are managed as YAML files in the `configs/` subdirectory. After configuring the arguments in a config file, you can train the model-based metric with the following command:
```bash
python comet/cli/train.py --cfg /path/to/config/file
```
- Training w/ Reference
The example config file for training with a reference is located at [`configs/completeness_diff_train_size/reference_model.yaml`](configs/completeness_diff_train_size/reference_model.yaml).
- Training w/o Reference
The example config file for training without reference is located at [`configs/coherence_diff_train_size/referenceless_model.yaml`](configs/coherence_diff_train_size/referenceless_model.yaml).
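Since this codebase is a modification of Unbabel/COMET, a trained checkpoint should accept COMET-style input: a list of dicts whose keys match the CSV columns (without `score`). A minimal sketch of that input format, with the actual scoring call shown only as a comment because it requires a trained checkpoint (the path is hypothetical):

```python
# Sketch: the input format expected at inference time, assuming the
# COMET-style Python API carries over from Unbabel/COMET.
data_with_ref = [
    {"src": "und wieder ins haus zurück bringen.",
     "mt": "then they had to bring them in.",
     "ref": "putting back in the house."},
]
# Reference-free metrics take the same rows without the `ref` field.
data_without_ref = [{k: v for k, v in row.items() if k != "ref"}
                    for row in data_with_ref]

# With a trained checkpoint, scoring would presumably look like:
#   from comet import load_from_checkpoint
#   model = load_from_checkpoint("path/to/checkpoint.ckpt")  # hypothetical path
#   print(model.predict(data_with_ref, batch_size=8, gpus=0).scores)
print(sorted(data_with_ref[0]))
print(sorted(data_without_ref[0]))
```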
## Citation
```bibtex
@misc{learning2025wang,
      title={Learning Evaluation Models from Large Language Models for Sequence Generation},
      author={Chenglong Wang and Hang Zhou and Kaiyan Chang and Tongran Liu and Chunliang Zhang and Quan Du and Tong Xiao and Yue Zhang and Jingbo Zhu},
      year={2025},
      eprint={2308.04386},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2308.04386},
}
```