# CSEM

**Repository Path**: wangclnlp/CSEM

## Basic Information

- **Project Name**: CSEM
- **Description**: Code for "Learning Evaluation Models from Large Language Models for Sequence Generation"
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-06-10
- **Last Updated**: 2025-06-10

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Customized Sequence Evaluation Metric (CSEM)

This repository contains the code and released models for our paper [Learning Evaluation Models from Large Language Models for Sequence Generation 📝](https://arxiv.org/abs/2308.04386).

We propose **CSEM (Customized Sequence Evaluation Metric)**, a three-stage training framework that leverages large language models to automatically generate labeled data for training evaluation metrics, thus eliminating reliance on human annotations. CSEM supports diverse evaluation settings, including single-aspect, multi-aspect, reference-based, and reference-free, enabling flexible and effective assessment of sequence generation across varied scenarios.

## Installation Guide

The code in this repository is modified from [Unbabel/COMET](https://github.com/Unbabel/COMET) 🌹🌹🌹. If you encounter installation issues (e.g., related to PyTorch or CUDA), we recommend first checking the COMET [issues](https://github.com/Unbabel/COMET/issues) for potential solutions. If the problem persists, please feel free to submit an issue in this repository.

```bash
git clone https://gitee.com/wangclnlp/CSEM
cd CSEM
pip install poetry
poetry install
```

## Preparing Datasets

### Training a Generative Language Model

You can train the model with [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq) or any other framework for training large language models.

### Sampling from the Generative Language Model and Labeling the Data

Prepare queries, corresponding answers, and responses from generative models, then label the responses with a specified template (taking "Single-aspect Evaluation for Machine Translation" as an example):

```text
Based on the human reference, score the following translation from [Source Language] to [Target Language] with respect to [Aspect] with one to five stars, where one star means [Description of the Worst Translation on a Single Aspect] and five stars mean [Description of the Perfect Translation on a Single Aspect]. Note that [Definition of the Used Single Evaluation Aspect].
[Source Language] source: [Source]
[Target Language] human reference: [Reference]
[Target Language] translation: [Translation]
Stars:
```

### Post-processing Data

The data should be in CSV format, with different columns depending on whether a reference is available.

#### Data w/ Reference

The columns are `src`, `mt`, `ref`, and `score`. Example:
| src | mt | ref | score |
| --- | --- | --- | --- |
| und wieder ins haus zurück bringen. | then they had to bring them in. | putting back in the house. | 2 |
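For reference, a CSV file with this layout can be produced with Python's standard `csv` module. The column names follow the format described above; the file name and the example row are illustrative:

```python
import csv

# Illustrative row; in practice the texts and scores come from your
# LLM-labeled data. Column names match the expected CSV format.
rows = [
    {"src": "und wieder ins haus zurück bringen.",
     "mt": "then they had to bring them in.",
     "ref": "putting back in the house.",
     "score": 2},
]

with open("train_with_ref.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["src", "mt", "ref", "score"])
    writer.writeheader()
    writer.writerows(rows)
```

For the reference-free setting, simply drop the `ref` field from `fieldnames` and from each row.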
#### Data w/o Reference

The columns are `src`, `mt`, and `score`. Example:

| src | mt | score |
| --- | --- | --- |
| das ist sehr praktisch und extrem toll. | this is very practical and extremely awesome. | 4 |
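As a rough sketch of the labeling step described above, the single-aspect template can be filled programmatically and the star rating parsed from the model's reply. The helper names, the abbreviated template text, and the parsing regex are illustrative, not part of the released code:

```python
import re

# Abbreviated version of the README template; extend with the full wording.
TEMPLATE = (
    "Based on the human reference, score the following translation from "
    "{src_lang} to {tgt_lang} with respect to {aspect} with one to five stars.\n"
    "{src_lang} source: {source}\n"
    "{tgt_lang} human reference: {reference}\n"
    "{tgt_lang} translation: {translation}\n"
    "Stars:"
)

def build_prompt(source, reference, translation,
                 src_lang="German", tgt_lang="English", aspect="adequacy"):
    # Fill the evaluation template for one (source, reference, translation) triple.
    return TEMPLATE.format(src_lang=src_lang, tgt_lang=tgt_lang, aspect=aspect,
                           source=source, reference=reference,
                           translation=translation)

def parse_score(reply):
    # Extract the first star count (1-5) from the LLM's reply, e.g. "2 stars".
    m = re.search(r"[1-5]", reply)
    return int(m.group()) if m else None
```

The prompt returned by `build_prompt` would be sent to the labeling LLM, and `parse_score` applied to its reply yields the `score` column for the CSV files above.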
## Training Scripts

Training arguments are managed as YAML files in the `configs/` subdirectory. After configuring the arguments in a config file, you can train the model-based metric with the following command:

```bash
python comet/cli/train.py --cfg /path/to/config/file
```

- **Training w/ Reference**: An example config file for training with a reference is located at [`configs/completeness_diff_train_size/reference_model.yaml`](configs/completeness_diff_train_size/reference_model.yaml).
- **Training w/o Reference**: An example config file for training without a reference is located at [`configs/coherence_diff_train_size/referenceless_model.yaml`](configs/coherence_diff_train_size/referenceless_model.yaml).

## Citation

```bibtex
@misc{learning2025wang,
  title={Learning Evaluation Models from Large Language Models for Sequence Generation},
  author={Chenglong Wang and Hang Zhou and Kaiyan Chang and Tongran Liu and Chunliang Zhang and Quan Du and Tong Xiao and Yue Zhang and Jingbo Zhu},
  year={2025},
  eprint={2308.04386},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2308.04386},
}
```