# conda **Repository Path**: semikonductor/conda ## Basic Information - **Project Name**: conda - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2024-07-25 - **Last Updated**: 2024-07-25 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # ConDA-gen-text-detection Code for the paper: **ConDA: Contrastive Domain Adaptation for AI-generated Text Detection** accepted at IJCNLP-AACL 2023 [paper link](https://arxiv.org/abs/2309.03992). ### :star2: Great News! [Nov 4, 2023] :star2: Our paper won the **Outstanding Paper Award** at IJCNLP-AACL 2023 held in Bali, Indonesia. ![ConDA Framework Diagram](https://github.com/AmritaBh/ConDA-gen-text-detection/blob/main/conda-framework.jpg) ## Setup Set up a separate environment and install requirements via `pip install -r requirements.txt` Make directories for the models, output logs and huggingface model files. `mkdir models huggingface_repos output_logs` Download `roberta-base` from [here](https://huggingface.co/roberta-base/tree/main) and/or `roberta-large` from [here](https://huggingface.co/roberta-large/tree/main) and place these repositories in `huggingface_repos`. `contrast_training_with_da.py` is the ConDA training script. The `multi_domain_runner.py` is the runner script for training ConDA models. Update the arguments in `multi_domain_runner.py` to train models as needed. Use the `evaluation.py` script for evaluating models. Change arguments within the `evaluation.py` script as needed. ## TuringBench Link to the dataset website: [link](https://turingbench.ist.psu.edu/) Link to the TuringBench paper: [link](https://arxiv.org/abs/2109.13296) Files should be split into 3 jsonl splits: train, valid, test. Each line in the jsonl is a data instance with `text` and `label` fields. ## Links to best performing models for each target generator Here we provide links to pre-trained ConDA models for the best performing models: | Target | Best performing source | Dropbox Link | | :-----------: | :-----------: | :-----: | | CTRL | GROVER_mega | [link](https://www.dropbox.com/s/h5prhx3j4yndoig/grover_mega_ctrl_syn_rep_loss1.pt?dl=0) | | FAIR_wmt19 | GPT2_xl | [link](https://www.dropbox.com/s/h36fh24qu9203pf/gpt2_xl_fair_wmt19_syn_rep_loss1.pt?dl=0) | | GPT2_xl | FAIR_wmt19 | [link](https://www.dropbox.com/s/mnx5lyg4geebhm6/fair_wmt19_gpt2_xl_syn_rep_loss1.pt?dl=0) | | GPT3 | GROVER_mega | [link](https://www.dropbox.com/s/mh09c8kdinocsz9/grover_mega_gpt3_syn_rep_loss1.pt?dl=0) | | GROVER_mega | CTRL | [link](https://www.dropbox.com/s/o0fs8dodywvuda0/ctrl_grover_mega_syn_rep_loss1.pt?dl=0) | | XLM | GROVER_mega | [link](https://www.dropbox.com/s/q6ddq2aop9qw8lo/grover_mega_xlm_syn_rep_loss1.pt?dl=0) | | ChatGPT | FAIR_wmt19 | [link](https://www.dropbox.com/s/sgwiucl1x7p7xsx/fair_wmt19_chatgpt_syn_rep_loss1.pt?dl=0) | # Citation If you use (part of) this code, please cite our paper as: ``` @InProceedings{bhattacharjee-EtAl:2023:ijcnlp, author = {Bhattacharjee, Amrita and Kumarage, Tharindu and Moraffah, Raha and Liu, Huan}, title = {ConDA: Contrastive Domain Adaptation for AI-generated Text Detection}, booktitle = {Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics}, month = {November}, year = {2023}, address = {Nusa Dua, Bali}, publisher = {Association for Computational Linguistics}, pages = {598--610}, url = {https://aclanthology.org/2023.ijcnlp-long.40} } ``` # Contact For any questions, comments, and feedback, contact Amrita Bhattacharjee at abhatt43@asu.edu