# GenoMAS

**Repository Path**: woodrow_25/GenoMAS

## Basic Information

- **Project Name**: GenoMAS
- **Description**: mirror https://github.com/Liu-Hy/GenoMAS
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-02
- **Last Updated**: 2025-09-02

## README

# GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

Official implementation of the GenoMAS paper:
>[__"GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis"__](https://arxiv.org/abs/2507.21035)<br>
>Haoyang Liu, Yijiang Li, Haohan Wang<br>
>UIUC, UC San Diego

[`[Paper]`](https://arxiv.org/abs/2507.21035)  [`[Code]`](https://github.com/Liu-Hy/GenoMAS) 


## What is this?

This repo has two main parts:

1. A minimal multi-agent framework inspired by Anthropic's blog on [building effective agents](https://www.anthropic.com/research/building-effective-agents). As they noted, "Consistently, the most successful implementations weren't using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns." Following this idea, we built this framework with just enough encapsulation to make agent experiments easier. The framework provides:
   - A generic multi-agent communication protocol with a typed messaging mechanism for programmatic analysis
   - A Jupyter Notebook-style workflow where agents can plan, write code, execute, debug, and backtrack, to solve tasks in multiple steps
   - Support for custom agents with specific roles, guidelines, tools, and action units

2. An implementation of [GenoMAS](https://arxiv.org/abs/2507.21035) using this framework for the automated analysis of gene expression datasets. The system 
   takes gene expression datasets downloaded from GEO and TCGA as input and analyzes them to identify significant genes 
   related to a trait, optionally accounting for the influence of a condition. Evaluation on the 
   [GenoTEX](https://arxiv.org/abs/2406.15341) benchmark covers a total of 1384 (trait, condition) pairs.
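
To make the typed messaging idea from part 1 more concrete, here is a minimal sketch. It is not the protocol implemented in this repo: the names `MessageType`, `Message`, `MessageLog`, and the agent names are all hypothetical, chosen only to show how tagging each message with a type lets you analyze a conversation programmatically.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class MessageType(Enum):
    # Hypothetical categories; the message types in this repo may differ.
    PLAN_REQUEST = "plan_request"
    CODE_REQUEST = "code_request"
    CODE_REVIEW = "code_review"


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    type: MessageType
    content: str


@dataclass
class MessageLog:
    """Records every message so a run can be inspected afterwards."""

    messages: list[Message] = field(default_factory=list)

    def send(self, message: Message) -> None:
        self.messages.append(message)

    def of_type(self, message_type: MessageType) -> list[Message]:
        # Filtering by type is what makes the log easy to analyze in code.
        return [m for m in self.messages if m.type is message_type]

    def count_by_type(self) -> Counter:
        return Counter(m.type for m in self.messages)


# Agent names below are placeholders, not the roles defined in this repo.
log = MessageLog()
log.send(Message("planner", "coder", MessageType.PLAN_REQUEST, "Outline the preprocessing steps."))
log.send(Message("planner", "coder", MessageType.CODE_REQUEST, "Write code for step 1."))
log.send(Message("coder", "reviewer", MessageType.CODE_REQUEST, "Please check step 1."))
log.send(Message("reviewer", "coder", MessageType.CODE_REVIEW, "Step 1 looks correct."))

review_messages = log.of_type(MessageType.CODE_REVIEW)
counts = log.count_by_type()
```

Because every message carries an explicit type, questions such as "how many review rounds did this task need?" become a simple filter over the log rather than a text search.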

> 📌 **Note**: This repo holds a historical version of GenoMAS that is fully functional and contains its core features. 
> We are testing the code for the current version on different systems and will make it available in 4-5 weeks. 
> Then we will continuously improve it. If you are interested in our work, please star ⭐ this repo to get notified of future updates!

## How to use it?

### 1. Data preparation
Download the input data from the Google Drive [folder](https://drive.google.com/drive/folders/1kxHOyW5wNnY3Rk15xwLaM7ZZS01wGzRO), and save all of it under the same parent folder. \
You can verify data integrity with:
```bash
cd download
python validator.py --data-dir /path/to/data --validate
```

### 2. Set up environment
Create a conda environment with Python 3.10 and install the required packages:
```bash
conda create -n genomas python=3.10
conda activate genomas
pip install -r requirements.txt
```

### 3. Run the code
Set `in_data_root` in `main.py` to the input data path on your machine.\
Run an experiment like this:
```bash
python main.py --version 1 --model gemini-2.5-pro --api 1
```

The first time you run this, you'll get an error asking you to set up an API key, e.g. `GOOGLE_API_KEY_1` in a `.env` file. You'll need to get this API key from the LLM provider.
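
If you want to check your key setup before launching a run, the sketch below shows one way to look up such a key. It is not the loading code used in `main.py`: the helper names `read_env_file` and `get_api_key` are hypothetical, and the assumption that `--api 1` corresponds to the `_1` suffix is only inferred from the `GOOGLE_API_KEY_1` example above.

```python
import os
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(provider: str, index: int, env_path: Path = Path(".env")) -> str:
    """Return the key named <PROVIDER>_API_KEY_<index> (assumed naming scheme)."""
    name = f"{provider.upper()}_API_KEY_{index}"
    # Prefer a variable already set in the shell, then fall back to the file.
    key = os.environ.get(name) or read_env_file(env_path).get(name)
    if not key:
        raise RuntimeError(f"Missing {name}: add a line '{name}=<your key>' to {env_path}")
    return key
```

With a `.env` file containing a line such as `GOOGLE_API_KEY_1=<your key>`, calling `get_api_key("google", 1)` returns the key, and a missing entry raises an error that names the variable to add.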

If you type a wrong model name, the error message will show you all the model names that work with this code.

For open-source LLMs, you can either run them locally with [Ollama Python](https://github.com/ollama/ollama-python), or use APIs (add the `--use-api` flag).

## Discussion

For questions, feature requests, or discussion, feel free to open an issue on GitHub.

## Citation

If you find our code useful for your research, please cite our paper:

```bibtex
@misc{liu2025genomas,
      title={GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis}, 
      author={Haoyang Liu and Yijiang Li and Haohan Wang},
      year={2025},
      eprint={2507.21035},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2507.21035}, 
}
```