# TPRL

**Repository Path**: lenghong/TPRL

## Basic Information

- **Project Name**: TPRL
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-03-29
- **Last Updated**: 2026-03-29

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)](https://pytorch.org/)

A reinforcement-learning-based visual token pruning framework to accelerate inference of Large Vision Language Models (LVLMs).


## 📋 Method Overview


TPRL formulates visual token pruning as a Markov Decision Process (MDP):

1. **Learning from Demonstrations (LfD)**: Generate demonstration trajectories using heuristics and pretrain the policy network.
2. **PPO Fine-tuning**: Fine-tune the policy with Proximal Policy Optimization to jointly optimize task performance and computational efficiency.
3. **Inference**: One-shot pruning that retains the most important visual tokens.

### Architecture

```
visual input → ViT → Projector → [TPRL pruner] → LLM → output
```

## 🚀 Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/MagicVicCoder/TPRL.git
cd TPRL

# Install requirements
pip install -r requirements.txt
```

### Training

#### Step 1: Learning from Demonstrations

```bash
python train_lfd.py
```

#### Step 2: PPO Training

```bash
# Set the LfD checkpoint path in config.py first
python train_ppo.py
```

### Evaluation

```bash
python main.py
```

## 📁 Project Structure

```
TPRL/
├── model/
│   ├── autoencoder.py      # Token compression (optional)
│   ├── rl_networks.py      # Policy and value networks
│   ├── llava_mllm.py       # LLaVA model wrapper
│   └── qwen_mllm.py        # Qwen model wrapper
├── pruner/
│   ├── rl_pruner.py        # RL-based pruner
│   ├── random_pruner.py    # Baseline random pruner
│   └── mlp_pruner.py       # MLP-based pruner
├── train_lfd.py            # LfD training script
├── train_ppo.py            # PPO training script
├── config.py               # Configuration
└── main.py                 # Evaluation / inference script
```

## 🎯 Core Idea

### MDP Formulation

* **State**: (visual tokens, text query)
* **Action**: keep / prune decision for each token
* **Reward**: downstream task performance + computational efficiency

### Reward Function

```python
reward = alpha * task_reward + beta * efficiency_reward
```

* `task_reward`: change in task performance (e.g., IoU / accuracy)
* `efficiency_reward`: compression / efficiency metric

## 🛠️ Requirements

* Python >= 3.8
* PyTorch >= 2.0
* Transformers >= 4.37.0
* See `requirements.txt` for full dependency list

---

⭐ If you find this repository useful, please give it a Star!

## 📄 Citation

If you find this work useful, please cite:

```bibtex
@misc{cao2026languageguidedtokencompressionreinforcement,
  title={Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models},
  author={Sihan Cao and Jianwei Zhang and Pengcheng Zheng and Jiaxin Yan and Caiyan Qin and Yalan Ye and Wei Dong and Peng Wang and Yang Yang and Chaoning Zhang},
  year={2026},
  eprint={2603.13394},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.13394}
}