# GRN
**Repository Path**: ByteDance/GRN
## Basic Information
- **Project Name**: GRN
- **Description**: Generative Refinement Networks for Visual Synthesis
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Created**: 2026-05-23
- **Last Updated**: 2026-05-25
## README
# GRN: Generative Refinement Networks
[arXiv](https://arxiv.org/abs/2604.13030) | [Project Page](https://bytedance.github.io/GRN/) | [🤗 Model](https://huggingface.co/bytedance-research/GRN) | [🤗 Demo](https://huggingface.co/spaces/hanjian/GRN) | [License: MIT](LICENSE) | [GitHub](https://github.com/bytedance/GRN)
---
## 🔥 Updates!!
* May 23, 2026: 🌺 We released the training and evaluation code for the HBQ tokenizer. Enjoy!
* April 14, 2026: 🤗 Paper and code release
## 📋 Table of Contents
- [🌟 Introduction](#-introduction)
- [✨ Gallery](#-gallery)
- [🚀 Demo](#-demo)
- [📦 Model Zoo](#-model-zoo)
- [🛠️ Installation](#️-installation)
- [🖼️ Class-to-Image](#️-class-to-image)
  - [Dataset](#dataset)
  - [Training](#training)
  - [Evaluation](#evaluation)
- [🎨 Text-to-Image](#-text-to-image)
  - [Inference](#inference)
- [🎬 Text-to-Video](#-text-to-video)
  - [Inference](#inference-1)
- [📦 HBQ Tokenizer](#-hbq-tokenizer)
  - [Data](#data)
  - [Training](#training-1)
  - [Evaluation](#evaluation-1)
- [📧 Contact](#-contact)
- [🤗 Acknowledgements](#-acknowledgements)
- [📝 Citation](#-citation)
---
## 🌟 Introduction
This is the official implementation of the paper **Generative Refinement Networks for Visual Synthesis**. Neither diffusion nor autoregressive — GRN is a third way. 🧠 Refines globally like an artist. ⚡ Generates adaptively by complexity. 🏆 New SOTA across image & video. The visual generation paradigm just got rewritten.
Diffusion models dominate visual generation, but they spend the same computational effort on every sample, regardless of its complexity. Autoregressive (AR) models are complexity-aware, as evidenced by their variable likelihoods, but they suffer from lossy tokenization and error accumulation.
We introduce **Generative Refinement Networks (GRN)**, a new visual synthesis paradigm that addresses these issues:
- **Near-lossless tokenization** via Hierarchical Binary Quantization (HBQ)
- **Global refinement mechanism** that progressively perfects outputs like a human artist
- **Entropy-guided sampling** for complexity-aware, adaptive-step generation
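For intuition about the HBQ bullet, the sketch below implements one plausible reading of "hierarchical binary quantization". Each latent value in [-1, 1] is encoded as M sign bits from coarse to fine, and each level uses half the step size of the previous one. As a result, the reconstruction error is bounded by 2^-M and halves with every extra level. The function names, the greedy sign rule, and M = 4 (inferred from the `M4` / `round_4` suffixes in the checkpoint names and tokenizer config) are assumptions. This is not the repository's HBQ implementation.

```python
def hbq_encode(x, num_levels=4):
    """Hypothetical HBQ-style encoder: a scalar x in [-1, 1] -> num_levels sign bits.

    Level m (1-indexed) records the sign of the current residual and subtracts
    b_m * 2**-m, so each level refines the previous one at half the step size.
    """
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must lie in [-1, 1]")
    bits, residual = [], x
    for m in range(1, num_levels + 1):
        b = 1 if residual >= 0 else -1
        bits.append(b)
        residual -= b * 2.0 ** -m
    return bits


def hbq_decode(bits):
    """Reconstruct x_hat = sum_m b_m * 2**-m from the sign bits."""
    return sum(b * 2.0 ** -m for m, b in enumerate(bits, start=1))


bits = hbq_encode(0.3)
print(bits, hbq_decode(bits))  # [1, -1, 1, -1] 0.3125
```

By induction, the residual after level m never exceeds 2^-m in magnitude. Adding levels therefore drives the error toward zero, which is one way a binary hierarchy can approach near-lossless reconstruction.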
GRN achieves state-of-the-art results on ImageNet reconstruction and class-conditional generation, and scales effectively to text-to-image and text-to-video tasks.
---
**Figure: Generative Refinement Framework.**
Starting from a random token map, GRN refines all input tokens at each step and randomly selects a growing number of predictions to keep. For example, compared with the second step, the third step fills six new tokens (pink), keeps two tokens (blue), erases two tokens (yellow), and leaves six tokens blank (gray).
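The toy sketch below mimics only the bookkeeping described in the caption. It uses a 16-token map, the size implied by the 6 + 2 + 2 + 6 example. At each step, a hypothetical `toy_predict` stand-in for the network re-predicts every position. A randomly chosen, growing subset of positions keeps its prediction. Each position is then labeled filled, kept, erased, or blank relative to the previous step. Several choices here are illustrative assumptions, not GRN's code: the random predictor, the linear selection schedule, and marking uncommitted positions as `None` (the real model starts from random tokens).

```python
import random
from collections import Counter

BLANK = None  # toy marker for an uncommitted position (the real model starts from random tokens)


def toy_predict(token_map, vocab_size, rng):
    """Hypothetical stand-in for the network: returns a new prediction for EVERY
    position. A real model would condition on the whole current token map."""
    return [rng.randrange(vocab_size) for _ in token_map]


def refine_step(prev, num_selected, vocab_size, rng):
    """Re-predict all positions, then keep predictions only at a random subset."""
    proposal = toy_predict(prev, vocab_size, rng)
    selected = set(rng.sample(range(len(prev)), num_selected))
    return [proposal[i] if i in selected else BLANK for i in range(len(prev))]


def classify(prev, new):
    """Label each position relative to the previous step, as in the figure caption."""
    counts = Counter()
    for p, q in zip(prev, new):
        if p is BLANK and q is BLANK:
            counts["blank"] += 1
        elif p is BLANK:
            counts["filled"] += 1
        elif q is BLANK:
            counts["erased"] += 1
        else:
            counts["kept"] += 1  # occupied in both steps (value may have been refined)
    return counts


def generate(num_tokens=16, num_steps=4, vocab_size=8, seed=0):
    rng = random.Random(seed)
    token_map = [BLANK] * num_tokens
    history = []
    for step in range(1, num_steps + 1):
        # Assumed linear schedule: more positions are selected at every step,
        # and all of them at the final step.
        num_selected = round(num_tokens * step / num_steps)
        new_map = refine_step(token_map, num_selected, vocab_size, rng)
        history.append((new_map, classify(token_map, new_map)))
        token_map = new_map
    return history


for step, (_, counts) in enumerate(generate(), start=1):
    print(f"step {step}: {dict(counts)}")
```

Because selection is random over all positions, a token committed at one step can be erased at the next. The toy reproduces this behavior, which distinguishes this scheme from a fixed left-to-right or mask-only-growing order.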
## ✨ Gallery
### GRN-2B Class-to-Image Examples
### GRN-2B Text-to-Image Examples
---
## 🚀 Demo
### 🖼️ Text-to-Image
Try our interactive Text-to-Image demo on 🤗 Hugging Face Space:
**[GRN T2I Demo](https://huggingface.co/spaces/hanjian/GRN)**
Experience the power of Generative Refinement Networks firsthand by generating images from text prompts directly in your browser!
---
### 🎬 Text-to-Video
Try our interactive Text-to-Video demo on Discord:
[T2V Demo on Discord](http://opensource.bytedance.com/discord/invite)
---
## 📦 Model Zoo
| Model | Checkpoints |
|-------|:-----------:|
| **Tokenizers** | ✅ [ImageNet Tokenizer](https://huggingface.co/bytedance-research/GRN/blob/main/HBQ_image_tokenizer_16dim_M4.ckpt) ✅ [Joint Image/Video Tokenizer](https://huggingface.co/bytedance-research/GRN/blob/main/HBQ_tokenizer_64dim_M4.ckpt) |
| **GRN_ind_C2I** | ✅ [B](https://huggingface.co/bytedance-research/GRN/blob/main/GRN_ind_B_ep599.pth) ⬜ L (TBD) ⬜ H (TBD) ⬜ G (TBD) |
| **GRN_bit_T2I** | ✅ [GRN_T2I](https://huggingface.co/bytedance-research/GRN/blob/main/GRN_T2I_2B.pth) |
| **GRN_bit_T2V** | ✅ [GRN_T2V](https://huggingface.co/bytedance-research/GRN/blob/main/GRN_T2V_2B.pth) |
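The checkpoints in the table can also be fetched programmatically. The sketch below uses `hf_hub_download` from the `huggingface_hub` package. The filenames are copied from the table links. The dictionary keys and the `download_checkpoint` helper are hypothetical conveniences, not part of this repository.

```python
# Checkpoint filenames from the Model Zoo table (Hugging Face repo: bytedance-research/GRN).
GRN_REPO_ID = "bytedance-research/GRN"
CHECKPOINTS = {  # keys are hypothetical labels chosen for this example
    "HBQ_image_tokenizer": "HBQ_image_tokenizer_16dim_M4.ckpt",
    "HBQ_joint_tokenizer": "HBQ_tokenizer_64dim_M4.ckpt",
    "GRN_ind_B_C2I": "GRN_ind_B_ep599.pth",
    "GRN_T2I_2B": "GRN_T2I_2B.pth",
    "GRN_T2V_2B": "GRN_T2V_2B.pth",
}


def download_checkpoint(key, local_dir=None):
    """Download one checkpoint and return its local file path."""
    # Imported lazily so this module loads without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=GRN_REPO_ID, filename=CHECKPOINTS[key], local_dir=local_dir)
```

For example, `download_checkpoint("GRN_ind_B_C2I", local_dir="./checkpoints")` would place `GRN_ind_B_ep599.pth` under `./checkpoints`. Where the training and evaluation scripts expect checkpoints is not specified here.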
---
## 🛠️ Installation
### Step 1: Clone the repository
```bash
git clone https://github.com/bytedance/GRN
cd GRN
```
### Step 2: Create conda environment
A suitable [conda](https://conda.io/) environment named `GRN` can be created and activated with:
```bash
conda env create -f environment.yaml
conda activate GRN
```
### Troubleshooting
If you get `undefined symbol: iJIT_NotifyEvent` when importing `torch`, reinstall PyTorch 2.5.1 built for CUDA 12.4 from the official wheel index:
```bash
pip uninstall torch
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
```
Check this [issue](https://github.com/conda/conda/issues/13812#issuecomment-2071445372) for more details.
---
## 🖼️ Class-to-Image
### Dataset
Download the [ImageNet](http://image-net.org/download) dataset and place it in your `IMAGENET_PATH`.
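Before launching a long run, a quick sanity check of the dataset folder can save time. This sketch assumes the common ImageFolder-style layout `IMAGENET_PATH/train/<wnid>/<image>.JPEG` with 1000 class folders. The README does not state which layout the scripts expect, and `check_imagenet_layout` is a hypothetical helper.

```python
import os


def check_imagenet_layout(root, split="train", expected_classes=1000):
    """Return the number of class folders under root/split, or raise if it looks wrong.

    ASSUMPTION: ImageFolder-style layout root/<split>/<class_id>/<image files>.
    """
    split_dir = os.path.join(root, split)
    if not os.path.isdir(split_dir):
        raise FileNotFoundError(f"missing split directory: {split_dir}")
    classes = [d for d in os.listdir(split_dir) if os.path.isdir(os.path.join(split_dir, d))]
    if len(classes) != expected_classes:
        raise ValueError(f"expected {expected_classes} class folders, found {len(classes)}")
    return len(classes)
```

If you export `IMAGENET_PATH` as an environment variable, you can call `check_imagenet_layout(os.environ["IMAGENET_PATH"])`.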
### Training
All training scripts are located in `scripts/c2i/`. We suggest using 8x80GB GPUs for most models.
| Model | Training Script | Suggested GPUs |
|-------|:-------------:|:-------------:|
| GRN_ind_B | `bash scripts/c2i/train_GRN_ind_B.sh` | 8x80GB |
| GRN_bit_B | `bash scripts/c2i/train_GRN_bit_B.sh` | 8x80GB |
| GRN_ind_L | `bash scripts/c2i/train_GRN_ind_L.sh` | 8x80GB |
| GRN_ind_H | `bash scripts/c2i/train_GRN_ind_H.sh` | 16x80GB |
| GRN_ind_G | `bash scripts/c2i/train_GRN_ind_G.sh` | 32x80GB |
### Evaluation
Pre-trained PyTorch checkpoints are available [here](https://huggingface.co/bytedance-research/GRN/tree/main).
All evaluation scripts are located in `scripts/c2i/`. We suggest using 8x80GB GPUs.
| Model | Evaluation Script |
|-------|:--------------:|
| GRN_ind_B | `bash scripts/c2i/eval_GRN_ind_B.sh` |
| GRN_bit_B | `bash scripts/c2i/eval_GRN_bit_B.sh` |
| GRN_ind_L | `bash scripts/c2i/eval_GRN_ind_L.sh` |
| GRN_ind_H | `bash scripts/c2i/eval_GRN_ind_H.sh` |
| GRN_ind_G | `bash scripts/c2i/eval_GRN_ind_G.sh` |
We use [torch-fidelity](https://github.com/LTH14/torch-fidelity) to compute FID and IS against either a reference image folder or pre-computed statistics. We use JiT's pre-computed reference statistics, located under `grn/utils_c2i/fid_stats`.
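torch-fidelity computes FID internally. For reference, FID is the Fréchet distance between two Gaussians fitted to Inception features: ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^(1/2)). The NumPy/SciPy sketch below illustrates that formula and is not torch-fidelity's code. It assumes the feature means and covariances are already available. The key names inside the reference-stats files are not documented here.

```python
import numpy as np
from scipy import linalg


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)  # matrix square root of sigma1 @ sigma2
    covmean = np.real(covmean)  # drop tiny imaginary parts caused by numerical error
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean))
```

Identical statistics give a distance of 0. Shifting only the mean by a vector v adds exactly ||v||^2, which makes a quick check of any FID implementation.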
---
## 🎨 Text-to-Image
### Inference
You can simply run `python3 t2i_infer.py` or use the following code:
```python
from grn_pipeline import GRNPipeline

# Load the T2I pipeline (weights are loaded on CPU, then moved to the GPU)
pipeline = GRNPipeline.from_pretrained(
    hf_repo_id='bytedance-research/GRN',
    task='T2I',
    pn='1M',       # output resolution preset (approximate pixel count)
    device='cpu',
).to('cuda')

# Generate one image
result = pipeline(
    prompt="A cute cat playing in the garden",
    guidance_scale=3.0,
    temperature=1.1,
    # Entropy-guided, complexity-aware adaptive-step sampling settings
    complexity_aware_Tmin=10,
    complexity_aware_Tmax=50,
    complexity_aware_k=0,
    complexity_aware_b=50,
    complexity_aware_wp=5,
    snr_shift=1.,
    h_div_w=1.,    # aspect ratio (height / width)
    content_type='image',
    seed=42,       # fixed seed for reproducible sampling
)
image = result.images[0]
image.save('./generated_image.jpg')
```
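The `complexity_aware_*` arguments control entropy-guided, adaptive-step sampling, but this README does not document their exact semantics. As an illustration of the idea only, the sketch computes the mean Shannon entropy of per-token predictive distributions. It then maps that entropy to a step count with a clipped linear rule, `round(k * H + b)` bounded to `[Tmin, Tmax]`. This rule and the helper names are hypothetical readings of the parameter names, not GRN's implementation, and `wp` is not modeled.

```python
import math


def mean_token_entropy(probs):
    """Average Shannon entropy (in nats) over a list of per-token categorical distributions."""
    total = 0.0
    for p in probs:
        total -= sum(x * math.log(x) for x in p if x > 0)
    return total / len(probs)


def adaptive_num_steps(entropy, k, b, t_min, t_max):
    """HYPOTHETICAL rule: step count linear in entropy, clipped to [t_min, t_max]."""
    steps = round(k * entropy + b)
    return max(t_min, min(t_max, steps))


H = mean_token_entropy([[0.25] * 4, [1.0, 0.0, 0.0, 0.0]])  # one uncertain, one confident token
print(H, adaptive_num_steps(H, k=20, b=5, t_min=10, t_max=50))
```

Under this hypothetical rule, more uncertain (higher-entropy) content receives more refinement steps. Setting `k=0`, as in the example call, would make the step count a constant `b` clipped to the bounds.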
---
## 🎬 Text-to-Video
### Inference
You can simply run `python3 t2v_infer.py` or use the following code:
```python
from grn_pipeline import GRNPipeline

# Load the T2V pipeline (weights are loaded on CPU, then moved to the GPU)
pipeline = GRNPipeline.from_pretrained(
    hf_repo_id='bytedance-research/GRN',
    task='T2V',
    pn='0.41M',    # output resolution preset (approximate pixel count)
    device='cpu',
).to('cuda')

# Generate one video
result = pipeline(
    prompt="Two women demonstrate a makeup product, applying it with a sponge while smiling and engaging with the camera in a bright, clean setting.",
    guidance_scale=4.0,
    temperature=1.0,
    # Entropy-guided, complexity-aware adaptive-step sampling settings
    complexity_aware_Tmin=10,
    complexity_aware_Tmax=50,
    complexity_aware_k=0,
    complexity_aware_b=50,
    complexity_aware_wp=5,
    snr_shift=1.,
    h_div_w=9/16,  # aspect ratio (height / width): 9/16 is 16:9 landscape
    duration=2.,   # clip duration
    content_type='video',
    seed=42,       # fixed seed for reproducible sampling
)
video_file = result.videos[0]  # first generated video
```
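`pn` and `h_div_w` together determine the output size. This sketch assumes `pn` is an approximate pixel budget and that both sides are rounded to a multiple of 16. Both are assumptions: the pipeline may instead look up a predefined resolution table. On that basis, the hypothetical `approx_resolution` helper estimates the height and width for a given budget and aspect ratio.

```python
import math


def approx_resolution(num_pixels, h_div_w, multiple=16):
    """HYPOTHETICAL helper: (height, width) with h * w ~= num_pixels and h / w ~= h_div_w.

    Both sides are snapped to the nearest multiple of `multiple` (an assumed constraint).
    """
    width = math.sqrt(num_pixels / h_div_w)
    height = width * h_div_w

    def snap(v):
        return max(multiple, int(round(v / multiple)) * multiple)

    return snap(height), snap(width)


print(approx_resolution(410_000, 9 / 16))  # (480, 848) under these assumptions
print(approx_resolution(1024 * 1024, 1.0))  # (1024, 1024)
```

Whether "M" means 10^6 or 2^20 pixels is not stated in this README. Under either reading, `pn='0.41M'` with `h_div_w=9/16` lands near 480p.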
---
## 📦 HBQ Tokenizer
### Data
Image dataset: a text file listing one full image path per line, e.g., `data_root/username/labels/imagenet/train.txt`:
```
[image_1_full_path]
[image_2_full_path]
[image_3_full_path]
...
```
Video dataset: a text file listing one full video path per line, e.g., `data_root/username/labels_hanjian/high-quality-video/horizontal_videos.txt`:
```
[video_1_full_path]
[video_2_full_path]
[video_3_full_path]
...
```
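The list files above contain one full path per line, so they are easy to generate. The sketch below walks a directory and writes the absolute paths of files with matching extensions. The extension sets and the `write_path_list` helper are assumptions for illustration, and the output location is up to you.

```python
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}  # assumption: adjust to your data
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".webm"}   # assumption: adjust to your data


def write_path_list(src_dir, out_file, exts):
    """Write one absolute path per line for files under src_dir whose extension is in exts."""
    paths = []
    for dirpath, _, filenames in os.walk(src_dir):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in exts:
                paths.append(os.path.abspath(os.path.join(dirpath, name)))
    paths.sort()  # deterministic order
    os.makedirs(os.path.dirname(os.path.abspath(out_file)), exist_ok=True)
    with open(out_file, "w") as f:
        f.write("\n".join(paths) + ("\n" if paths else ""))
    return len(paths)
```

For example, `write_path_list("/data/imagenet/train", "labels/imagenet/train.txt", IMAGE_EXTS)` returns the number of paths written.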
### Training
For example, set `latent_channels` to `16` or `64` and `quant_method=hierarchical_binary_quant_round_4` in `scripts/hbq_tokenizer_train.sh`, then run:
```bash
cd grn/tokenizer
bash scripts/hbq_tokenizer_train.sh
```
### Evaluation
For example, set `latent_channels` to `16` or `64` and `quant_method=hierarchical_binary_quant_round_4` in `scripts/hbq_tokenizer_eval.sh`, then run:
```bash
cd grn/tokenizer
bash scripts/hbq_tokenizer_eval.sh
```
---
## 📧 Contact
If you are interested in scaling GRN for image generation / image editing / video generation / video editing / unified model directions, please feel free to reach out!
**📧 Email:** [hanjian.thu123@bytedance.com](mailto:hanjian.thu123@bytedance.com)
---
## 🤗 Acknowledgements
- Thanks to [JiT](https://github.com/LTH14/JiT), [Infinity](https://github.com/FoundationVision/Infinity) and [InfinityStar](https://github.com/FoundationVision/InfinityStar) for their wonderful work and codebase!
---
## 📝 Citation
If you find our work useful, please consider citing:
```bibtex
@misc{han2026grn,
  title={Generative Refinement Networks for Visual Synthesis},
  author={Jian Han and Jinlai Liu and Jiahuan Wang and Bingyue Peng and Zehuan Yuan},
  year={2026},
  eprint={2604.13030},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.13030},
}
```