# GRN

**Repository Path**: ByteDance/GRN

## Basic Information

- **Project Name**: GRN
- **Description**: Generative Refinement Networks for Visual Synthesis
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-05-23
- **Last Updated**: 2026-05-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# GRN: Generative Refinement Networks

[![arXiv](https://img.shields.io/badge/arXiv%20paper-2604.13030-b31b1b.svg)](https://arxiv.org/abs/2604.13030)
[![Homepage](https://img.shields.io/badge/🏠%20Homepage-GRN-green.svg)](https://bytedance.github.io/GRN/)
[![Models](https://img.shields.io/badge/🤗%20Hugging%20Face-Models-blue.svg)](https://huggingface.co/bytedance-research/GRN)
[![Demo](https://img.shields.io/badge/🤗%20Hugging%20Face-Demo-yellow.svg)](https://huggingface.co/spaces/hanjian/GRN)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![GitHub stars](https://img.shields.io/github/stars/bytedance/GRN?style=social)](https://github.com/bytedance/GRN)

---

## 🔥 Updates!!

* May 23, 2026: 🌺 We release the training and evaluation code for the HBQ tokenizer, enjoy~
* April 14, 2026: 🤗 Paper and code release

## 📋 Table of Contents

- [🌟 Introduction](#-introduction)
- [✨ Gallery](#-gallery)
- [🚀 Demo](#-demo)
- [📦 Model Zoo](#-model-zoo)
- [🛠️ Installation](#️-installation)
- [🖼️ Class-to-Image](#️-class-to-image)
  - [Dataset](#dataset)
  - [Training](#training)
  - [Evaluation](#evaluation)
- [🎨 Text-to-Image](#-text-to-image)
  - [Inference](#inference)
- [🎬 Text-to-Video](#-text-to-video)
  - [Inference](#inference-1)
- [📦 HBQ Tokenizer](#-hbq-tokenizer)
  - [Data](#data)
  - [Training](#training-1)
  - [Evaluation](#evaluation-1)
- [📧 Contact](#-contact)
- [🤗 Acknowledgements](#-acknowledgements)
- [📝 Citation](#-citation)

---

## 🌟 Introduction

This is the official implementation of the paper **Generative Refinement Networks for Visual Synthesis**.

Neither diffusion nor autoregressive: GRN is a third way.

🧠 Refines globally like an artist.
⚡ Generates adaptively by complexity.
🏆 New SOTA across image & video.

The visual generation paradigm just got rewritten.

Diffusion models dominate visual generation, but they allocate uniform computational effort to samples with varying levels of complexity. Autoregressive (AR) models are complexity-aware, as evidenced by their variable likelihoods, but suffer from lossy tokenization and error accumulation.

We introduce **Generative Refinement Networks (GRN)**, a new visual synthesis paradigm that addresses these issues:

- **Near-lossless tokenization** via Hierarchical Binary Quantization (HBQ)
- **Global refinement mechanism** that progressively perfects outputs like a human artist
- **Entropy-guided sampling** for complexity-aware, adaptive-step generation

GRN achieves state-of-the-art results on ImageNet reconstruction and class-conditional generation, and scales effectively to text-to-image and text-to-video tasks.

---
Figure: Generative Refinement Framework

Starting from a random token map, GRN randomly selects more predictions at each step and refines all input tokens. For example, compared to the second step, the third step filled six new tokens (pink), kept two tokens (blue), erased two tokens (yellow), and left six tokens blank (gray).
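
The bookkeeping in this example can be reproduced with a small sketch. It is only an illustration of how the four colour categories relate to two consecutive steps, not the GRN sampler itself. The function name `classify_transition`, the representation of a step as a set of filled positions, and the choice of 16 positions (the four counts above sum to 16) are assumptions made for this sketch.

```python
def classify_transition(prev_filled, next_filled, num_tokens):
    """Count how token positions change between two consecutive steps.

    prev_filled / next_filled: positions holding a selected prediction
    at the earlier / later step (hypothetical representation).
    """
    prev_filled, next_filled = set(prev_filled), set(next_filled)
    all_positions = set(range(num_tokens))
    return {
        "filled": len(next_filled - prev_filled),   # newly selected (pink)
        "kept": len(prev_filled & next_filled),     # selected in both steps (blue)
        "erased": len(prev_filled - next_filled),   # selected before, dropped now (yellow)
        "blank": len(all_positions - prev_filled - next_filled),  # never selected (gray)
    }


# Toy layout chosen to match the counts quoted in the caption.
step2 = [0, 1, 2, 3]                 # four tokens selected at step 2
step3 = [0, 1, 4, 5, 6, 7, 8, 9]     # eight tokens selected at step 3
print(classify_transition(step2, step3, num_tokens=16))
# {'filled': 6, 'kept': 2, 'erased': 2, 'blank': 6}
```

Because every position is re-examined at each step, a token selected earlier can still be erased later, which is what distinguishes this from a fill-once schedule.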

---

## ✨ Gallery

### GRN-8B Text-to-Video Examples

---

### GRN-8B Image-to-Video Examples

### GRN-2B Class-to-Image Examples

Figure: Class-to-Image Examples

### GRN-2B Text-to-Image Examples

Figure: Text-to-Image Examples

---

## 🚀 Demo

### 🖼️ Text-to-Image

Try our interactive Text-to-Image demo on 🤗 Hugging Face Space: **[GRN T2I Demo](https://huggingface.co/spaces/hanjian/GRN)**

Experience the power of Generative Refinement Networks firsthand by generating images from text prompts directly in your browser!

---

### 🎬 Text-to-Video

Try our interactive Text-to-Video demo on Discord:

[![Discord](https://img.shields.io/badge/Discord-Join%20Server-5865F2?style=for-the-badge&logo=discord&logoColor=white)](http://opensource.bytedance.com/discord/invite)

Figure: T2V Demo on Discord

---

## 📦 Model Zoo

| Model | Checkpoints |
|-------|:-----------:|
| **Tokenizers** | ✅ [ImageNet Tokenizer](https://huggingface.co/bytedance-research/GRN/blob/main/HBQ_image_tokenizer_16dim_M4.ckpt)<br>✅ [Joint Image/Video Tokenizer](https://huggingface.co/bytedance-research/GRN/blob/main/HBQ_tokenizer_64dim_M4.ckpt) |
| **GRN_ind_C2I** | ✅ [B](https://huggingface.co/bytedance-research/GRN/blob/main/GRN_ind_B_ep599.pth)<br>⬜ L (TBD)<br>⬜ H (TBD)<br>⬜ G (TBD) |
| **GRN_bit_T2I** | ✅ [GRN_T2I](https://huggingface.co/bytedance-research/GRN/blob/main/GRN_T2I_2B.pth) |
| **GRN_bit_T2V** | ✅ [GRN_T2V](https://huggingface.co/bytedance-research/GRN/blob/main/GRN_T2V_2B.pth) |

---

## 🛠️ Installation

### Step 1: Clone the repository

```bash
git clone https://github.com/bytedance/GRN
cd GRN
```

### Step 2: Create conda environment

A suitable [conda](https://conda.io/) environment named `GRN` can be created and activated with:

```bash
conda env create -f environment.yaml
conda activate GRN
```

### Troubleshooting

If you get `undefined symbol: iJIT_NotifyEvent` when importing `torch`, reinstall `torch`:

```bash
pip uninstall torch
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
```

Check this [issue](https://github.com/conda/conda/issues/13812#issuecomment-2071445372) for more details.

---

## 🖼️ Class-to-Image

### Dataset

Download the [ImageNet](http://image-net.org/download) dataset and place it in your `IMAGENET_PATH`.

### Training

All training scripts are located in `scripts/c2i/`. We suggest using 8x80GB GPUs for most models.

| Model | Training Script | GPUs Required |
|-------|:-------------:|:-------------:|
| GRN_ind_B | `bash scripts/c2i/train_GRN_ind_B.sh` | 8x80GB |
| GRN_bit_B | `bash scripts/c2i/train_GRN_bit_B.sh` | 8x80GB |
| GRN_ind_L | `bash scripts/c2i/train_GRN_ind_L.sh` | 8x80GB |
| GRN_ind_H | `bash scripts/c2i/train_GRN_ind_H.sh` | 16x80GB |
| GRN_ind_G | `bash scripts/c2i/train_GRN_ind_G.sh` | 32x80GB |

### Evaluation

PyTorch pre-trained models are available [here](https://huggingface.co/bytedance-research/GRN/tree/main).

All evaluation scripts are located in `scripts/c2i/`. We suggest using 8 GPUs with 80GB of VRAM each.

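
The pre-trained checkpoints can also be fetched from a script rather than through the browser. The following is a minimal sketch using `hf_hub_download` from the `huggingface_hub` package. The file names are copied from the Model Zoo links, while the short keys in `CHECKPOINTS`, the helper `download_checkpoint`, and the `checkpoints` target directory are assumptions for illustration. Where the evaluation scripts expect the files to live is not stated here, so check the scripts before relying on this layout.

```python
REPO_ID = "bytedance-research/GRN"

# File names as they appear in the Model Zoo links; the keys are hypothetical.
CHECKPOINTS = {
    "image_tokenizer": "HBQ_image_tokenizer_16dim_M4.ckpt",
    "joint_tokenizer": "HBQ_tokenizer_64dim_M4.ckpt",
    "GRN_ind_B": "GRN_ind_B_ep599.pth",
    "GRN_T2I": "GRN_T2I_2B.pth",
    "GRN_T2V": "GRN_T2V_2B.pth",
}


def download_checkpoint(name, local_dir="checkpoints"):
    """Download one checkpoint from the Hub and return its local path."""
    # Imported lazily so the mapping above can be used without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=CHECKPOINTS[name],
        local_dir=local_dir,  # assumed target directory
    )


# Example (requires network access and `pip install huggingface_hub`):
# path = download_checkpoint("GRN_ind_B")
```
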
| Model | Evaluation Script |
|-------|:--------------:|
| GRN_ind_B | `bash scripts/c2i/eval_GRN_ind_B.sh` |
| GRN_bit_B | `bash scripts/c2i/eval_GRN_bit_B.sh` |
| GRN_ind_L | `bash scripts/c2i/eval_GRN_ind_L.sh` |
| GRN_ind_H | `bash scripts/c2i/eval_GRN_ind_H.sh` |
| GRN_ind_G | `bash scripts/c2i/eval_GRN_ind_G.sh` |

We use [torch-fidelity](https://github.com/LTH14/torch-fidelity) to evaluate FID and IS against a reference image folder or statistics. We use JiT's pre-computed reference statistics under `grn/utils_c2i/fid_stats`.

---

## 🎨 Text-to-Image

### Inference

You can simply run `python3 t2i_infer.py` or use the following code:

```python
from PIL import Image
from grn_pipeline import GRNPipeline

# Load pipeline
pipeline = GRNPipeline.from_pretrained(
    hf_repo_id='bytedance-research/GRN',
    task='T2I',
    pn='1M',
    device='cpu',
).to('cuda')

# Generate one image
result = pipeline(
    prompt="A cute cat playing in the garden",
    guidance_scale=3.0,
    temperature=1.1,
    complexity_aware_Tmin=10,
    complexity_aware_Tmax=50,
    complexity_aware_k=0,
    complexity_aware_b=50,
    complexity_aware_wp=5,
    snr_shift=1.,
    h_div_w=1.,
    content_type='image',
    seed=42,
)
image = result.images[0]
image.save('./generated_image.jpg')
```

---

## 🎬 Text-to-Video

### Inference

You can simply run `python3 t2v_infer.py` or use the following code:

```python
from grn_pipeline import GRNPipeline

# Load pipeline
pipeline = GRNPipeline.from_pretrained(
    hf_repo_id='bytedance-research/GRN',
    task='T2V',
    pn='0.41M',
    device='cpu',
).to('cuda')

# Generate one video
result = pipeline(
    prompt="Two women demonstrate a makeup product, applying it with a sponge while smiling and engaging with the camera in a bright, clean setting.",
    guidance_scale=4.0,
    temperature=1.0,
    complexity_aware_Tmin=10,
    complexity_aware_Tmax=50,
    complexity_aware_k=0,
    complexity_aware_b=50,
    complexity_aware_wp=5,
    snr_shift=1.,
    h_div_w=9/16,
    duration=2.,
    content_type='video',
    seed=42,
)
video_file = result.videos[0]
```

---

## 📦 HBQ Tokenizer

### Data

Image dataset: a text file with one image path per line, e.g. `data_root/username/labels/imagenet/train.txt`:

```
[image_1_full_path]
[image_2_full_path]
[image_3_full_path]
...
```

Video dataset: a text file with one video path per line, e.g. `data_root/username/labels_hanjian/high-quality-video/horizontal_videos.txt`:

```
[video_1_full_path]
[video_2_full_path]
[video_3_full_path]
...
```

### Training

For example, set `latent_channels` to `16` or `64` and `quant_method=hierarchical_binary_quant_round_4` in `scripts/hbq_tokenizer_train.sh`, then run:

```bash
cd grn/tokenizer
bash scripts/hbq_tokenizer_train.sh
```

### Evaluation

For example, set `latent_channels` to `16` or `64` and `quant_method=hierarchical_binary_quant_round_4` in `scripts/hbq_tokenizer_train.sh`, then run:

```bash
cd grn/tokenizer
bash scripts/hbq_tokenizer_eval.sh
```

---

## 📧 Contact

If you are interested in scaling GRN for image generation, image editing, video generation, video editing, or unified model directions, please feel free to reach out!

**📧 Email:** [hanjian.thu123@bytedance.com](mailto:hanjian.thu123@bytedance.com)

---

## 🤗 Acknowledgements

- Thanks to [JiT](https://github.com/LTH14/JiT), [Infinity](https://github.com/FoundationVision/Infinity) and [InfinityStar](https://github.com/FoundationVision/InfinityStar) for their wonderful work and codebase!

---

## 📝 Citation

If you find our work useful, please consider citing:

```bibtex
@misc{han2026grn,
  title={Generative Refinement Networks for Visual Synthesis},
  author={Jian Han and Jinlai Liu and Jiahuan Wang and Bingyue Peng and Zehuan Yuan},
  year={2026},
  eprint={2604.13030},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.13030},
}
```