# DiffiT

**Repository Path**: HeJiaxing97/DiffiT

## Basic Information

- **Project Name**: DiffiT
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: dev
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-03-09
- **Last Updated**: 2024-03-09

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# DiffiT: Diffusion Vision Transformers for Image Generation

Official PyTorch implementation of [**DiffiT: Diffusion Vision Transformers for Image Generation**](https://arxiv.org/abs/2312.02139).


**Code and pretrained DiffiT models will be released soon !**  

[![Star on GitHub](https://img.shields.io/github/stars/NVlabs/DiffiT.svg?style=social)](https://github.com/NVlabs/DiffiT/stargazers)

**DiffiT** achieves a new SOTA FID score of **1.73** on **ImageNet-256 dataset** ! 

![teaser](./assets/imagenet.png)


In addition, **DiffiT** sets a new SOTA FID score of **2.22** on **FFHQ-64 dataset** ! 

![teaser](./assets/ffhq-diffit.png)

We introduce a new Time-dependent Multihead Self-Attention (TMSA) mechanism that jointly learns **spatial** and **temporal** dependencies and allows for **attention conditioning** with finegrained control. 


![teaser](./assets/latent_diffit.png)

## 💥 News 💥
- **[12.04.2023]** 🔥 DiffiT [manuscript](https://arxiv.org/abs/2312.02139) is now available on arXiv !


# Benchmarks 

## Latent Space


### ImageNet-256

| Model| Dataset |  Resolution | FID-50K | Inception Score |
|---------|----------|-----------|---------|--------|
|**Latent DiffiT** | ImageNet | 256x256   | **1.73**    | **276.49**|

### ImageNet-512

| Model| Dataset |  Resolution | FID-50K | Inception Score |
|---------|----------|-----------|---------|--------|
|**Latent DiffiT** | ImageNet | 512x512   | **2.67**    | **252.12**|


## Image Space


| Model| Dataset |  Resolution | FID-50K | 
|---------|----------|-----------|---------|
|**DiffiT** | CIFAR-10 | 32x32   | **1.95** |
|**DiffiT** | FFHQ-64 | 64x64   | **2.22** |

## Citation

```
@article{hatamizadeh2023diffit,
  title={Diffit: Diffusion vision transformers for image generation},
  author={Hatamizadeh, Ali and Song, Jiaming and Liu, Guilin and Kautz, Jan and Vahdat, Arash},
  journal={arXiv preprint arXiv:2312.02139},
  year={2023}
}
```

## Licenses

Copyright © 2024, NVIDIA Corporation. All rights reserved.