# DriveLM
**Repository Path**: roshandaddy/DriveLM
## Basic Information
- **Project Name**: DriveLM
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: DriveLM-CARLA
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-11-01
- **Last Updated**: 2024-11-28
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
**DriveLM:** *Driving with **G**raph **V**isual **Q**uestion **A**nswering*
## Highlights
🔥 We present datasets (**DriveLM-Data**) built on nuScenes and CARLA, and propose a VLM-based baseline approach (**DriveLM-Agent**) for jointly performing Graph VQA and end-to-end driving.
## Table of Contents
1. [DriveLM-Data](#drivelmdata)
   - [Comparison and Stats](#comparison)
   - [GVQA Details](docs/gvqa.md)
   - [Annotation and Features](docs/data_details.md)
2. [Dataset](#dataset)
3. [GVQA Generation](#gvqa_generation)
4. [Custom Dataset Generation & PDM-Lite](#custom_dataset_and_pdm_lite)
5. [Current Endeavors and Future Directions](#timeline)
6. [License and Citation](#licenseandcitation)
7. [Other Resources](#otherresources)
## DriveLM-Data
We cover the full `Perception, Prediction, Planning, Behavior, Motion` stack and connect these tasks with human-written reasoning logic. On top of DriveLM-Data, we propose the task of [GVQA](docs/gvqa.md).
### 📊 Comparison and Stats
**DriveLM-Data** is the *first* language-driving dataset facilitating the full stack of driving tasks with graph-structured logical dependencies.
For more details, see [GVQA task](docs/gvqa.md), [Dataset Features](docs/data_details.md/#features), and [Annotation](docs/data_details.md/#annotation).
## Graph Visual Question Answering (GVQA) Dataset
We provide a GVQA dataset, featuring 71,223 keyframes out of 214,631 total frames across 1,759 routes with 100% completion and zero infractions. All scripts to generate the following VQA and keyframe files can be found [HERE](vqa_dataset).
1. Download the PDM-Lite dataset (330+ GB extracted).

   **Note:** This dataset is based on the PDM-Lite expert with improvements integrated from ["Tackling CARLA Leaderboard 2.0 with End-to-End Imitation Learning"](https://kashyap7x.github.io/assets/pdf/students/Zimmerlin2024.pdf).

   ```
   bash download_pdm_lite_carla_lb2.sh
   ```
2. Get the DriveLM-GVQA labels and keyframes (a quick inspection sketch follows below):

   ```
   wget https://huggingface.co/datasets/OpenDriveLab/DriveLM/resolve/main/drivelm_carla_keyframes.txt
   wget https://huggingface.co/datasets/OpenDriveLab/DriveLM/resolve/main/drivelm_carla_vqas.zip
   unzip drivelm_carla_vqas.zip
   ```
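After both steps, a few standard shell commands can sanity-check what was downloaded. This is a minimal sketch only; the expected keyframe count comes from the dataset description above, and the exact layout of the archive is not guaranteed here:

```
# Count the listed keyframes (the description above mentions 71,223 keyframes).
wc -l drivelm_carla_keyframes.txt

# Peek at the first few keyframe entries.
head drivelm_carla_keyframes.txt

# List the contents of the VQA archive without re-extracting it.
unzip -l drivelm_carla_vqas.zip | head

# The extracted PDM-Lite dataset is 330+ GB; check available disk space.
df -h .
```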
## GVQA Generation (Optional)
Extract keyframes:
```
python3 extract_keyframes.py --path-dataset /path/to/data --path-keyframes /path/to/save/keyframes.txt
```
Generate Graph-VQAs:
```
python3 carla_vqa_generator_main.py --path-keyframes /path/to/keyframes.txt --data-directory /path/to/data --output-graph-directory /path/to/output
```
Optional arguments (a combined example follows below):
- `--sample-frame-mode`: Specify how to select frames; choose from `all`, `keyframes`, or `uniform`.
- `--sample-uniform-interval`: Specify the interval for uniform sampling.
- `--save-examples`: Save example images for debugging.
- `--visualize-projection`: Visualize object centers in images.
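As an illustration, the optional flags can be combined in a single invocation. This is a sketch, not a recommended configuration: the paths are placeholders, the interval value of 10 is an arbitrary example, and it assumes `--save-examples` and `--visualize-projection` are boolean switches:

```
# Sketch: uniform sampling with an example interval of 10 frames.
python3 carla_vqa_generator_main.py \
    --path-keyframes /path/to/keyframes.txt \
    --data-directory /path/to/data \
    --output-graph-directory /path/to/output \
    --sample-frame-mode uniform \
    --sample-uniform-interval 10 \
    --save-examples \
    --visualize-projection
```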
## Custom Dataset Generation & PDM-Lite
For instructions on generating your own dataset with CARLA Leaderboard 2.0 and the PDM-Lite implementation, see [HERE](pdm_lite).
## Current Endeavors and Future Directions
> - The advent of GPT-style multimodal models in real-world applications motivates the study of the role of language in driving.
> - If there is any missing work, please reach out to us!
DriveLM attempts to address some of the challenges faced by the community.
- **Lack of data**: DriveLM-Data serves as a comprehensive benchmark for driving with language.
- **Embodiment**: GVQA provides a potential direction for embodied applications of LLMs / VLMs.
- **Closed-loop**: DriveLM-CARLA attempts to explore closed-loop planning with language.
## License and Citation
All assets and code in this repository are under the [Apache 2.0 license](./LICENSE) unless specified otherwise. The language data is under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). Other datasets (including nuScenes) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.
```BibTeX
@article{sima2023drivelm,
  title={DriveLM: Driving with Graph Visual Question Answering},
  author={Sima, Chonghao and Renz, Katrin and Chitta, Kashyap and Chen, Li and Zhang, Hanxue and Xie, Chengen and Luo, Ping and Geiger, Andreas and Li, Hongyang},
  journal={arXiv preprint arXiv:2312.14150},
  year={2023}
}
```
```BibTeX
@misc{contributors2023drivelmrepo,
  title={DriveLM: Driving with Graph Visual Question Answering},
  author={DriveLM contributors},
  howpublished={\url{https://github.com/OpenDriveLab/DriveLM}},
  year={2023}
}
```
## Other Resources
**OpenDriveLab**
- [DriveAGI](https://github.com/OpenDriveLab/DriveAGI) | [UniAD](https://github.com/OpenDriveLab/UniAD) | [OpenLane-V2](https://github.com/OpenDriveLab/OpenLane-V2) | [Survey on E2EAD](https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving)
- [Survey on BEV Perception](https://github.com/OpenDriveLab/BEVPerception-Survey-Recipe) | [BEVFormer](https://github.com/fundamentalvision/BEVFormer) | [OccNet](https://github.com/OpenDriveLab/OccNet)
**Autonomous Vision Group**
- [tuPlan garage](https://github.com/autonomousvision/tuplan_garage) | [CARLA garage](https://github.com/autonomousvision/carla_garage) | [Survey on E2EAD](https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving)
- [PlanT](https://github.com/autonomousvision/plant) | [KING](https://github.com/autonomousvision/king) | [TransFuser](https://github.com/autonomousvision/transfuser) | [NEAT](https://github.com/autonomousvision/neat)