# monodepth

Tensorflow implementation of unsupervised single image depth prediction using a convolutional neural network.

**Unsupervised Monocular Depth Estimation with Left-Right Consistency**

[Clément Godard](http://www0.cs.ucl.ac.uk/staff/C.Godard/), [Oisin Mac Aodha](http://vision.caltech.edu/~macaodha/) and [Gabriel J. Brostow](http://www0.cs.ucl.ac.uk/staff/g.brostow/)

CVPR 2017

For more details:
[project page](http://visual.cs.ucl.ac.uk/pubs/monoDepth/)
[arXiv](https://arxiv.org/abs/1609.03677)

## Requirements

This code was tested with Tensorflow 1.0, CUDA 8.0 and Ubuntu 16.04.
Training takes about 30 hours with the default parameters on the **kitti** split on a single Titan X machine.
You can train on multiple GPUs by setting them with the `--num_gpus` flag; make sure your `batch_size` is divisible by `num_gpus`.

## I just want to try it on an image!

There is a simple mode `monodepth_simple.py` which allows you to quickly run our model on a test image.
Make sure you first [download one of the pretrained models](#models); in this example we will use `model_cityscapes`.

```shell
python monodepth_simple.py --image_path ~/my_image.jpg --checkpoint_path ~/models/model_cityscapes
```

**Please note that there is NO extension after the checkpoint name.**

## Data

This model requires rectified stereo pairs for training.
There are two main datasets available:

### [KITTI](http://www.cvlibs.net/datasets/kitti/raw_data.php)

We used two different splits of the data, **kitti** and **eigen**, amounting to 29000 and 22600 training samples respectively; you can find them in the [filenames](utils/filenames) folder.
You can download the entire raw dataset by running:

```shell
wget -i utils/kitti_archives_to_download.txt -P ~/my/output/folder/
```

**Warning:** it weighs about **175GB**, so make sure you have enough space to unzip too!
To save space you can convert the png images to jpeg:

```shell
find ~/my/output/folder/ -name '*.png' | parallel 'convert {.}.png {.}.jpg && rm {}'
```

### [Cityscapes](https://www.cityscapes-dataset.com)

You will need to register in order to download the data, which already has a train/val/test set with 22973 training images.
We used `leftImg8bit_trainvaltest.zip`, `rightImg8bit_trainvaltest.zip`, `leftImg8bit_trainextra.zip` and `rightImg8bit_trainextra.zip`, which weigh **110GB**.

## Training

**Warning:** The input sizes need to be multiples of 128 for `vgg` or 64 for `resnet50`.

The model's dataloader expects a data folder path as well as a list of filenames (relative to the root data folder):

```shell
python monodepth_main.py --mode train --model_name my_model --data_path ~/data/KITTI/ \
--filenames_file ~/code/monodepth/utils/filenames/kitti_train_files.txt --log_directory ~/tmp/
```

You can continue training by loading the last saved checkpoint using `--checkpoint_path` and pointing to it:

```shell
python monodepth_main.py --mode train --model_name my_model --data_path ~/data/KITTI/ \
--filenames_file ~/code/monodepth/utils/filenames/kitti_train_files.txt --log_directory ~/tmp/ \
--checkpoint_path ~/tmp/my_model/model-50000
```

You can also fine-tune from a checkpoint using `--retrain`.
You can monitor the learning process using `tensorboard` and pointing it to your chosen `log_directory`.
By default the model only saves a reduced summary to save disk space; you can disable this using `--full_summary`.
Please look at the [main file](monodepth_main.py) for all the available options.

## Testing

To test, change the `--mode` flag to `test`; the network will output the disparities in the model folder or in any other folder you specify with `--output_directory`.
You will also need to load the checkpoint you want to test on; this can be done with `--checkpoint_path`:

```shell
python monodepth_main.py --mode test --data_path ~/data/KITTI/ \
--filenames_file ~/code/monodepth/utils/filenames/kitti_stereo_2015_test_files.txt --log_directory ~/tmp/ \
--checkpoint_path ~/tmp/my_model/model-181250
```

**Please note that there is NO extension after the checkpoint name.**

If your test filenames contain two files per line the model will ignore the second one, unless you use the `--do_stereo` flag.
The network will output two files, `disparities.npy` and `disparities_pp.npy`, respectively for raw and post-processed disparities.
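Both outputs are plain NumPy arrays, so you can inspect them without any of the TensorFlow code. Below is a minimal sketch, assuming the disparities are stacked as `[num_images, height, width]`; the filenames used here are just examples:

```python
# Minimal sketch: inspect the disparities produced by `--mode test`.
# Assumes disparities.npy holds an array of shape [num_images, height, width];
# the output filename is arbitrary.
import numpy as np
import matplotlib.pyplot as plt

disparities = np.load('disparities.npy')   # one disparity map per test image
print('loaded', disparities.shape[0], 'disparity maps of size',
      disparities.shape[2], 'x', disparities.shape[1])

# Save the first map as a colour-mapped image for a quick visual check.
plt.imsave('disparity_0.png', disparities[0], cmap='plasma')
```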
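For reference, the post-processing follows the paper: the network is also run on a horizontally flipped copy of each input, the second prediction is flipped back, and the two are blended with ramps that down-weight each prediction near its dis-occlusion border. The sketch below illustrates this blending; it approximates the repository's `post_process_disparity` helper and is not guaranteed to match it line for line:

```python
import numpy as np

def post_process_disparity(disp):
    """Blend two disparity predictions for the same image.

    disp: array of shape [2, h, w]; disp[0] is the prediction for the input
    image, disp[1] the prediction for its horizontally flipped copy.
    """
    _, h, w = disp.shape
    l_disp = disp[0]
    r_disp = np.fliplr(disp[1])          # flip the second prediction back
    mean_disp = 0.5 * (l_disp + r_disp)
    # Ramp masks: use the flipped prediction near the left border (where the
    # direct prediction suffers from dis-occlusions), the direct prediction
    # near the right border, and the average everywhere else.
    xs, _ = np.meshgrid(np.linspace(0, 1, w), np.linspace(0, 1, h))
    l_mask = 1.0 - np.clip(20 * (xs - 0.05), 0, 1)   # 1 at the left edge
    r_mask = np.fliplr(l_mask)                       # 1 at the right edge
    return r_mask * l_disp + l_mask * r_disp + (1.0 - l_mask - r_mask) * mean_disp
```

Away from the borders the two predictions are simply averaged; near each border the blend falls back to whichever prediction is not affected by dis-occlusions there.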
## Evaluation on KITTI

To evaluate, run:

```shell
python utils/evaluate_kitti.py --split kitti --predicted_disp_path ~/tmp/my_model/disparities.npy \
--gt_path ~/data/KITTI/
```

The `--split` flag allows you to choose which dataset you want to test on.
* `kitti` corresponds to the 200 official training set pairs from [KITTI stereo 2015](http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo).
* `eigen` corresponds to the 697 test images used by [Eigen NIPS14](http://www.cs.nyu.edu/~deigen/depth/) and uses the raw LIDAR points.

**Warning:** The results on the Eigen split are usually cropped, which you can do by passing the `--garg_crop` flag.

## Models

You can download our pre-trained models to an existing directory by running:

```shell
sh ./utils/get_model.sh model_name output_directory
```

All our models were trained for 50 epochs, at 512x256 resolution and with a batch size of 8; please see our paper for more details.
We converted KITTI and Cityscapes to jpeg before training.
Here are all the models available:
* `model_kitti`: Our main model trained on the **kitti** split
* `model_eigen`: Our main model trained on the **eigen** split
* `model_cityscapes`: Our main model trained on **cityscapes**
* `model_city2kitti`: `model_cityscapes` fine-tuned on **kitti**
* `model_city2eigen`: `model_cityscapes` fine-tuned on **eigen**
* `model_kitti_stereo`: Our stereo model trained on the **kitti** split for 12 epochs; make sure to use `--do_stereo` when using it

All our models, except for stereo, have a Resnet50 variant which you can get by adding `_resnet` to the model name.
To test or train using these variants, you need to use the flag `--encoder resnet50`.

## Results

You can download our results (unscaled disparities at 512x256) on both KITTI splits (**kitti** and **eigen**) [here](http://visual.cs.ucl.ac.uk/pubs/monoDepth/results/).
The naming convention is the same as with the models.

## Reference

If you find our work useful in your research please consider citing our paper:

```
@inproceedings{monodepth17,
  title     = {Unsupervised Monocular Depth Estimation with Left-Right Consistency},
  author    = {Cl{\'{e}}ment Godard and Oisin {Mac Aodha} and Gabriel J. Brostow},
  booktitle = {CVPR},
  year      = {2017}
}
```

## Video

[![Screenshot](https://img.youtube.com/vi/go3H2gU-Zck/0.jpg)](https://www.youtube.com/watch?v=go3H2gU-Zck)

## License

Copyright © Niantic, Inc. 2018. Patent Pending.
All rights reserved.

This Software is licensed under the terms of the UCLB ACP-A Licence which allows for non-commercial use only, the full terms of which are made available in the [LICENSE](LICENSE) file.

For any other use of the software not covered by the terms of this licence, please contact info@uclb.com