# C3D-tensorflow **Repository Path**: arnoldfychen/C3D-tensorflow ## Basic Information - **Project Name**: C3D-tensorflow - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-05-17 - **Last Updated**: 2020-12-19 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # C3D-tensorflow This is a repository trying to implement [C3D-caffe][5] on tensorflow,useing models directly converted from original C3D-caffe. Be aware that there are about 5% video-level accuracy margin on UCF101 split1 between our implement in tensorflow and the original C3D-caffe. ## Requirements: 1. Have installed the tensorflow >= 1.2 version 2. You must have installed the following two python libs: a) [tensorflow][1] b) [Pillow][2] 3. You must have downloaded the [UCF101][3] (Action Recognition Data Set) 4. Each single avi file is decoded with 5FPS (it's depend your decision) in a single directory. - you can use the `./list/convert_video_to_images.sh` script to decode the ucf101 video files - run `./list/convert_video_to_images.sh .../UCF101 5` 5. Generate {train,test}.list files in `list` directory. Each line corresponds to "image directory" and a class (zero-based). For example: - you can use the `./list/convert_images_to_list.sh` script to generate the {train,test}.list for the dataset - run `./list/convert_images_to_list.sh .../dataset_images 4`, this will generate `test.list` and `train.list` files by a factor 4 inside the root folder ``` database/ucf101/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01 0 database/ucf101/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c02 0 database/ucf101/train/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c03 0 database/ucf101/train/ApplyLipstick/v_ApplyLipstick_g01_c01 1 database/ucf101/train/ApplyLipstick/v_ApplyLipstick_g01_c02 1 database/ucf101/train/ApplyLipstick/v_ApplyLipstick_g01_c03 1 database/ucf101/train/Archery/v_Archery_g01_c01 2 database/ucf101/train/Archery/v_Archery_g01_c02 2 database/ucf101/train/Archery/v_Archery_g01_c03 2 database/ucf101/train/Archery/v_Archery_g01_c04 2 database/ucf101/train/BabyCrawling/v_BabyCrawling_g01_c01 3 database/ucf101/train/BabyCrawling/v_BabyCrawling_g01_c02 3 database/ucf101/train/BabyCrawling/v_BabyCrawling_g01_c03 3 database/ucf101/train/BabyCrawling/v_BabyCrawling_g01_c04 3 database/ucf101/train/BalanceBeam/v_BalanceBeam_g01_c01 4 database/ucf101/train/BalanceBeam/v_BalanceBeam_g01_c02 4 database/ucf101/train/BalanceBeam/v_BalanceBeam_g01_c03 4 database/ucf101/train/BalanceBeam/v_BalanceBeam_g01_c04 4 ... ``` ## Usage 1. `python train_c3d_ucf101.py` will train C3D model. The trained model will saved in `models` directory. 2. `python predict_c3d_ucf101.py` will test C3D model on a validation data set. 3. `cd ./C3D-tensorflow-1.0 &&python Random_clip_valid.py` will get the random-clip accuracy on UCF101 test set with provided `sports1m_finetuning_ucf101.model`. 4. `C3D-tensorflow-1.0/Random_clip_valid.py` code is compatible with tensorflow 1.0+ , with a little bit different with the old repository 5. IMPORTANT NOTE: when you load the sports1m_finetuning_ucf101.model,you should use the tranpose operation like:` pool5 = tf.transpose(pool5, perm=[0,1,4,2,3])`,or in `Random_clip_valid.py` looks like:`["transpose", [0, 1, 4, 2, 3]]`, but if you load `conv3d_deepnetA_sport1m_iter_1900000_TF.model` or `c3d_ucf101_finetune_whole_iter_20000_TF.model`,you don't need tranpose operation,just comment that line code. ## Experiment result: - Note: 1.All report results are done specific on UCF101 split1 (train videos:9537,test videos:3783). 2.ALL the results are video-level accuracy,unless stated otherwise. 3.We follow the same way to extract clips from video as the C3D paper saying:'To extract C3D feature, a video is split into 16 frame long clips with a 8-frame overlap between two consecutive clips.These clips are passed to the C3D network to extract fc6 activations. These clip fc6 activations are averaged to form a 4096-dim video descriptor which is then followed by an L2-normalization' - C3D as feature extractor: | platform | feature extractor model | fc6+SVM | fc6+SVM+L2 norm | |:-----------:|:---------------:|:----------:|:----------------:| | caffe | conv3d_deepnetA_sport1m_iter_1900000.caffemodel| 81.99% | 83.39% | | tensorflow | conv3d_deepnetA_sport1m_iter_1900000_TF.model | 79.38% | 81.44% | | tensorflow | c3d_ucf101_finetune_whole_iter_20000_TF.model | 79.67% | 81.33% | | tensorflow | sports1m_finetuning_ucf101.model | 82.73% | 85.35% | - finetune C3D network on UCF101 split1 with pre-trained model: | platform | pre-trained model | train-strategy | video-accuracy | clip-accuracy | random-clip | |:-----------:|:---------------:|:----------------:|:----------------:|:----------------:|:----------------:| | caffe | c3d_ucf101_finetune_whole_iter_20000.caffemodel| directly test | - | 79.87% | - | | tensorflow | c3d_ucf101_finetune_whole_iter_20000_TF.model | directly test | 78.35% | 72.77% | 57.15% | | tensorflow-A | conv3d_deepnetA_sport1m_iter_1900000_TF.caffemodel | whole finetuning | 76.0% | 71% | 69.8% | | tensorflow-B | sports1m_finetuning_ucf101.model | freeze conv,only finetune fc layers | 79.93% | 74.65% | 76.6% | - Note: 1.the `tensorflow-A` model corresponding to the original C3D model pre-trained on UCF101 provided by @ [hx173149][7] . 2.the `tensorflow-B` model is just freeze the conv layers in `tensorflow-A` and finetuning four more epochs on fc layers with `learning rate=1e-3`. 3.the `random-clip` column means random choose one clip from each video in UCF101 test split 1 ,so the result are not so robust.But according to the Law of Large Numbers,we may assume this items is positive correlated to your video-level accuracy. 4.it's obvious that it if you do more finetuning work based on `c3d_ucf101_finetune_whole_iter_20000_TF.model`,and you may achieve better performance,i didn't do it because of time limit. 5.with no doubt that you can get better result by appropriately finetuning the network ## Trained models: | Model | Description | Clouds | Download | | ------------------- | ----------------- | -------- | ------------| | C3D sports1M TF |C3D sports1M converted from caffe C3D| Dropbox |[C3D sports1M ](https://www.dropbox.com/s/zvco2rfufryivqb/conv3d_deepnetA_sport1m_iter_1900000_TF.model?dl=0) | | C3D UCF101 TF |C3D UCF101 trained model converted from caffe C3D| Dropbox |[C3D UCF101 ](https://www.dropbox.com/s/u5fxqzks2pkaolx/c3d_ucf101_finetune_whole_iter_20000_TF.model?dl=0 ) | | C3D UCF101 TF train |finetuning on UCF101 split1 use C3D sports1M model by @ [hx173149][7]| Dropbox |[C3D UCF101 split1](https://www.dropbox.com/sh/8wcjrcadx4r31ux/AAAkz3dQ706pPO8ZavrztRCca?dl=0) | | split1 meanfile TF | UCF101 split1 meanfile converted from caffe C3D | Dropbox |[UCF101 split1 meanfile](https://www.dropbox.com/sh/8wcjrcadx4r31ux/AAAkz3dQ706pPO8ZavrztRCca?dl=0) | | everything above | all four files above | baiduyun |[baiduyun](http://pan.baidu.com/s/1nuJe8vn) | ## Other verions - [C3D-estimator-sagemaker](https://github.com/frankgu/C3D-estimator-sagemaker): Use sagemaker to train C3D model. ## References: - Thanks the author [Du tran][4]'s code: [C3D-caffe][5] - [C3D: Generic Features for Video Analysis][6] [1]: https://www.tensorflow.org/ [2]: http://pillow.readthedocs.io/en/3.1.x/reference/Image.html [3]: http://crcv.ucf.edu/data/UCF101.php [4]: https://github.com/dutran [5]: https://github.com/facebook/C3D [6]: http://vlg.cs.dartmouth.edu/c3d/ [7]:https://github.com/hx173149/C3D-tensorflow