# gtad

**Repository Path**: yanbinwang/gtad

## Basic Information

- **Project Name**: gtad
- **Description**: https://github.com/frostinassiky/gtad
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-01-22
- **Last Updated**: 2021-01-22

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# G-TAD

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/g-tad-sub-graph-localization-for-temporal/temporal-action-localization-on-thumos14)](https://paperswithcode.com/sota/temporal-action-localization-on-thumos14?p=g-tad-sub-graph-localization-for-temporal)

This repo holds the code for the paper "[G-TAD: Sub-Graph Localization for Temporal Action Detection](https://arxiv.org/pdf/1911.11462.pdf)", accepted at CVPR 2020.

![G-TAD Overview](./gtad_overview.png)

## Update

15 Dec 2020: **The configuration for HACS Segment dataset is in the `hacs` branch.** With the [official I3D pretrained features](http://hacs.csail.mit.edu/challenge.html), G-TAD can reach 27.481 average mAP without tuning the model architecture.

24 Nov 2020: To celebrate my 2nd anniversary with Sally, I released the code for ActivityNet. :P Please check out the `anet` branch for details. Features: [GoogleDrive](https://drive.google.com/folderview?id=1ilLgmZYHG1rx0ADuzAkW8jeqdEDeZ19g), md5sum: `0ce54748883c4ce1cf6600f5ad04421b`.

30 Mar 2020: THUMOS14 features are available! [GoogleDrive](https://drive.google.com/drive/folders/10PGPMJ9JaTZ18uakPgl58nu7yuKo8M_k?usp=sharing), [OneDrive](https://kaust-my.sharepoint.com/:f:/g/personal/xum_kaust_edu_sa/EgTwwUGf0O1Kug_A6ym-y_8BlEJ04_xPME9EFbAAKRPQNw?e=AVgHlW)

15 Apr 2020: THUMOS14 code is published! I updated the post-processing code, so the experimental result is **slightly better** than in the original paper!

29 Apr 2020: We updated our code based on @Phoenix1327's comment.
The experimental result is **slightly better**. Please see the details in this [issue](https://github.com/Frostinassiky/gtad/issues/4).

## Overview

Temporal action detection is a fundamental yet challenging task in video understanding. Video context is a critical cue to effectively detect actions, but current works mainly focus on temporal context, while neglecting semantic context as well as other important context properties. In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem. Specifically, we formulate video snippets as graph nodes, snippet-snippet correlations as edges, and actions associated with context as target sub-graphs. With graph convolution as the basic operation, we design a GCN block called GCNeXt, which learns the features of each node by aggregating its context and dynamically updates the edges in the graph. To localize each sub-graph, we also design an SGAlign layer to embed each sub-graph into the Euclidean space. Extensive experiments show that G-TAD is capable of finding effective video context without extra supervision and achieves state-of-the-art performance on two detection benchmarks. On ActivityNet-1.3, we obtain an average mAP of 34.09%; on THUMOS14, we obtain 40.16% in mAP@0.5, beating all the other one-stage methods.

[Detail](https://sites.google.com/kaust.edu.sa/g-tad), [Video](https://www.youtube.com/watch?v=BlPxnDcykUo), [Arxiv](https://arxiv.org/abs/1911.11462).

## Dependencies

* Python == 3.7
* PyTorch == 1.1.0 or 1.3.0
* CUDA == 10.0.130
* cuDNN == 7.5.1_0
* GCC >= 4.9

## Installation

Based on the idea of ROI Alignment from Mask-RCNN, we developed the **SGAlign layer** in our implementation. You have to compile a short piece of CUDA code to run Algorithm 1 in our [paper](https://arxiv.org/abs/1911.11462).

1.
   Create conda environment
   ```shell script
   conda env create -f env.yml
   source activate gtad
   ```
2. Install `Align1D2.2.0`
   ```shell script
   cd gtad_lib
   python setup.py install
   ```
3. Test `Align1D2.2.0`
   ```shell script
   python align.py
   ```

### Data setup

To reproduce the results on THUMOS14 without further changes:

1. Download the data from [GoogleDrive](https://drive.google.com/drive/folders/10PGPMJ9JaTZ18uakPgl58nu7yuKo8M_k?usp=sharing) or [OneDrive](https://kaust-my.sharepoint.com/:f:/g/personal/xum_kaust_edu_sa/EgTwwUGf0O1Kug_A6ym-y_8BlEJ04_xPME9EFbAAKRPQNw?e=AVgHlW).
2. Place it into a folder named `TSN_pretrain_avepool_allfrms_hdf5` inside `data/thumos_feature`.

> Alternatively, pass the folder containing the HDF5 files via `--feature_path`, if the script accepts that argument.

## Code Architecture

    gtad                        # this repo
    ├── data                    # feature and label
    ├── evaluation              # evaluation code from official API
    ├── gtad_lib                # gtad library
    └── ...

## Train and evaluation

After downloading the dataset and setting up the environment, you can start with the following scripts.

```shell script
python gtad_train.py
python gtad_inference.py
python gtad_postprocessing.py
```

or

```shell script
bash gtad_thumos.sh | tee log.txt
```

If everything goes well, you should get the following result:

```
mAP at tIoU 0.3 is 0.5731204387052588
mAP at tIoU 0.4 is 0.5129888769308306
mAP at tIoU 0.5 is 0.43043083034478025
mAP at tIoU 0.6 is 0.32653130678508374
mAP at tIoU 0.7 is 0.22806267480976325
```

## Bibtex

CVPR Version.

```text
@InProceedings{xu2020gtad,
  author    = {Xu, Mengmeng and Zhao, Chen and Rojas, David S. and Thabet, Ali and Ghanem, Bernard},
  title     = {G-TAD: Sub-Graph Localization for Temporal Action Detection},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2020}
}
```

## Reference

These are very helpful and promising implementations for the temporal action localization task.
My implementation borrows ideas from them.

- BSN: Boundary Sensitive Network for Temporal Action Proposal Generation. [Paper](https://arxiv.org/abs/1806.02964) [Code](https://github.com/wzmsltw/BSN-boundary-sensitive-network)

- BMN: Boundary-Matching Network for Temporal Action Proposal Generation. [Paper](https://arxiv.org/abs/1907.09702) [Code - PaddlePaddle](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/video/models/bmn) [Code - PyTorch](https://github.com/JJBOY/BMN-Boundary-Matching-Network)

- Graph Convolutional Networks for Temporal Action Localization. [Paper](http://openaccess.thecvf.com/content_ICCV_2019/papers/Zeng_Graph_Convolutional_Networks_for_Temporal_Action_Localization_ICCV_2019_paper.pdf) [Code](https://github.com/Alvin-Zeng/PGCN)

## Contact

mengmeng.xu[at]kaust.edu.sa
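
## Note on tIoU

The result listing in the Train and evaluation section reports mAP at tIoU thresholds from 0.3 to 0.7. As a reading aid, here is a minimal sketch of how the temporal IoU between one predicted segment and one ground-truth segment can be computed and compared against those thresholds. It is not the code in the `evaluation` folder: the function name `temporal_iou` and the example segments are hypothetical, chosen only for illustration.

```python
def temporal_iou(pred, gt):
    """Return the temporal IoU of two (start, end) segments, in seconds."""
    # Overlap is zero when the segments do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical segments, not taken from THUMOS14.
prediction = (2.0, 8.0)
ground_truth = (4.0, 10.0)
tiou = temporal_iou(prediction, ground_truth)

# Thresholds used in the result listing of this README.
thresholds = [0.3, 0.4, 0.5, 0.6, 0.7]
# A prediction counts as a match at threshold t when its tIoU is at least t.
matched = {t: tiou >= t for t in thresholds}

print(f"tIoU = {tiou}")
for t, ok in matched.items():
    print(f"threshold {t}: {'match' if ok else 'no match'}")
```

With these example segments the overlap is 4 s and the union is 8 s, so the prediction matches at thresholds up to 0.5 and fails at 0.6 and 0.7. This is why mAP drops as the threshold increases.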