AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs).
PyTorch Extension Library of Optimized Scatter Operations
Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.