# HolisticTraceAnalysis **Repository Path**: wanglei07/HolisticTraceAnalysis ## Basic Information - **Project Name**: HolisticTraceAnalysis - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: 0125_fix_nccl_kernel - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-03-05 - **Last Updated**: 2025-03-05 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README [![CircleCI](https://circleci.com/gh/facebookresearch/HolisticTraceAnalysis.svg?style=shield)](https://app.circleci.com/pipelines/github/facebookresearch/HolisticTraceAnalysis) [![codecov](https://codecov.io/github/facebookresearch/holistictraceanalysis/branch/main/graph/badge.svg?token=R44P6M3RJN)](https://codecov.io/github/facebookresearch/holistictraceanalysis) [![Docs](https://readthedocs.org/projects/hta/badge/?version=latest)](https://hta.readthedocs.io/en/latest/?badge=latest) [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/facebookresearch/HolisticTraceAnalysis/blob/main/LICENSE) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/facebookresearch/HolisticTraceAnalysis/blob/main/CONTRIBUTING.md) # Holistic Trace Analysis Holistic Trace Analysis (HTA), is a performance analysis tool to identify performance bottlenecks in distributed training workloads. HTA achieves this by analyzing traces collected through the [PyTorch Profiler](https://github.com/pytorch/kineto) a.k.a. Kineto. ## Features HTA provides the following features: 1. __Temporal Breakdown__ - Breakdown of time taken by the GPUs in terms of time spent in computation, communication, memory events, and idle time across all ranks. 1. __Kernel Breakdown__ - Finds kernels with the longest duration on each rank. 1. __Kernel Duration Distribution__ - Distribution of average time taken by longest kernels across different ranks. 1. __Idle Time Breakdown__ - Breakdown of GPU idle time into waiting for the host, waiting for another kernel or attribution to an unknown cause. 1. __Communication Computation Overlap__ - Calculate the percentage of time when communication overlaps computation. 1. __Frequent CUDA Kernel Patterns__ - Find the CUDA kernels most frequently launched by any given PyTorch or user defined operator. 1. __CUDA Kernel Launch Statistics__ - Distributions of GPU kernels with very small duration, large duration, and excessive launch time. 1. __Augmented Counters (Queue length, Memory bandwidth)__ - Augmented trace files which provide insights into memory bandwidth utilized and number of outstanding operations on each CUDA stream. 1. __Trace Comparison__ - A trace comparison tool to identify and visualize the differences between traces. 1. __CUPTI Counter Analysis__ - An experimental API to get GPU performance counters. By attributing performance measurements from kernels to PyTorch operators roofline analysis can be performed and kernels can be optimized. ## Installation HTA runs on Linux and Mac with Python >= 3.8. ### Setup a Conda environment (optional) See [here](https://docs.conda.io/en/latest/miniconda.html) to install Miniconda. Create the environment `env_name` ``` bash conda create -n env_name ``` Activate the environment ``` bash conda activate env_name ``` Deactivate the environment ``` bash conda deactivate ``` ### Install using PyPI (stable) ``` pip install HolisticTraceAnalysis ``` ### Install from source ``` git clone https://github.com/facebookresearch/HolisticTraceAnalysis.git cd HolisticTraceAnalysis git submodule update --init pip install -r requirements.txt pip install -e . ``` ## Documentation Learn more about the features and the API from our [documentation](https://hta.readthedocs.io/en/latest/index.html). ## Usage ### Data Preparation All traces collected from a job must reside in a unique folder. ### Analysis in a Jupyter notebook Activate the Conda environment and launch a Jupyter notebook. ``` conda activate env_name jupyter notebook ``` Import HTA, and create a `TraceAnalysis` object ``` python from hta.trace_analysis import TraceAnalysis analyzer = TraceAnalysis(trace_dir = "/path/to/folder/containing/the/traces") ``` #### Basic Usage ``` python # Temporal breakdown temporal_breakdown_df = analyzer.get_temporal_breakdown() # Kernel breakdown kernel_breakdown_df = analyzer.get_gpu_kernel_breakdown() # Idle time breakdown idle_time_df = analyzer.get_idle_time_breakdown() # Communication computation overlap comm_comp_overlap_df = analyzer.get_comm_comp_overlap() # Frequent CUDA kernel patterns frequent_patterns_df = analyzer.get_frequent_cuda_kernel_patterns(operator_name="aten::linear", output_dir="/new/trace/path") # CUDA kernel launch statistics cuda_launch_kernel_stats = analyzer.get_cuda_kernel_launch_stats() # Memory bandwidth time series memory_bw_series = analyzer.get_memory_bw_time_series() # Memory bandwidth summary memory_bw_summary = analyzer.get_memory_bw_summary() # Queue length time series ql_series = analyzer.get_queue_length_time_series() # Queue length summary ql_summary = analyzer.get_queue_length_summary() ``` For a detailed demo run the `trace_analysis_demo` and `trace_diff_demo` notebooks in the examples folder. #### Advanced Usage __Logging Level__ Logging level is set through a configuration file in HTA. The default logging level is set in `hta/configs/logging.config` and can be changed in the `[logger_hta]` section of the file. If needed, a different logging file can be configured to use by modifying `hta/configs/trace_analyzer.json`. #### Repo Map ``` ├── examples # folder containing demo notebooks │  ├── ... ├── hta │  ├── analyzers # core logic for each analysis │  │  ├── ... │  ├── common # code common to multiple analysis │  │  ├── ... │  ├── configs # config files │  │  ├── ... │  ├── trace_analysis.py # entrypoint for TraceAnalysis API │  ├── trace_diff.py # entrypoint for TraceDiff API │  └── utils # utility files │  └── ... ├── scripts # generic tools for traces │  └── ... │── tests # unittests │  └── ... ``` ## Contributing We welcome new contributions. If you plan to contribute new features or extensions, please first open an [issue](https://github.com/facebookresearch/HolisticTraceAnalysis/issues) and discuss the feature with us. To learn more about how to contribute, see our [contributing guidelines](https://github.com/facebookresearch/HolisticTraceAnalysis/blob/main/CONTRIBUTING.md). Please let us know if you encounter a bug by filing an [issue](https://github.com/facebookresearch/HolisticTraceAnalysis/issues). ## The Team HTA is currently maintained by: [Anupam Bhatnagar](https://github.com/anupambhatnagar), [Brian Coutinho](https://github.com/briancoutinho), [Xizhou Feng](https://github.com/fengxizhou), [Yifan Liu](https://github.com/yifanliu112), [Sung-Han Lin](https://github.com/sunghlin) and [Louis Feng](https://github.com/louisfeng). Past contributors include [Michael Acar](https://github.com/mjacar) and [Yuzhen Huang](https://github.com/Yuzhen11). ## License Holistic Trace Analysis is licensed under the [MIT License](https://github.com/facebookresearch/HolisticTraceAnalysis/blob/main/LICENSE).