Usage Guide for AccelProf
AccelProf is a flexible command-line tool designed for profiling GPU applications. It supports a variety of tools to collect metrics, trace memory, and analyze deep learning workloads.
Basic Usage
AccelProf can be invoked directly from the terminal to analyze an application using pre-built or custom tools.
Command Syntax
accelprof -v -t app_analysis {executable} {executable arguments}
Example: Profiling vectoradd
Run the vectoradd example using the app_metric tool:
accelprof -v -t app_metric ./vectoradd
Output Files
Profiling Log:
vectoradd.accelprof.log
This file logs the profiling session metadata and runtime activity:
[ACCELPROF INFO] VERSION : c6d15f78b30385ba30ee64ab47db1b9b4729d16c, modified 0
[ACCELPROF INFO] LD_PRELOAD : /home/mao/AccelProf/lib/libcompute_sanitizer.so
[ACCELPROF INFO] OPTIONS : -v -t app_metric
[ACCELPROF INFO] COMMAND : ./vectoradd
[ACCELPROF INFO] START TIME : Fri May 30 04:47:08 PM PDT 2025
...
[SANITIZER INFO] Free memory 0x7fb4e9400000 with size 800000 (flag: 0)
Dumping traces to vectoradd_2025-05-30_16-47-09.log
[ACCELPROF INFO] END TIME : Fri May 30 04:47:09 PM PDT 2025
[ACCELPROF INFO] ELAPSED TIME : 00:00:01
Analysis Results:
vectoradd_xxxx.log
This file contains key memory and kernel statistics:
Alloc(0) 0: 140414982029312 800000 (781.25 KB)
Alloc(0) 1: 140414982829568 800000 (781.25 KB)
...
Maximum memory accesses kernel: vecAdd(double*, double*, double*, int) (Kernel ID: 0)
Maximum memory accesses per kernel: 300000 (300.00 K)
Average memory accesses per kernel: 300000 (300.00 K)
Total memory accesses: 600000 (600.00 K)
Average accesses per page: 1024
Advanced Usage
To explore all command-line options supported by AccelProf, run:
accelprof -h
Help Output
Description: A collection of CUDA application profilers.
Usage:
-h, --help
Print this help message.
-t <tool_name>
none: Do nothing.
mem_trace: Trace memory access of CUDA kernels.
app_metric: Collect metrics for CUDA Applications.
code_check: Check CUDA code for potential issues.
hot_analysis: Analyze hot memory regions.
app_analysis: Analyze CUDA application performance.
app_analysis_cpu: Analyze CUDA application performance on CPU.
time_hotness_cpu: Analyze time hotness on CPU.
uvm_advisor: Analyze UVM memory access patterns.
-d <device_name>
Specify the device vendor name.
Valid values:
- nvc (NVIDIA Compute Sanitizer, default)
- nvbit (NVIDIA NVBit)
- rocm (AMD)
-v
Verbose mode.
Use Different Tools
AccelProf supports multiple profiling modes:
Metrics Collection
accelprof -v -t app_metric ./vectoradd
Hot Memory Region Analysis
accelprof -v -t hot_analysis ./vectoradd
Application Performance Analysis
accelprof -v -t app_analysis ./vectoradd
Use Different Backends/Vendors
AccelProf can use various vendor APIs as profiling backends:
NVIDIA Compute Sanitizer (Default)
accelprof -v -t app_analysis ./vectoradd
NVIDIA NVBit
accelprof -v -d nvbit -t app_analysis ./vectoradd
AMD ROCProfiler
accelprof -v -d rocm -t app_analysis ./vectoradd
Customized Range Inspection
AccelProf allows fine-grained instrumentation by letting users specify code regions to analyze using lightweight Python APIs.
Example: test.py
import accelprof
accelprof.start()
# Insert target analysis code here
accelprof.end()
Use this pattern to isolate specific function calls, training loops, or inference paths for analysis.
This flexible usage model enables both high-level and low-level insight into application behavior across platforms.