Workflow
The core of the PASTA framework—underlying AccelProf—is a modular, stage-based workflow designed for scalable and extensible program analysis. It consists of three major components:
Application → [Event Handler] → [Event Processor] → [Tool Collection]
Each component plays a distinct role in the profiling pipeline, enabling clean separation of concerns and easy extensibility.
Components
🔹 Application
The target workload can be:
A deep learning model executed via a framework such as PyTorch
A binary GPU-accelerated application using libraries like cuDNN
This application is the source of events captured during runtime.
🔹 Event Handler
Responsible for intercepting execution events. It integrates:
Low-level profiling APIs
e.g., NVIDIA Compute Sanitizer, AMD ROCm ROCProfilerHigh-level callbacks
e.g., from DL frameworks like PyTorch
This component abstracts vendor-specific and API-level complexity through a unified internal interface, offering consistent behavior across platforms.
🔹 Event Processor
The pre-processing stage that prepares collected data for analysis:
Can be executed on CPU or GPU
Transforms raw events into enriched, normalized structures
Performs filtering, bucketing, or timing-based correlation
The processor routes processed data to registered tools for final analysis.
🔹 Tool Collection
This is the analysis backend of PASTA:
Hosts user-defined profiling tools
Operates on preprocessed events
Generates profiling results such as:
Kernel launch frequency
Memory access hotness maps
Tensor allocation patterns
Operator-level performance summaries
Each tool is modular, and developers can add new tools by inheriting from a simple interface and registering their implementation.
This architecture enables flexibility, cross-vendor support, and fine-grained customization—making PASTA a powerful backend for performance engineering on modern heterogeneous systems.