# Workflow The core of the PASTA framework—underlying AccelProf—is a modular, stage-based workflow designed for scalable and extensible program analysis. It consists of three major components: ``` Application → [Event Handler] → [Event Processor] → [Tool Collection] ``` Each component plays a distinct role in the profiling pipeline, enabling clean separation of concerns and easy extensibility. --- ## Components ### 🔹 Application The target workload can be: - A deep learning model executed via a framework such as **PyTorch** - A binary GPU-accelerated application using libraries like **cuDNN** This application is the source of events captured during runtime. --- ### 🔹 Event Handler Responsible for intercepting execution events. It integrates: - **Low-level profiling APIs** e.g., NVIDIA Compute Sanitizer, AMD ROCm ROCProfiler - **High-level callbacks** e.g., from DL frameworks like PyTorch This component abstracts vendor-specific and API-level complexity through a unified internal interface, offering consistent behavior across platforms. --- ### 🔹 Event Processor The pre-processing stage that prepares collected data for analysis: - Can be executed on **CPU** or **GPU** - Transforms raw events into enriched, normalized structures - Performs filtering, bucketing, or timing-based correlation The processor routes processed data to registered tools for final analysis. --- ### 🔹 Tool Collection This is the analysis backend of PASTA: - Hosts **user-defined profiling tools** - Operates on preprocessed events - Generates profiling results such as: - Kernel launch frequency - Memory access hotness maps - Tensor allocation patterns - Operator-level performance summaries Each tool is modular, and developers can add new tools by inheriting from a simple interface and registering their implementation. --- This architecture enables **flexibility**, **cross-vendor support**, and **fine-grained customization**—making PASTA a powerful backend for performance engineering on modern heterogeneous systems.