Starting with CUDA 10.0, a new set of metric APIs are added for devices with compute capability 7.0 and higher. These APIs provide low and deterministic profiling overhead on the target system. These are supported on all CUDA supported platforms except Android, and are not supported under MPS (Multi-Process Service), Confidential Compute, or SLI configured systems. In order to determine whether a device is compatible with this API, a new function cuptiProfilerDeviceSupported is introduced in CUDA 11.5 which exposes overall Profiling API support and specific requirements for a given device. Profiling API must be initialized by calling cuptiProfilerInitialize before testing device support.
- Enumeration (Host)
- Configuration (Host)
- Collection (Target)
- Evaluation (Host)
The list of metrics has been overhauled from earlier generation metrics and event APIs, to support a standard naming convention based upon unit__(subunit?)_(pipestage?)_quantity_qualifiers