ptyrad.runtime.diagnostics

System and environment diagnostic reporting.

This module provides utilities to query and log the current hardware (CPU, Memory, GPU) and software (OS, Python, dependencies) environment. It includes specific support for detecting SLURM cluster allocations and identifying NVIDIA Multi-Instance GPU (MIG) configurations.

Functions

is_mig_enabled()

Detects if any NVIDIA GPU on the system is operating in MIG mode.

print_gpu_info()

Logs physical GPU hardware and CUDA details.

print_packages_info()

Logs installed versions of critical Python dependencies.

print_system_info()

Logs comprehensive system hardware and operating system information.

ptyrad.runtime.diagnostics.is_mig_enabled()

Detects if any NVIDIA GPU on the system is operating in MIG mode.

Multi-Instance GPU (MIG) allows a physical GPU to be securely partitioned into multiple separate GPU instances. This function queries nvidia-smi to check if this hardware partitioning is currently active, which is important because certain multi-GPU communication backends (like NCCL) do not fully support MIG slices.

Returns:

True if MIG mode is enabled on any detected GPU; False if it is disabled or if detection fails (e.g., nvidia-smi not found).

Return type:

bool
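
The check described above can be approximated as follows. This is a minimal sketch, not the ptyrad implementation: the names parse_mig_modes and mig_enabled_sketch are hypothetical, and it assumes the nvidia-smi query field mig.mode.current, which reports Enabled, Disabled, or [N/A] per GPU.

```python
import subprocess


def parse_mig_modes(smi_output: str) -> bool:
    """Return True if any line of the nvidia-smi output reports MIG as enabled."""
    return any(line.strip().lower() == "enabled" for line in smi_output.splitlines())


def mig_enabled_sketch(timeout: float = 5.0) -> bool:
    """Hypothetical stand-in for is_mig_enabled(); not the ptyrad source."""
    cmd = ["nvidia-smi", "--query-gpu=mig.mode.current", "--format=csv,noheader"]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=True
        )
    except (OSError, subprocess.SubprocessError):
        # nvidia-smi is missing, timed out, or exited non-zero:
        # report False, matching the documented failure behaviour.
        return False
    return parse_mig_modes(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the logic easy to exercise on machines without an NVIDIA driver.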

ptyrad.runtime.diagnostics.print_system_info()

Logs comprehensive system hardware and operating system information.

This function records the OS platform, processor architecture, available CPU cores, and system memory. It automatically detects if the code is running inside a SLURM job allocation and reports the SLURM-restricted resources instead of the total physical node resources. It subsequently triggers GPU and package diagnostics.
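
One way to distinguish SLURM-restricted CPU counts from the node total is sketched below. The helper names slurm_cpu_count and effective_cpu_count are hypothetical, and the choice of environment variables (SLURM_JOB_ID, SLURM_CPUS_PER_TASK, SLURM_CPUS_ON_NODE) is an assumption about which ones a job exposes; the variables ptyrad actually reads are not stated here.

```python
import os
from typing import Mapping, Optional


def slurm_cpu_count(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the CPU count granted by SLURM, or None outside a SLURM job."""
    if "SLURM_JOB_ID" not in env:
        return None  # not running inside a SLURM allocation
    # Prefer the per-task limit; fall back to the per-node allocation.
    for key in ("SLURM_CPUS_PER_TASK", "SLURM_CPUS_ON_NODE"):
        value = env.get(key, "")
        if value.isdigit():
            return int(value)
    return None


def effective_cpu_count(env: Mapping[str, str] = os.environ) -> int:
    """Report SLURM-restricted CPUs when available, else the physical total."""
    slurm = slurm_cpu_count(env)
    return slurm if slurm is not None else (os.cpu_count() or 1)
```

Passing the environment as a mapping lets the same logic be checked with a plain dict instead of a live job.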

ptyrad.runtime.diagnostics.print_gpu_info()

Logs physical GPU hardware and CUDA details.

Detects and reports available compute backends, including NVIDIA CUDA and Apple Silicon MPS. For CUDA devices, it logs the compute capability (warning if insufficient for Triton compilation) and checks for active MIG partitions. It also provides actionable troubleshooting tips if a GPU is expected but cannot be found by PyTorch.
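
The backend detection and capability check might look roughly like the sketch below. The names capability_warning and describe_backends are hypothetical, and the 7.0 threshold is an assumption based on the minimum compute capability that PyTorch's Triton-backed compiler is commonly documented to require; the threshold ptyrad uses is not given here.

```python
from typing import List, Tuple

# Assumed minimum compute capability for Triton; not taken from ptyrad.
MIN_TRITON_CAPABILITY: Tuple[int, int] = (7, 0)


def capability_warning(
    capability: Tuple[int, int],
    minimum: Tuple[int, int] = MIN_TRITON_CAPABILITY,
) -> str:
    """Return a warning string if the capability is below the minimum, else ''."""
    if capability < minimum:
        return (
            f"compute capability {capability[0]}.{capability[1]} is below "
            f"{minimum[0]}.{minimum[1]}; Triton compilation may be unavailable"
        )
    return ""


def describe_backends() -> List[str]:
    """Collect one line per visible compute backend."""
    try:
        import torch
    except ImportError:
        return ["PyTorch is not installed"]
    lines: List[str] = []
    if torch.cuda.is_available():
        for index in range(torch.cuda.device_count()):
            name = torch.cuda.get_device_name(index)
            cap = torch.cuda.get_device_capability(index)
            note = capability_warning(cap)
            lines.append(f"CUDA {index}: {name} (sm_{cap[0]}{cap[1]}) {note}".rstrip())
    # torch.backends.mps is absent in older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        lines.append("MPS: available")
    if not lines:
        lines.append("No GPU backend is visible to PyTorch")
    return lines
```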

ptyrad.runtime.diagnostics.print_packages_info()

Logs installed versions of critical Python dependencies.

Reports the installed versions of NumPy, PyTorch, Optuna, and Accelerate. It also compares the runtime version of the ptyrad package against the installation metadata to detect “stale metadata”, which is common in editable (pip install -e .) installs, and warns the user if the two differ.
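
A stale-metadata check of this kind can be sketched with the standard library alone. The helper names below are hypothetical, and the sketch assumes the package exposes its runtime version as a __version__ attribute; how ptyrad obtains either value is not described here.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def metadata_version(dist_name: str) -> Optional[str]:
    """Version recorded at install time, or None if the distribution is unknown."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def runtime_version(module_name: str) -> Optional[str]:
    """Version reported by the imported module (assumes a __version__ attribute)."""
    try:
        module = import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def is_stale(runtime: Optional[str], metadata: Optional[str]) -> bool:
    """True only when both versions are known and they disagree."""
    return runtime is not None and metadata is not None and runtime != metadata
```

For example, is_stale(runtime_version("ptyrad"), metadata_version("ptyrad")) would flag an editable install whose source version was bumped without reinstalling.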