The Multi-Process Service (MPS) is an alternative, binary-compatible implementation of the
CUDA Application Programming Interface (API). The MPS runtime architecture is designed to
transparently enable co-operative multi-process CUDA applications, typically MPI jobs, to
utilize Hyper-Q capabilities on the latest NVIDIA (Kepler-based) Tesla and Quadro GPUs.
Any interactions with NVIDIA GPUs require that an instance of the kernel mode driver be running.
This driver may be persistent in some environments and transient in others. This document describes
the default driver behavior and options for modifying that behavior.
NVVS is the system administrator and cluster manager's tool for detecting
and troubleshooting common problems affecting NVIDIA Tesla GPUs in high performance
computing environments. NVVS focuses on software and system configuration
issues, diagnostics, topological concerns, and relative performance.
The NVIDIA driver supports "retiring" framebuffer pages that contain bad memory cells.
This is called "dynamic page retirement" and is done automatically for cells that are
degrading in quality. This feature can improve the longevity of an otherwise good board
and is thus an important resiliency feature on supported products, especially in HPC and enterprise environments.
This document explains what Xid messages are, and is intended to assist system administrators, developers, and FAEs in understanding
the meaning behind these messages as an aid in analyzing and resolving GPU-related problems.
This document provides GPU error debug and diagnosis guidelines,
and is intended to help system administrators, developers, and FAEs get servers back up
and running as quickly as possible.