Which NVIDIA tool aids data center monitoring and management?
Correct Answer: D
Explanation:
DCGM (Data Center GPU Manager) is the correct answer: it is built for monitoring and managing NVIDIA GPUs in data center and cluster environments. NVIDIA’s DCGM documentation states that DCGM provides “continuous GPU telemetry at very low performance overheads” along with mechanisms to gather, group, and analyze data at the job level.
NVIDIA’s DCGM documentation also states that DCGM-Exporter “allows users to gather GPU metrics and understand workload behavior or monitor GPUs in clusters,” exposing GPU metrics for monitoring tools such as Prometheus. Therefore, DCGM is the NVIDIA tool used for data center GPU monitoring and management.
Why the other options are incorrect: TensorRT is for optimizing and running inference. Clara is NVIDIA’s healthcare and medical imaging platform. Mellanox Insight is not NVIDIA’s tool for monitoring and managing data center GPUs; DCGM is.
(Reference: NVIDIA DCGM Documentation; NVIDIA DCGM-Exporter Documentation)
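To make the DCGM-Exporter point concrete, here is a minimal sketch of how a monitoring script might read metrics in the Prometheus text format. It is an illustration under stated assumptions, not NVIDIA's implementation: the sample values, the hostname label, and the helper names parse_metrics and busy_gpus are hypothetical, and only the metric names DCGM_FI_DEV_GPU_UTIL and DCGM_FI_DEV_GPU_TEMP are taken from DCGM's field naming. A real deployment would scrape this text from the exporter over HTTP instead of using an embedded string.

```python
import re
from collections import defaultdict

# Sample text in the Prometheus exposition format.
# The values and label set below are made up for illustration.
SAMPLE_METRICS = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 12
# HELP DCGM_FI_DEV_GPU_TEMP GPU temperature (in C).
# TYPE DCGM_FI_DEV_GPU_TEMP gauge
DCGM_FI_DEV_GPU_TEMP{gpu="0",Hostname="node-a"} 64
DCGM_FI_DEV_GPU_TEMP{gpu="1",Hostname="node-a"} 41
"""

# One sample line: metric name, optional {labels}, then a numeric value.
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return {metric_name: {gpu_id: value}} from Prometheus-format text."""
    result = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        gpu = labels.get("gpu", "unknown")
        result[match.group("name")][gpu] = float(match.group("value"))
    return dict(result)


def busy_gpus(metrics, threshold=80.0):
    """List GPU ids whose utilization is at or above the threshold."""
    util = metrics.get("DCGM_FI_DEV_GPU_UTIL", {})
    return sorted(gpu for gpu, value in util.items() if value >= threshold)


metrics = parse_metrics(SAMPLE_METRICS)
print(busy_gpus(metrics))
```

With the sample data, only GPU 0 crosses the 80% utilization threshold. In practice Prometheus does this scraping and parsing itself; the sketch only shows the shape of the data that DCGM-Exporter exposes.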
Question 2
How many Mellanox ConnectX-6 Single Port VPI cards are in a DGX A100 system?
Correct Answer: A
Explanation:
The DGX A100 system includes eight Mellanox ConnectX-6 Single Port VPI cards, each providing high-speed connectivity (up to 200 Gb/s) for clustering and data transfer. VPI (Virtual Protocol Interconnect) means each port can run as either InfiniBand or Ethernet, which supports multi-node AI workloads. Eight is the standard configuration for this system.
(Reference: NVIDIA DGX A100 System Documentation, Networking Section)
Question 3
Which two components are included in GPU Operator? (Choose two.)
Correct Answer: A, C
Explanation:
The NVIDIA GPU Operator automates GPU resource management in Kubernetes environments. Its components include the GPU drivers, which provide the software needed to interface with NVIDIA GPUs, and the NVIDIA Data Center GPU Manager (DCGM), which offers health monitoring, telemetry, and diagnostics for GPU clusters. Frameworks like PyTorch and TensorFlow are separate AI development tools and are not part of the GPU Operator, which focuses on infrastructure rather than the application layer.