Demo NVIDIA NCP-AII Exam Questions

Question 1

For a 48-hour NCCL burn-in test, which parameters ensure sustained fabric stress while detecting silent data corruption?

Correct Answer: B
Explanation:
The NVIDIA Collective Communications Library (NCCL) tests are the standard tool for validating the interconnect performance of a GPU cluster. For a long-duration burn-in (48 hours), the goal is not just to measure peak bandwidth but to stress the fabric under sustained load and catch intermittent hardware failures or silent data corruption (SDC). The all_reduce_perf test is among the most demanding because every GPU both sends and receives data. The parameters in Option B each play a role: -b 8G -e 32G sets the message size range to large buffers that saturate the 400G InfiniBand links; -c 1000 turns on result checking (in nccl-tests, -c is the check flag), so the reduced data is compared against the expected values. This is the most important flag for SDC: if a bit flips in transit because of a faulty transceiver, the check catches the mismatch and reports a failure. -z 1 makes each collective blocking, so the host waits and synchronizes after every operation. Finally, -G 1000 captures the iterations as a CUDA graph and replays it 1000 times, which lengthens each pass and helps keep the switches and HCAs under load long enough to reach thermal equilibrium.
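The flags above cover a single pass, so a 48-hour burn-in usually needs a wrapper that repeats the test and stops on the first data mismatch. The following is a minimal Python sketch of such a wrapper, not a reference implementation. The binary path ./build/all_reduce_perf, the helper names, and the assumption that each pass prints an "Out of bounds values : N" summary line are illustrative and should be checked against the nccl-tests build in use.

```python
import re
import subprocess
import time


def build_all_reduce_cmd(binary="./build/all_reduce_perf", min_bytes="8G",
                         max_bytes="32G", check_iters=1000, blocking=1,
                         graph_launches=1000):
    """Return the argv list for one all_reduce_perf pass.

    The binary path is a hypothetical default; the flag values mirror
    the ones discussed in the explanation above.
    """
    return [
        binary,
        "-b", min_bytes,            # smallest message size
        "-e", max_bytes,            # largest message size
        "-c", str(check_iters),     # result checking
        "-z", str(blocking),        # blocking collectives
        "-G", str(graph_launches),  # CUDA graph replays
    ]


def count_wrong_values(output):
    """Sum the counts on 'Out of bounds values' lines; None if no such line.

    Assumes the test prints a summary line of that form.
    """
    matches = re.findall(r"Out of bounds values\s*:\s*(\d+)", output)
    if not matches:
        return None
    return sum(int(m) for m in matches)


def run_once(cmd):
    """Run one pass and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def burn_in(run_pass, duration_s=48 * 3600, clock=time.monotonic):
    """Repeat run_pass() until duration_s elapses or a pass looks bad.

    Returns (passes_completed, ok). A pass with no summary line is
    treated as a failure, since it cannot be verified.
    """
    deadline = clock() + duration_s
    passes = 0
    while clock() < deadline:
        output = run_pass()
        passes += 1
        wrong = count_wrong_values(output)
        if wrong is None or wrong > 0:
            return passes, False
    return passes, True
```

A run would look like burn_in(lambda: run_once(build_all_reduce_cmd())). On a multi-node cluster the command would normally be launched through mpirun or a job scheduler, which this sketch leaves out.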