NVIDIA said it has achieved a record large language model (LLM) inference speed, announcing that an NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs reached more than 1,000 tokens per second (TPS) per user on the 400-billion-parameter Llama 4 Maverick model.