Nvidia's data center networking business has grown significantly, driven by demand for AI infrastructure, bringing in $31 billion in revenue last year. The company's 2020 acquisition of Mellanox has been a key factor in that growth.
Why it matters
The Mellanox acquisition gave Nvidia the high-speed interconnect technology that large AI clusters depend on, underscoring how central data center networking has become in the AI era.
Community talk
Krasis LLM Runtime: 8.9x prefill / 4.7x decode vs llama.cpp — Qwen3.5-122B on a single 5090, minimal RAM
Nebius signs a new AI infrastructure agreement with Meta (up to ~$27B)
Qwen3.5-27B 8-bit vs 16-bit, 10 runs
Cold starting a 32B model in under 1 second (no warm instance)
Running Qwen3.5-27B Q5 split across a 4070 Ti and an AMD RX 6800 over LAN @ 13 t/s with a 32k prompt
2000 TPS with Qwen3.5-27B on an RTX 5090
Running Qwen3.5-35B-A3B and Nemotron-3-Super-120B-A12B on a 5060ti and 1080ti with llama.cpp (Fully on GPU for Qwen; 64GB RAM needed for Nemotron)
I spent 8+ hours benchmarking every MoE backend for Qwen3.5-397B NVFP4 on 4x RTX PRO 6000 (SM120). Here's what I found.
Anyone else feel like OTel becomes way less useful the moment an LLM enters the request path?
Main observability and evals issues when shipping AI agents.
[D] What's the modern workflow for managing CUDA versions and packages across multiple ML projects?