DeepSeek V3.2 achieves a throughput of 7,360 TGS (tokens per GPU per second) on a single GB300 GPU, but still has significant room for improvement.
Why it matters
DeepSeek V3.2's performance relies heavily on the choice of parallelization strategy and model architecture, so there is still significant room for improvement.
Community talk
New DeepSeek update: "DeepSeek Web / APP is currently testing a new long-context model architecture, supporting a 1M context window."
DeepSeek has launched grayscale testing for its new model on both its official website and app. 1M context length!
DeepSeek-v4 Benchmarks Leaked
DeepSeek just updated to a 1M context window!
DeepSeek architecture, but without all the parameters