Topic: DeepSeek

Deploying DeepSeek on 96 H100 GPUs
Article URL: https://lmsys.org/blog/2025-05-05-large-scale-ep/ Comments URL: https://news.ycombinator.com/item?id=45064329 Points: 90 # Comments: 28...

SGLang team successfully replicates DeepSeek's inference system using prefill-decode disaggregation, expert parallelism, and large-scale load balancing, achieving a throughput of 52.3k input tokens per second and 22.3k output tokens per second.
Key Takeaways:
- PD (prefill-decode) disaggregation runs the prefill and decode phases on separate workers so each can be optimized independently, reducing latency and improving efficiency.
- EP and EPLB achieve a significant speedup of 1.49x (prefill) and 2.54x (decode) by addressing workload imbalances across GPUs.
- DisposableTensor and expert workload extraction tools enhance memory management and analysis, providing insights for optimization and simulation.
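The load-balancing idea behind EPLB can be illustrated with a minimal sketch: given per-expert token counts, assign experts to GPUs so that no GPU is left with a disproportionate share of the work. This is a simplified stand-in, not SGLang's actual EPLB implementation (which also replicates hot experts and rearranges them dynamically); the function name and greedy strategy here are illustrative assumptions.

```python
import heapq

def balance_experts(expert_loads: dict[int, int], num_gpus: int) -> dict[int, int]:
    """Greedy longest-processing-time placement: assign the heaviest
    experts first, always to the currently least-loaded GPU. A toy
    approximation of EPLB's goal of evening out expert workloads."""
    # min-heap of (current_load, gpu_id)
    heap = [(0, gpu) for gpu in range(num_gpus)]
    placement = {}
    for expert, load in sorted(expert_loads.items(), key=lambda kv: -kv[1]):
        gpu_load, gpu = heapq.heappop(heap)
        placement[expert] = gpu
        heapq.heappush(heap, (gpu_load + load, gpu))
    return placement
```

With skewed loads such as `{0: 10, 1: 9, 2: 2, 3: 1}` on 2 GPUs, this yields an 11/11 split instead of the 19/3 split a naive round-robin over a sorted list could produce, which is the kind of imbalance the blog's decode-phase speedup comes from eliminating.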
Community talk
Video Updates
- "Microsoft New AI is 100X More Intelligent Than DeepSeek R1" — AI revolutionX, 3h ago
- "SHOCKING AI That Broke the Internet This Month: DeepSeek New AI, GPT 5, Google's MAD & Mangle..." — AI revolutionX, yesterday
DeepSeek R1 671B on a $500 server. Interesting, lol, but you guessed it: 1 tps. If only we could get hardware that cheap to produce at least 60 tps.
Reverse engineered 4o's system prompt for Deepseek