DGX station and "frontier" models, my hunt for answers

(atcyrus.com)

3 points | by connorturland 2 hours ago

1 comment

connorturland 2 hours ago
I finally extracted some useful signals about what results you can get on the DGX Station machines. A bit of news broke at the AI Engineer conference today.
Would have preferred Kimi 2.7 Code numbers, but 2.5 was what I could get.
Kimi 2.5 (1.1T params): 40-50 tok/s total output across all users. Source: NVIDIA rep number. Notes: about 595GB of model weights; we still need the benchmark conditions.
Nemotron Ultra (550B): ~35 tok/s at concurrency 1; scales to 4-5 concurrent users. Source: NVIDIA rep number. Notes: useful because it includes a concurrency claim.
GLM-5.2-REAP (504B): ~60 tok/s. Source: public 0xSero number from AI Engineer. Notes: Alec Fong says an earlier GLM NVFP4 attempt was ~25 tok/s; still missing exact quant, prefill, context, and memory residency/concurrency details.
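A quick sanity check on the Kimi row, as a rough sketch only. It assumes "595GB" means 595e9 bytes and "1.1T" means 1.1e12 parameters, and it assumes an even split of the aggregate throughput across users. The user count of 4 is purely hypothetical, since the number of users behind the 40-50 tok/s figure was not given. The function names are mine, not from any tool.

```python
def bits_per_param(weight_bytes: float, n_params: float) -> float:
    """Average storage per parameter, in bits, implied by a weights size."""
    return weight_bytes * 8 / n_params


def per_user_share(total_tok_s: float, users: int) -> float:
    """Per-user tok/s if aggregate throughput were split evenly (an assumption)."""
    if users < 1:
        raise ValueError("users must be at least 1")
    return total_tok_s / users


# Kimi 2.5 row: ~595GB of weights for 1.1T params (decimal GB assumed).
kimi_bits = bits_per_param(595e9, 1.1e12)
print(f"Kimi 2.5: about {kimi_bits:.2f} bits per parameter")

# Hypothetical: 4 concurrent users sharing the quoted 40-50 tok/s total.
hypothetical_users = 4
low = per_user_share(40, hypothetical_users)
high = per_user_share(50, hypothetical_users)
print(f"Even split across {hypothetical_users} users: {low:.1f}-{high:.1f} tok/s each")
```

Under those assumptions the weights work out to a bit over 4 bits per parameter, which would fit a roughly 4-bit quant plus some overhead, but the actual format was not stated, so treat that as a guess.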
I also learned a lot about what it costs and when it's shipping.
Full writeup at the link.