1 comment

  • connorturland 2 hours ago
    I finally extracted some useful signals about what results you can get on the DGX Station machines. A bit of news broke at the AI Engineer conference today.

    Would have preferred Kimi 2.7 Code numbers, but 2.5 was what I could get.

    Kimi 2.5 (1.1T params): 40-50 tok/s total output across all users, per an NVIDIA rep; about 595GB of model weights. Benchmark conditions are still needed.

    Nemotron Ultra (550B): ~35 tok/s at concurrency 1, scaling to 4-5 concurrent users, per an NVIDIA rep. Useful because it includes a concurrency claim.

    GLM-5.2-REAP (504B): ~60 tok/s, a public 0xSero number from AI Engineer. Alec Fong says an earlier GLM NVFP4 attempt was ~25 tok/s. Still missing exact quant, prefill, context, and memory residency/concurrency details.
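
    A rough sanity check on the Kimi 2.5 figures above, as a minimal sketch rather than a benchmark. It assumes "GB" means 10^9 bytes and that the 595GB covers all 1.1T parameters. The even split across users and the example user count of 3 are hypothetical; none of the quoted sources state how the total is shared.

```python
# Back-of-envelope arithmetic on the quoted Kimi 2.5 figures.
# Assumptions (not from the source): GB = 10**9 bytes, the weight size
# covers every parameter, and total throughput is split evenly across users.

PARAMS = 1.1e12        # 1.1T parameters, as quoted
WEIGHT_BYTES = 595e9   # about 595GB of model weights, as quoted


def bits_per_param(weight_bytes: float, params: float) -> float:
    """Average number of bits stored per parameter."""
    return weight_bytes * 8 / params


def per_user_rate(total_tok_s: float, users: int) -> float:
    """Per-user tok/s if the total were shared evenly (hypothetical)."""
    if users < 1:
        raise ValueError("users must be at least 1")
    return total_tok_s / users


bpp = bits_per_param(WEIGHT_BYTES, PARAMS)
print(f"{bpp:.2f} bits per parameter")

# Hypothetical example: 45 tok/s total shared by 3 users.
print(f"{per_user_rate(45, 3):.1f} tok/s per user")
```

    This works out to a little over 4 bits per parameter, which would be consistent with roughly 4-bit weights plus some overhead, though the exact quantization is not stated.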

    I also learned a lot about what it costs and when it's shipping.

    Full writeup at the link