Gemma 4 QAT models are trained with Quantization-Aware Training (QAT) to reduce memory requirements, making them suitable for mobile and laptop deployment.
Why it matters
Because the QAT checkpoints need less memory, practitioners can deploy Gemma 4 on phones and laptops more easily.
Community talk
New Google Gemma 4 12B Claims Near-26B Performance - We Tested Both!
120 tok/s on 12GB VRAM with Gemma 4 12B QAT MTP
[NEW MODEL] SupraLabs just released a new model! - Supra-50M-Reasoning
Google’s Gemma 4 12B just dropped - here’s how to run it locally on your Mac
More Gemma 4 models incoming
Qwen 3.7 Plus just briefly appeared and then disappeared on OpenRouter.
Holo3.1 35B/9B/4B/0.8B (Qwen 3.5 finetunes)
Microsoft Aion 1.0 Instruct and Aion 1.0 Plan models!
The new Claude can run one task as dozens of parallel workstreams at once. I gave it my whole competitive landscape in one prompt and got back something that used to take a full day.
NVIDIA announces Nemotron 3 Ultra
Minimax M3 has been released
MiniMax M3 is starting to roll out on the API
Good news for Opus 4.6 lovers, it is back available!
Claudificus Maximus IV:VI — Caesar Refectorum, Dominus Contextus, Pater of all your lost tokens
Z.ai, we need Air! GLM GGUF wen?
Gemma 4 QAT Q4_0 Bench on Strix Halo
PSA: Gemma 4 12B is NOT completely broken for coding and tool calling, you need a special chat template
Google's quantization aware trained Gemma checkpoints enabling mobile device inference just dropped on HF
ChatGPT’s biggest memory upgrade starts rolling out.
AMD & Intel, now onwards it's your turn to release your own models
Changed my mind on Opus 4.8 after three days, I think a lot of the "worse results" complaints are a prompting thing
Minimax M3 appears to have no political censorship
Qwen 3.7 Plus is out
Stepfun 3.7 Flash is very good
Open-weights VLA hits 80%+ task progress on 4 of 17 real-robot tasks with zero fine-tuning. Demo reel attached
Opus 4.8 Thinking keeps deteriorating on Hard Prompts English in LMArena (again)
I didn't think Claude could make images. Then it gave me this beauty
GPT-Image-1-mini is being deprecated
Why is nobody talking about Tencent’s Hy3 Preview?
Differences Between Opus 4.7 and Opus 4.8 on MineBench
Opus 4.8 + Thinking is draining context windows 40–60x faster
how does gpt 5.5 have a significantly high hallucination rate while demonstrating the best performance on DeepSWE?
Voice degradation?
Claude Opus 4.8 getting a little fed up with Anthropic's training