Google DeepMind announced the Gemma 4 family of open models, featuring multimodal capabilities and advances in reasoning and architecture.
Why it matters
The Gemma 4 release brings stronger multimodal capabilities to an open model family, widening what developers can build with it.
Community talk
[NEW MODEL] SupraLabs just released a new model! - Supra-50M-Reasoning
KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag)
Google’s Gemma 4 12B just dropped - here’s how to run it locally on your Mac
How does the new abliteration tool Apostate compare with others? - Abliterlitics
NVIDIA releases Cosmos 3 Omnimodal world models on HF
mudler/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF just released!
dvlt.cu: inference engine written from scratch in CUDA/C++ for NVIDIA's DVLT 3D transformer model
Qwen3.6-35B-A3B-Uncensored-Claude-4.6-Genesis-APEX-GGUF
DeepSeek V4 Flash is amazing! (WIP llama.cpp PR #24162)
AA comparison of the latest local models
A quick Gemma4 31B comparison (Q4_k_M, QAT, heretic)
Google's quantization aware trained Gemma checkpoints enabling mobile device inference just dropped on HF
Gemma 4 QAT GGUFs from Unsloth
Unsloth just dropped MTP GGUF weights for Gemma 4!
Gemma 4 12B is my new main squeeze
BeeLlama v0.3.1 – latest llama.cpp with extras! DFlash, MTP, q6_0 cache, TurboQuant. Single RTX 3090: Qwen 3.6 27B & Gemma 4 31B up to 177.8 tps (4.93x over baseline)
You guys were right - Qwen 3.6 35B IS good...and KV Cache DOES matter.
Ran gemma 4 12b on my 3090 yesterday and I think the local model game just changed
The first Gemma 4 12B finetunes are ready
Another shout out to llama.cpp build b9455 2x3090
I trained a 75M parameter LLM from scratch on 18B tokens and it beats a model almost double its size
Moss tts 1.5 8b Examples. It is the currently best voice cloning model for English as of June 2026
We built a desktop study app around Codex CLI as the local AI runtime
I ported NVIDIA Parakeet (speech-to-text) to ggml: same output as NeMo, faster, GGUF-quantized, no Python
All Italian legislation, free, on GitHub in Markdown.
My open source local multi agent harness went from 0 to 350 stars in one day here to tell that it’ll keep working after 15 June
Claude doesn't have to be a money machine. I used it to build an open-source tool that tracks how politicians in my Brazilian state spend public money.
Coding agent built as developer-driven workflows — human-in-the-loop, hybrid search, editable context
Mellum 2 12B A2.5B
Why is nobody talking about Tencent’s Hy3 Preview?
I built an open-source Desktop App that gives your AI persistent memory across all platforms (100% Local SQLite, Zero-Docker)