Mistral has launched Devstral 2, a 123-billion parameter dense transformer optimized for agentic software development, along with a smaller variant, Devstral Small 2, and a command-line interface, Vibe CLI. Devstral 2 achieves 72.2% on SWE-bench Verified and is available for free for a limited time via Mistral's API and Hugging Face.
Why it matters
The release of Devstral 2 and Devstral Small 2 offers developers a significant boost in coding capabilities, but requires careful consideration of licensing terms.
Community talk
OpenAI declares Code Red and rushing "GPT-5.2" for Dec 9th release to counter Gemini 3
BREAKING: OpenAI declares Code Red & rushing "GPT-5.2" for Dec 9th release to counter Google
Gemini 3 "Deep Think" benchmarks released: Hits 45.1% on ARC-AGI-2 more than doubling GPT-5.1
GLM-4.6V (108B) has been released
Google’s new Gemini 3 Pro Vision benchmarks officially recognize "Claude Opus 4.5" as the main competitor
Report: OpenAI planning new model release for Dec 9th to counter Gemini 3 (Source: The Verge)
DeepSeek-V3.2-REAP: 508B and 345B checkpoints
Claude Opus 4.5 really sets a new bar for LLMs that will make the others sweat
Open AI rushes to launch 5.2
Z.ai releases GLM-4.6V: A 9B "Flash" model that beats Qwen2-VL-8B,128k context and completely FREE via API.
Gemini 3 Flash on LMarena
[D] How did Gemini 3 Pro manage to get 38.3% on Humanity's Last Exam?
Gemini 3 Deep Think now available
GPT-5.1-Codex-Max is coming to the API imminently
Upcoming models from llama.cpp support queue (This month or Jan possibly)
Heretic GPT-OSS-120B outperforms vanilla GPT-OSS-120B in coding benchmark
UPDATE: Claude now supports asynchronous agents!!!!
Minimax M2
We broke the 50% barrier! Poetiq is now #1 on ARC-AGI-2 with 54% Accuracy and half the cost of previous SOTA.
DeepSeek V3.2 (14.9%) scores above GPT-5.1 (9.5%) on Cortex-AGI despite being 124.5x cheaper.
Gemini 3 Pro Vision benchmarks: Finally compares against Claude Opus 4.5 and GPT-5.1
Mistral 3 14b against the competition ?
OpenAI’s GPT 5.1 Codex Max is now available in public preview inside GitHub Copilot
Tell us a task and we'll help you solve it with Granite
Deepseek's progress
Four major AI Video models shipped synchronized audio in 72 hours
GPT-5.1 vs Gemini 3 vs Claude Opus 4.5 on real data analysis tasks.
Kling AI 2.6 Just Dropped: First Text to Video Model With Built-in Audio & 1080p Output
DeepSeek V3.2 Technical Report
I'm surprised how simple Qwen3 VL's architecture is.
GPT-5.2 next week! It will arrive on December 9th.
Codex Max overtakes Anthropic models on LB coding.
A tiny 4B model you can run on your laptop now hits ~80–85% of full GPT‑4.1 ability
Claude opus 4.5 is now on antigravity
[R] [N] TabPFN now scales to millions of rows (tabular foundation model)
Claude Performance and Workarounds Report - December 1 to December 8
54% on ARC-AGI 2 is now Officially Verified
5.1 has destroyed any aspect of creativity
Claude Opus 4.5 in Google Antigravity