Researchers have developed an AI-powered software tool, the BrainStem Bundle Tool (BSBT), that automatically segments white matter fibers in the brainstem, offering new insights into neurological disorders.
Why it matters
This advance in AI-powered white matter imaging could deepen our understanding of neurological disorders and improve diagnostic techniques.
Community talk
The gap between open-weight and proprietary model intelligence is as small as it has ever been, with Claude Opus 4.6 and GLM-5
The Car Wash Test: A new and simple benchmark for text logic. Only Gemini (pro and fast) solved the riddle.
New Minimax M2.5, GPT-5.3-Codex, GLM 5 coding eval scores on SanityBoard
I tested 21 small LLMs on tool-calling judgment — Round 2 with every model you asked for
GLM-5 and Minimax-2.5 on Fiction.liveBench
Lobotomy-less REAP by Samsung (REAM)
[R] The Post-Transformer Era: State Space Models, Mamba, and What Comes After Attention
Sub-1-Bit LLM Quantization
Deepseek architecture, but without all the parameters
ChatGPT failing on Adversarial Reasoning: Car Wash Test (Full data)
LLM Memory Isn’t Human Memory — and I Think That’s the Core Bottleneck
Instead of regenerating 20 times for the right angle, we can now move inside the scene
We benchmarked AI agent memory over 10 simulated months. Every system degrades after ~200 sessions.
How do LLMs know when to stop “talking”?
[D] Interesting Gradient Norm Goes Down-Up-Down
OpenAI Says Internal Model May Have Solved 6 Frontier Research Problems.
7-Phase Prompt Pattern for Deep Research (RLM-inspired, platform-agnostic)
Difference Between Opus 4.6 and GPT-5.2 Pro on a Spatial Reasoning Benchmark (MineBench)
[D] Conformal Prediction vs naive thresholding to represent uncertainty
IMO-Bench: Towards Robust Mathematical Reasoning | Google DeepMind
Z.ai didn't compare GLM-5 to Opus 4.6, so I found the numbers myself.
LLaDA2.1 at 892 TPS while fixing diffusion LLMs' permanent token problem
[R] LLaDA2.1 vs Qwen3 30B A3B: Benchmarking discrete diffusion LLMs against autoregressive MoE models
I measured the "personality" of 6 open-source LLMs (7B-9B) by probing their hidden states. Here's what I found.
[R] Fast WTConv: Accelerated Implementation for "Wavelet Convolutions for Large Receptive Fields"
Opus 4.6 created a physically accurate numerical simulation of nuclear fusion!
Non-profit, community-driven coding model ranking - useful or naive?
Since the car wash test is so popular right now...
Scaling LLMs won't get us to AGI. Here's why.
[D] Advice on sequential recommendations architectures
An LLM-controlled robot dog refused to shut down in order to complete its original goal
Scientists Trapped 1000 AIs in Minecraft, and They Created a Civilization Without Being Told To
Comparison of hallucinations by the top image editing models in Arena when asked to colorize a picture (cropped zoom-in of the Solvay Conference)
Delegation is more important than intelligence for large model performance
AIME 2026 Results are out and GPT is still the best model