Notable AI releases and advancements in 2025 included OpenAI's continued momentum, China's open-source wave, and the rise of local models.
Why it matters
The 2025 AI landscape saw significant developments in open-source and local models, alongside continued generative AI advances from OpenAI, but it remains diverse and complex.
Community talk
LocalAI 3.8.0 released: Universal Model Loader (HF/Ollama/OCI), MCP Agent Streaming, Logprobs support, and strict SSE compliance.
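LocalAI serves an OpenAI-compatible API, so the new logprobs support can be exercised with a plain HTTP call. A minimal sketch, assuming a LocalAI instance on localhost:8080, a placeholder model name, and that 3.8.0 mirrors the OpenAI-style logprobs fields (all assumptions, not from the release notes):

```python
import requests

# Placeholder endpoint and model name; adjust to your LocalAI deployment.
BASE_URL = "http://localhost:8080/v1"
MODEL = "qwen3-8b"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Name three local LLM runtimes."}],
        # OpenAI-style flags; assumes LocalAI 3.8.0 follows this schema.
        "logprobs": True,
        "top_logprobs": 3,
        "max_tokens": 64,
    },
    timeout=120,
)
resp.raise_for_status()
choice = resp.json()["choices"][0]
print(choice["message"]["content"])

# Print per-token log-probabilities if the server returned them.
for tok in (choice.get("logprobs") or {}).get("content", []):
    print(f"{tok['token']!r}: {tok['logprob']:.3f}")
```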
Optimizing Token Generation in llama.cpp's CUDA Backend
Watch as my Llama.cpp and FastAPI servers process requests from my Unity game
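The llama.cpp + FastAPI + Unity setup follows a common pattern: the game posts a prompt to a small FastAPI service, which forwards it to llama-server's OpenAI-compatible endpoint. A minimal sketch, assuming llama-server is running on localhost:8080; the route name and payload shape are illustrative, not the poster's actual code:

```python
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Assumed llama-server address, e.g. started with `llama-server -m model.gguf --port 8080`.
LLAMA_SERVER = "http://localhost:8080/v1/chat/completions"


class Prompt(BaseModel):
    text: str


@app.post("/npc-reply")
async def npc_reply(prompt: Prompt):
    """Forward a prompt from the game client to llama-server and return the reply."""
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(
            LLAMA_SERVER,
            json={
                "messages": [{"role": "user", "content": prompt.text}],
                "max_tokens": 128,
            },
        )
    resp.raise_for_status()
    return {"reply": resp.json()["choices"][0]["message"]["content"]}
```

Run it with `uvicorn main:app` and have the Unity client POST JSON like `{"text": "..."}` to `/npc-reply`.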
Claude code can now connect directly to llama.cpp server
llamacpp-gfx906 new release
[Release] Hypnos i1-8B: I fine-tuned Hermes 3 on REAL IBM Quantum Computer data (133-qubit GHZ states). Beats Llama-70B in Logic.
It's been 2 years, but why is Llama 3.1 8B still a popular choice to fine-tune?
I created a llama.cpp fork with the Rockchip NPU integration as an accelerator and the results are already looking great!
Strix Halo, Debian [email protected]&6.17.8, Qwen3Coder-Q8 CTX<=131k, llama.cpp@Vulkan&ROCm, Power & Efficiency
Is my CPU (i3-12100F) beating my GPU (RX 6600) at inference? (llama.cpp)
CPU-only LLM performance - t/s with llama.cpp
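For rough comparisons like the two posts above, tokens per second can be estimated client-side against any OpenAI-compatible server (llama-server, LocalAI). A minimal sketch, assuming a server on localhost:8080 that reports `usage.completion_tokens`:

```python
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed llama-server address

payload = {
    "messages": [{"role": "user", "content": "Write a short paragraph about GGUF quantization."}],
    "max_tokens": 256,
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=600)
elapsed = time.perf_counter() - start
resp.raise_for_status()

# Tokens/sec from the server-reported completion token count and wall-clock time.
completion_tokens = resp.json()["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.1f}s -> {completion_tokens / elapsed:.1f} t/s")
```

Note this folds prompt processing into the total, so it understates pure generation speed; llama-server's own log separates prompt evaluation from generation rates.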
How I replaced Gemini CLI & Copilot with a local stack using Ollama, Continue.dev and MCP servers
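For the local-stack item above, the glue is usually Ollama's local HTTP API, which tools like Continue.dev talk to under the hood. A minimal sketch against the standard Ollama endpoint on localhost:11434; the model name is a placeholder:

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",   # default Ollama endpoint
    json={
        "model": "qwen2.5-coder:7b",     # placeholder; use any model you've pulled
        "messages": [{"role": "user", "content": "Explain what an MCP server is in one sentence."}],
        "stream": False,                  # return one JSON object instead of a stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```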