Researchers from Salesforce unveiled MCPEval, a new method to evaluate AI agent performance and tool use within MCP servers....
Teams can scale Qwen3’s capabilities to single-node GPU instances or local development machines, avoiding the need for massive GPU clusters....
OpenAI and Penda Health debut an AI clinical copilot that cuts diagnostic errors by 16% in real-world use—offering a new path for safe, effective AI i...
OpenAI's and Google's AI models achieved impressive results in a difficult math competition, but the two companies disputed how the other got its score....
Article URL: https://alignment.anthropic.com/2025/subliminal-learning/ Comments URL: https://news.ycombinator.com/item?id=44650840 Points: 24 # Commen...
Article URL: https://www.readtpa.com/p/stop-pretending-chatbots-have-feelings Comments URL: https://news.ycombinator.com/item?id=44650694 Points: 46 #...
Gemini 2.5 Flash-Lite, previously in preview, is now stable and generally available. This cost-efficient model provides high quality in a small size, ...
Qwen3-Coder will come in multiple sizes
OpenAI's IMO model "knew" it didn't have a correct solution
Open-source Qwen model matches Claude 4 Sonnet on SWE-bench Verified!!
[D] Is there anyone using GRPO in their company?
Updated Strix Halo (Ryzen AI Max+ 395) LLM Benchmark Results
Kimi K2: A 1 Trillion Parameter LLM That is Free, Fast, and Open-Source
Google DeepMind Just Solved a Major Problem with AI Doctors - They Created "Guardrailed AMIE" That Can't Give Medical Advice Without Human Oversight
o4-mini can actually solve 90% of the 2025 USAMO
Private eval results for Qwen3-235B-A22B-Instruct-2507
MegaTTS 3 Voice Cloning is Here
I finally found a prompt that makes ChatGPT write naturally 🥳🥳
Qwen3-Coder Available on chat.qwen.ai
Anthropic Status Update: Tue, 22 Jul 2025 18:09:28 +0000
Sneak peek into Colossus 2. It will host over 550k GB200s & GB300s in just a few weeks!
What is Anthropic going to do when Claude is just another model?
Wow, even the standard Gemini 2.5 Pro model can win a gold medal in IMO 2025 with some careful prompting. (Web search was off; paper and prompt in comments.)