A new AI model developed at the Mayo Clinic can detect abnormal cells in pancreatic cancer up to three years before tumors are visible on a scan.
Why it matters
Spotting pancreatic cancer years before a tumor is visible on a scan could change how the disease is detected, but more research is needed before the model's benefits can be realized.
Community talk
GPT 5.5 tops private citation benchmark on Kaggle (AbstractToTitle task)
What is the deal with LLM memory?
Abliterlitics: Benchmarks and Tensor Comparison for Heretic, Abliterlix, Huiui, HauhauCS for GLM 4.7 Flash
First direct side by side MoE vs Dense comparison.
The exact KV cache usage of DeepSeek V4
That paper about malicious LLM routers should've scared more of you than it did
LLMs do fine on ARC-AGI-3 if they are allowed to search over game logs
Real World Physics-Informed AI Applications
Why isn’t LLM reasoning done in vector space instead of natural language?
Talkie: a 13B LLM trained only on pre-1931 text used Claude Sonnet to help test the model and judge its output
Microsoft just dropped a benchmark where frontier LLMs corrupt 25% of document content over long edit workflows
I tested the same prompt across multiple AI models… the differences surprised me
At what scale does AI stop being practical for routing problems?
Grok 4.3 underperforms Grok 4.20 0309 on the Extended NYT Connections Benchmark, dropping from 93.4 to 67.5, though it achieves this result at a lower cost than the earlier Grok 4.20 run
Iterative Prompting for Deep Collaboration with LLMs: A Framework and Examples
RAG uses 11× more tokens than pre-structured graphs — benchmark across 7,928 queries, 45 domains
Anthropic just analyzed 1 million Claude conversations. 6% of people were asking Claude whether to quit their jobs, who to date, and if they should move countries.
Claude Opus 4.7 is performing horrendously on BrokenArxiv in MathArena.
ChatGPT 5.4 Solved a 64-Year-Old Math Problem