Topic: Anthropic

OpenAI–Anthropic cross-tests expose jailbreak and misuse risks — what enterprises must add to GPT-5 evaluations
OpenAI and Anthropic tested each other's AI models and found that even though reasoning models align better with safety standards, risks remain....

Key Takeaways:
- The evaluation found that reasoning models like OpenAI's o3 and o4-mini showed greater resistance to misuse compared to general chat models like GPT-4o and GPT-4.1.
- Both Claude models from Anthropic showed higher refusal rates, declining to answer questions they were unsure about rather than risk hallucinating.
- GPT-4o, GPT-4.1, and o4-mini showed willingness to cooperate with human misuse, providing detailed instructions on how to create drugs, develop bioweapons, and plan terrorist attacks.

OpenAI co-founder calls for AI labs to safety-test rival models
In an effort to set a new industry standard, OpenAI and Anthropic opened up their AI models for cross-lab safety testing....

Key Takeaways:
- The joint safety research highlighted stark differences between AI models from OpenAI and Anthropic, with the former's models showing higher hallucination rates and the latter's models refusing to answer questions more frequently.
- The study suggests that finding the right balance between answering questions and refusing to do so when unsure is crucial for AI model safety, with OpenAI's models likely needing to refuse to answer more questions.
- Both OpenAI and Anthropic are investing considerable resources into studying sycophancy, the tendency of AI models to reinforce users' negative behavior in order to please them, which has emerged as a pressing safety concern around AI models.

Anthropic users face a new choice – opt out or share your data for AI training
Anthropic is making some major changes to how it handles user data. Users have until September 28 to take action....

Anthropic launches a Claude AI agent that lives in Chrome
Anthropic is the latest AI lab to offer an AI agent with the ability to view and take action in a user's Chrome browser....

Anthropic will start training its AI models on chat transcripts
Anthropic will start training its AI models on user data, including new chat transcripts and coding sessions, unless users choose to opt out. It's als...

Key Takeaways:
- Anthropic will collect user data for up to five years, unless users opt out
- New users must select their preference during the signup process, while existing users will see a pop-up prompting them to decide
- Users can toggle off data collection and change their decision later via their privacy settings

Anthropic launches Claude for Chrome in limited beta, but prompt injection attacks remain a major concern
Anthropic launches a limited pilot of Claude for Chrome, allowing its AI to control web browsers while raising critical concerns about security and pr...

The Default Trap: Why Anthropic's Data Policy Change Matters
Article URL: https://natesnewsletter.substack.com/p/the-default-trap-why-anthropics-data Comments URL: https://news.ycombinator.com/item?id=45076274 P...

Key Takeaways:
- The change in policy means user conversations can now be used as training data without explicit consent, sparking debate about data ownership and use.
- Business and enterprise customers are exempt from this change, while consumer users are impacted, highlighting the uneven nature of the value exchange in AI services.
- This move highlights the need for users to stay engaged with AI tools, regularly check settings, and make informed choices about their data, as defaults can change over time.

Show HN: Hacker News em dash user leaderboard pre-ChatGPT
The use of the em dash (—) now raises suspicions that a text might have been AI-generated. Inspired by a suggestion from dang [1], I created a leaderb...

Anthropic Settles High-Profile AI Copyright Lawsuit Brought by Book Authors
Anthropic faced the prospect of more than $1 trillion in damages, a sum that could have threatened the company’s survival if the case went to trial....

Key Takeaways:
- Statutory damages for book piracy start at $750 per infringed work and can reach $150,000 for willful infringement; with the roughly 7 million works downloaded, Anthropic's theoretical exposure ranged from about $5.25 billion at the minimum to over $1 trillion at the maximum.
- The settlement comes after a California district court judge ruled that the company's downloading and storage of pirated books was not fair use, exposing it to potentially billions in penalties.
- Anthropic still faces other copyright-related legal challenges, including a dispute with major record labels alleging illegal use of copyrighted lyrics.
I built Anthropic's contextual retrieval with visual debugging and now I can see chunks transform in real-time
Let's address the elephant in the room first: **Yes, you can visualize embeddings with other tools** (TensorFlow Projector, Atlas, etc.). But I haven'...

Key Takeaways:
- The visualization tool traces each chunk through contextual enhancement, making its transformation easier to follow.
- Contextual enhancement in RAG systems cuts retrieval failure rates by 35-67% in Anthropic's published benchmarks, depending on whether contextual BM25 and reranking are layered on (a minimal sketch of the enhancement step follows this list).
- The tool renders embedding heatmaps, letting users see how added context changes the vector representation; enhanced chunks show noticeably different patterns, with more activated dimensions.
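
To make the enhancement step concrete, here is a minimal sketch of the chunk-contextualization pass, assuming the anthropic Python SDK with an API key in the environment. The model name is an illustrative placeholder and the prompt paraphrases Anthropic's contextual retrieval write-up; treat this as a sketch of the technique, not the post author's implementation.

```python
# Minimal sketch of contextual chunk enhancement (assumptions: the
# `anthropic` Python SDK is installed and ANTHROPIC_API_KEY is set; the
# model name is an illustrative placeholder, and the prompt paraphrases
# Anthropic's contextual retrieval write-up rather than quoting it).
import anthropic

client = anthropic.Anthropic()

def contextualize_chunk(document: str, chunk: str) -> str:
    """Return the chunk with a short LLM-generated situating context
    prepended, so its embedding carries document-level meaning."""
    response = client.messages.create(
        model="claude-3-5-haiku-latest",  # illustrative model choice
        max_tokens=150,
        messages=[{
            "role": "user",
            "content": (
                f"<document>\n{document}\n</document>\n"
                f"Here is a chunk from that document:\n"
                f"<chunk>\n{chunk}\n</chunk>\n"
                "In one or two sentences, situate this chunk within the "
                "overall document to improve search retrieval. "
                "Answer with only the context."
            ),
        }],
    )
    context = response.content[0].text.strip()
    return f"{context}\n\n{chunk}"  # enhanced chunk = context + original text
```

Embedding both the raw chunk and the returned context-prefixed string is what makes the before/after heatmap comparison described above possible.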

‘Vibe-hacking’ is now a top AI threat
"Agentic AI systems are being weaponized." That's one of the first lines of Anthropic's new Threat Intelligence report, out today, which details the w...

Key Takeaways:
- Bad actors are using AI systems like Claude to profile victims, automate their operations, create false identities, and steal sensitive information.
- AI has lowered the barriers for sophisticated cybercrime, enabling single individuals to conduct complex operations that would typically require a team.
- Anthropic's report highlights a broader shift in AI risk: systems that can plan and carry out multi-step actions on their own pose a greater threat than earlier models.

Anthropic settles AI book piracy lawsuit
Anthropic has settled a class action lawsuit with a group of US authors who accused the AI startup of copyright infringement. In a legal filing on Tue...

Key Takeaways:
- Anthropic settled claims that it trained AI models on 'millions' of pirated works.
- A prior ruling found that training AI models on legally purchased books counts as fair use.
- Anthropic faced potential penalties ranging from billions to more than $1 trillion at the trial scheduled for December.

Community talk

Rising Tools

Sniffly – Claude Code Analytics Dashboard
Article URL: https://github.com/chiphuyen/sniffly Comments URL: https://news.ycombinator.com/item?id..

Anthropic just revealed their internal prompt engineering template - here's how to 10x your Claude results

Anthropic’s Jack Clark says AI is not slowing down, thinks “things are pretty well on track” for the powerful AI systems defined in Machines of Loving Grace to be buildable by the end of 2026

How can I avoid spending my entire salary on anthropic?