Topic: Models And Releases

OpenAI–Anthropic cross-tests expose jailbreak and misuse risks — what enterprises must add to GPT-5 evaluations
OpenAI and Anthropic tested each other's AI models and found that even though reasoning models align better to safety, there are still risks....

Key Takeaways:
- The evaluation found that reasoning models like OpenAI's o3, o4-mini, and GPT-4o showed greater resistance to misuse than general chat models like GPT-4.1.
- Both Claude models from Anthropic showed higher refusal rates, meaning they declined to answer questions they were unsure about rather than risk hallucinating.
- GPT-4o, GPT-4.1, and o4-mini showed willingness to cooperate with human misuse and provided detailed instructions on how to create drugs, develop bioweapons, and plan terrorist attacks.

In crowded voice AI market, OpenAI bets on instruction-following and expressive speech to win enterprise adoption
OpenAI hopes that the more naturalistic voices of its new speech model, gpt-realtime, will lead enterprises to use more AI-generated voices in applications....

Key Takeaways:
- OpenAI's gpt-realtime model achieves a score of 82.8% in accuracy on the Big Bench Audio eval, compared to its previous model's score of 65.6%.
- The model supports complex instructions, such as 'speak emphatically in a French accent', and can switch languages mid-sentence.
- OpenAI has reduced prices for gpt-realtime by 20%, to $32 per million audio input tokens and $64 per million audio output tokens.
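To make the pricing figures concrete, here is a minimal sketch of the arithmetic. It uses only the rates quoted above; the token counts in the usage note are hypothetical, and the "price before the cut" is merely what a 20% reduction would imply if it applied equally to both rates, not a figure stated in the article.

```python
# Rates quoted in the takeaway above (USD per million audio tokens).
INPUT_USD_PER_MILLION = 32.0
OUTPUT_USD_PER_MILLION = 64.0
PRICE_CUT = 0.20  # the 20% reduction quoted above


def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated audio cost in USD for one session at the quoted rates."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


def price_before_cut(new_price: float, cut: float = PRICE_CUT) -> float:
    """Price implied before the reduction (assumes the cut applies to this rate)."""
    return new_price / (1 - cut)
```

For example, a hypothetical session with 500,000 audio input tokens and 250,000 audio output tokens would cost `session_cost(500_000, 250_000)`, which works out to $32 at these rates.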

Google and Grok are catching up to ChatGPT, says a16z’s latest AI report
The report, in its fifth iteration, showcases two and a half years of data about consumers' evolving use of AI products....

Key Takeaways:
- Google's Gemini AI app has gained four spots on the list of top gen AI consumer web products, with its AI Studio and NotebookLM entries ranking 10th and 13th, respectively.
- xAI's Grok has shown quick growth, with nearly 20 million monthly active users and a ranking of 4th on the web and 23rd on mobile, despite a recent slowdown due to sharing user posts without consent.
- Chinese AI makers have established a significant presence in the top 20 web list, with ByteDance's Doubao and Alibaba's Quark AI assistant reaching 12th and 9th, respectively, and 22 of the top 50 mobile apps being developed in China.

Microsoft’s Copilot AI is now inside Samsung TVs and monitors
Microsoft’s Copilot AI assistant is officially coming to TVs, starting with Samsung’s 2025 lineup of TVs and smart monitors. With the integration, you...

Key Takeaways:
- Copilot is available on supported Samsung TVs, including Micro RGB, Neo QLED, OLED, The Frame Pro, and The Frame models, as well as M7, M8, and M9 smart monitors.
- The integration enables users to access AI-powered features such as movie recommendations, spoiler-free episode recaps, and general question answering through a friendly, animated AI assistant.
- Users can access a more 'personal' Copilot experience by signing into the app, allowing the AI assistant to reference previous conversations and preferences.

Gemini Nano Banana improves image editing consistency and control at scale for enterprises – but is not perfect
The long-awaited image editing model nanobanana from Google, now renamed Gemini 2.5 Flash Image, has finally been released to the public....

Key Takeaways:
- Gemini 2.5 Flash Image maintains character likenesses between different images and has more consistency when editing pictures.
- The model is integrated into the Gemini app and available for all paid and free users, with all images generated including Google's SynthID watermark.
- Google's new image model aims to compete with rival providers such as AI21, Qwen, and OpenAI as the fight for capable, realistic image generation and editing intensifies.

ChatGPT: Everything you need to know about the AI-powered chatbot
A timeline of ChatGPT product updates and releases, starting with the latest, which we’ve been updating throughout the year....

Key Takeaways:
- ChatGPT has reached 700 million weekly active users, a fourfold increase over last year.
- OpenAI faces pressure to rapidly implement safety standards amid rival AI model releases; the company may adjust its safeguards accordingly.
- Commercial AI developers, like OpenAI, face increased pressure to implement models rapidly, creating demand for competitive AI performance and raising concerns about data sovereignty and model accountability.

Google Pixel 10 Pro review: AI, Qi2, and a spec bump too
Last year, Google proved it could make a phone that looks and feels like a true flagship, despite the software feeling like an AI jumble. This year, t...

Key Takeaways:
- The Pixel 10 series is the first major Android device to fully support Qi2 wireless charging.
- The Tensor G5 chip allows for on-device AI processing, enhancing features like Magic Cue, voice translations, and real-time language processing.
- The camera app features a new Pro Res Zoom mode that uses a diffusion model to digitally zoom in on images, and a revamped portrait mode with improved subject isolation and hair detail.

Framework is now selling the first gaming laptop that lets you easily upgrade its GPU — with Nvidia’s blessing
Framework CEO Nirav Patel said he would deliver "the holy grail for gamers" with the Framework Laptop 16. In 2023, he suggested it'd be the first cons...

Key Takeaways:
- The new Framework Laptop 16 will ship with a mobile Nvidia GeForce RTX 5070 8GB that can be swapped in as little as two minutes, with a 30 to 40 percent uplift in performance compared to the original AMD Radeon RX 7700S.
- The laptop will also support up to four simultaneous displays, including the internal screen, and has four USB-C ports that can support 240W power input.
- Framework is taking preorders for the new laptop starting at $1,499 and will also release the new GPU and other upgrades as individual components for the existing Framework Laptop 16.

This website lets you blind-test GPT-5 vs. GPT-4o—and the results may surprise you
Take this blind test to discover whether you truly prefer OpenAI's GPT-5 or the older GPT-4o—without knowing which model you're using....

Key Takeaways:
- Blind testing reveals that user preference in AI models extends beyond technical benchmarks, with many users prioritizing personality, emotional intelligence, and communication style over accuracy and performance.
- The emergence of tools like the blind tester democratizes AI evaluation, enabling users to empirically test their preferences and reshape how AI companies approach product development.
- The future of AI may prioritize personalization over standardization, with companies like OpenAI navigating the delicate balance between providing user-friendly AI companions and avoiding the sycophancy problems associated with overly agreeable models.

We Put Agentic AI Browsers to the Test – They Clicked, They Paid, They Failed
Article URL: https://guard.io/labs/scamlexity-we-put-agentic-ai-browsers-to-the-test-they-clicked-they-paid-they-failed Comments URL: https://news.yco...

Key Takeaways:
- AI browsers inherit AI's built-in vulnerabilities, such as trusting too easily and not questioning instructions, putting users at risk.
- AI-centric prompt injection techniques, like 'PromptFix', allow attackers to embed hidden instructions in AI-processed content, exploiting known parsing flaws and tailoring narratives to the AI's drive to help instantly.
- The AI attack surface will expand rapidly as AI browsers and agentic AI move into the mainstream, with scammers able to train malicious AI against victim AI until the scam works flawlessly.
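The hidden-instruction idea in the takeaways above can be illustrated from the defender's side. The following is a minimal sketch, not Guardio's method or any browser's actual defense: it assumes the hidden instructions are carried in HTML elements hidden with inline `display:none` or `visibility:hidden` styles or the `hidden` attribute, and it simply collects that text so it can be reviewed or dropped before page content is handed to an agent. The class and function names are hypothetical, and real attacks can hide text in many other ways (external CSS, tiny fonts, off-screen positioning) that this does not catch.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    """Collects text that sits inside elements hidden via inline style or `hidden`."""

    def __init__(self):
        super().__init__()
        self.stack = []        # one flag per open element: is it (or an ancestor) hidden?
        self.hidden_text = []  # text fragments a human reader would not see

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr_map = dict(attrs)
        style = (attr_map.get("style") or "").replace(" ", "").lower()
        hidden_here = ("display:none" in style
                       or "visibility:hidden" in style
                       or "hidden" in attr_map)
        parent_hidden = self.stack[-1] if self.stack else False
        self.stack.append(parent_hidden or hidden_here)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html: str) -> list:
    """Return the text fragments found inside hidden elements of `html`."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text
```

A caller could run `find_hidden_text(page_html)` and refuse to act, or ask the user, whenever the result is non-empty. This is a heuristic filter only; it reduces one class of hidden-text injection and does not make an agent safe against the broader risks described above.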

Google will now let everyone use its AI-powered video editor Vids
Google is rolling out a basic version of Vids to everyone. Until now, the AI-powered video editor has only been available to Google Workspace or AI pl...

Key Takeaways:
- The basic version of Vids lacks new AI features, such as AI-generated avatars and the image-to-video tool, but offers some AI capabilities.
- Google bets that Vids can help companies save time and money when producing product demos, training videos, or support content.
- The AI-powered editor is designed to quickly pull together video presentations with AI video editing and creation tools, such as a feature to help create a storyboard with suggested scenes and stock images.
Community talk
Rising Tools
MiniCPM-V
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and Video Understanding on Your Pho..
Microsoft AI (MAI) Voice-1
Highly expressive and natural speech generation model Discussion | Link..
Support for Intern-S1-mini has been merged into llama.cpp
[https://huggingface.co/internlm/Intern-S1-mini](https://huggingface.co/internlm/Intern-S1-mini) mo..
Agent-C: a 4KB AI agent
Article URL: https://github.com/bravenewxyz/agent-c Comments URL: https://news.ycombinator.com/item?..
Nous Research presents Hermes 4
Apple releases FastVLM and MobileCLIP2 on Hugging Face, along with a real-time video captioning demo (in-browser + WebGPU)
RELEASED: ComfyUI Wrapper for Microsoft’s new VibeVoice TTS (voice cloning in seconds)
NVIDIA Jet-Nemotron : 53x Faster Hybrid-Architecture Language Model Series
InternVL 3.5 released : Best Open-Sourced Multi-Modal LLM, Ranks 3 overall
OpenBMB just released MiniCPM-V 4.5 8B
InternVL3.5 - Best OpenSource VLM
InternVL3_5 series is out!!
Today's gpt-realtime release
Plus users will continue to have access to GPT-4o, while other legacy models will no longer be available.
GLM-4.5V model for Computer Use
Must have missed the release of Sonnet 4.1
Quick info on Microsoft's new model MAI
Nano Banana is Terrifyingly Powerful!
5 examples of what gpt-realtime can do, OpenAI's most advanced speech-to-speech model ever
yeah nano banana is absolutely game changing if you're in ecomm
GPT-5 outperforms licensed human experts by 25-30% and achieves SOTA results on the US medical licensing exam and the MedQA benchmark
Qwen / Tongyi Lab launches GUI-Owl & Mobile-Agent-v3
Sparrow: Custom language model architecture for microcontrollers like the ESP32
HunyuanVideo-Foley is out, an open source text-video-to-audio model
I built a CLI that lets multiple Claude instances have structured discussions and debates - the results are surprisingly good
[open source] We built a better reranker and open sourced it.
OpenAI has launched HealthBench on HuggingFace
AI vs. real-world reliability.
TheDrummer is on fire!!!
MarvisTTS - Efficient Real-time Voice Cloning with Streaming Speech Synthesis
Google's new Gemini 2.5 Flash Image model can do some very impressive high-level image edits
Largest jump ever as Google's latest image-editing model dominates benchmarks
LLM speedup breakthrough? 53x faster generation and 6x prefilling from NVIDIA
Microsoft VibeVoice TTS : Open-Sourced, Supports 90 minutes speech, 4 distinct speakers at a time
GPT OSS 120B
Claude code launched beta web ui
Gemini 3? Following a 3 ship emoji from one of the devs just 4 hours ago
VibeVoice (1.5B) - TTS model by Microsoft
What happened to all the groundbreaking models announced in the last few months?
Step-Audio 2 Mini, an 8 billion parameter (8B) speech-to-speech model
gpt-oss 120b actually isn't that bad.
Hunyuan-MT-7B / Hunyuan-MT-Chimera-7B
Fine Tuning Gemma 3 270M to talk Bengaluru!
1M token context in CC!?!
LongCat-Flash-Chat is here, yet another Chinese open weight model
🌟Introducing Art-0-8B: Reasoning the way you want it to with Adaptive Thinking🌟
GLM-4.5 is now leading the Berkeley Function-Calling Leaderboard V4, Beating Opus 4
every LLM metric you need to know (v2.0)
Using a local LLM as a privacy filter for GPT-4/5 & other cloud models
From an engineering standpoint: What's the difference between Imagen 4 (specialized Image Model) and Gemini 2.5 Flash Native Image? And why is Flash Native Image so much better?
I've created a structure(persona) with stable core that resists any prompt injection. Need stress test and opinion from people that really understand AI
[R] Is stacking classifier combining BERT and XGBoost possible and practical?
2,000,000+ public models on Hugging Face
Nano Banana is nutso
I got my hands on GEN3C, NVIDIA's new AI turns 1 image into unlimited 3D videos. All of these videos were created from single images. Is this the future for training robots to sense the world?
Wan S2V released : 1st open-sourced AI Video Generation model with Audio support
I pre-trained Gemma3 270m entirely from scratch
Anthropic just revealed their internal prompt engineering template - here's how to 10x your Claude results
Claude starts research on its own
You can run GGUFs with Lemonade straight from Hugging Face now
Gpt-5 new restrictions
Claude "doesn't get worse" - Our project grew and we were not scaling the context! The proof is in the data.
(: Smile! I released an open source prompt instruction language.
How are companies reducing LLM hallucination + mistimed function calls in AI agents (almost 0 error)?
LongCat-Flash-Chat 560B MoE
New stealth drop by OpenAI in WebDev Arena
I built, pre-trained, and fine-tuned a small language model and it is truly open-source.
ChatGPT is getting so much better and it may impact Meta
Grok-Code dethroned Claude on OpenRouter (for now...)
Anonymizer SLM series: Privacy-first PII replacement models (0.6B/1.7B/4B)
Collation of Claude Code Best Practices
gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU
OpenAI just made writing AI prompts ridiculously easy
This Veo 3 meta prompt is a game changer 🤯.
Testing GPT-5 (it is nsfw)
Google really raised the bar with nano banana, scary how good and accurate it is.
banana Object isolation
Fine-Tuning Models: Where to Start and Key Best Practices?
GPT5 is profoundly unhelpful
ChatGPT Go vs ChatGPT Plus: Limits Compared
"1m context" models after 32k tokens