AI news for: RL

Explore AI news and updates focusing on RL from the last 7 days.

GPT-OSS Reinforcement Learning
source docs.unsloth.ai Yesterday

Article URL: https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning Comments URL: https://news.ycombinator.com/item?id=45392744 Points: 88 # Comme...

TL;DR
OpenAI gpt-oss can now be trained with reinforcement learning (RL), including Group Relative Policy Optimization (GRPO), via Unsloth, without compromising performance or using excess VRAM.

Key Takeaways:
  • Unsloth achieves 3x faster inference for gpt-oss RL compared to native Transformers, with no accuracy loss.
  • Unsloth's 4-bit inference is ~4x faster than BF16, and BF16 is more efficient in VRAM use, especially on older GPUs.
  • Unsloth supports gpt-oss training on a wide range of GPUs, including older devices with only 15GB VRAM, and is compatible with OpenAI's gpt-oss-20b architecture.
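
For readers unfamiliar with GRPO, the core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt, so no separate value model is needed. The snippet below is a minimal sketch of that group-relative normalization only, not Unsloth's implementation; the function name, the `eps` constant, and the use of the population standard deviation are illustrative assumptions (some implementations use the sample standard deviation instead).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. The name and `eps` are hypothetical, chosen for illustration.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std, not sample std
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two judged correct, two judged incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average receive a positive advantage and the rest a negative one; if all completions score the same, every advantage is zero and that group contributes no learning signal.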