Entropy-Modulated Policy Gradients (EMPG) addresses learning dynamics issues in LLMs by recalibrating policy gradients based on uncertainty and task o...
FLUX-Reason-6M and PRISM-Bench address the lack of reasoning-focused datasets and benchmarks for text-to-image models, providing a large-scale dataset...
A novel framework UAE uses reinforcement learning to unify image-to-text and text-to-image processes, enhancing mutual understanding and generation fi...
SimpleVLA-RL, an RL framework for VLA models, enhances long-horizon action planning, achieves state-of-the-art performance, and discovers novel patter...
EchoX, a speech-to-speech large language model, addresses the acoustic-semantic gap by integrating semantic representations, preserving reasoning abil...
HuMo is a unified framework for human-centric video generation that addresses challenges in multimodal control through a two-stage training paradigm a...
Kling-Avatar, a cascaded framework, enhances audio-driven avatar video generation by integrating multimodal instruction understanding with photorealis...
VLA-Adapter reduces reliance on large-scale VLMs and extensive pre-training by using a lightweight Policy module with Bridge Attention, achieving stat...
MachineLearningLM enhances a general-purpose LLM with robust in-context machine learning capabilities through continued pretraining with synthesized M...
AU-Harness is an efficient and comprehensive evaluation framework for Large Audio Language Models (LALMs) that addresses issues of speed, reproducibil...
SpatialVID, a large-scale dataset with diverse videos and dense 3D annotations, enhances model generalization and performance in video and 3D vision r...