RL Hacking - Search News

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

Code for reproducing the results in the VinePPO paper. This codebase also provides a performant implementation (leveraging vLLM as the inference engine*) of popular RL and RL-free baselines (such as PPO, ...


