
What is Reinforcement Learning with Human Feedback (RLHF)?

Background

Reinforcement learning from human feedback (RLHF) is a machine learning technique that trains a "reward model" directly from human feedback and then uses that model as a reward function to optimize an agent's policy with reinforcement learning (RL), typically via an optimization algorithm such as Proximal Policy Optimization (PPO). The reward model is trained in advance of the policy being optimized and learns to predict whether a given output is good (high reward) or bad (low reward). RLHF can improve the robustness and exploration of RL agents, especially when the reward function is sparse or noisy.

Human feedback is most commonly collected by asking humans to rank instances of the agent's behavior. These rankings can then be used to score outputs, for example with the Elo rating system. While preference judgments are the most widely adopted form of feedback, other types of human feedback provide richer information, such as numerical ratings and natural-language comments. RLHF has been applied to natural language processing tasks such as conversational agents and text summarization.
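The reward-model step described above is often implemented as a pairwise preference loss: given a human-preferred response and a rejected one, the model is trained so that the preferred response receives the higher scalar score. The sketch below assumes PyTorch and uses a hypothetical RewardModel over precomputed response embeddings; it is an illustration of the idea, not a reference implementation from any particular RLHF library.

```python
# Minimal sketch (PyTorch assumed) of training a reward model from pairwise
# human rankings. RewardModel and the random embeddings are hypothetical
# placeholders standing in for real (prompt, response) representations.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a response embedding to a single scalar reward."""
    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scorer(x).squeeze(-1)  # shape: (batch,)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry style) loss: push the chosen output's reward above the rejected one's."""
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy training step on random embeddings standing in for ranked response pairs
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

chosen = torch.randn(8, 768)    # embeddings of human-preferred responses
rejected = torch.randn(8, 768)  # embeddings of the responses ranked lower

optimizer.zero_grad()
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()
```

In a full RLHF pipeline, the scores from this trained reward model would then serve as the reward signal when optimizing the policy with an algorithm such as PPO.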
