Comments on: What is reinforcement learning from human feedback (RLHF)?
https://bdtechtalks.com/2023/01/16/what-is-rlhf/

By: Eric (Thu, 27 Jul 2023 21:48:58 +0000)
https://bdtechtalks.com/2023/01/16/what-is-rlhf/comment-page-1/#comment-36374

When training an LLM with RLHF, if the model generates a sentence of 16 tokens (one token per time step) and receives a single reward at the end, how does RLHF allocate credit among the 16 tokens? And how does that compare to simply scoring the whole sentence at the end and applying the loss in one shot?
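One way to see the credit-assignment question the comment raises: in standard RL (including PPO-style RLHF fine-tuning), a reward that arrives only at the final token is spread backward over earlier tokens via discounted returns. The sketch below is illustrative only — the numbers and the `gamma` value are assumptions, not details from the article.

```python
# Sketch: spreading a single end-of-sequence reward across 16 tokens
# using discounted returns-to-go, the basic credit-assignment mechanism
# underlying per-token advantages in PPO-style RLHF. Illustrative only.

def discounted_returns(rewards, gamma=0.99):
    """Return-to-go for each step: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A 16-token sentence: the reward model scores only the finished sentence,
# so the per-token reward is zero everywhere except the last token.
per_token_rewards = [0.0] * 15 + [1.0]
credits = discounted_returns(per_token_rewards)
# The last token receives the full reward (1.0); each earlier token
# receives a geometrically discounted share (token 0 gets 0.99**15).
```

Compared with "one-shot" sentence-level scoring, this gives every token its own learning signal, at the cost of relying on the discount (and, in practice, a learned value function) to apportion blame.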
