Learning to Summarize with Human Feedback

We've applied reinforcement learning from human feedback to train language models that are better at summarization. Our models generate summaries that are better than summaries from 10x larger models trained only with supervised learning. Even though we train our models on the Reddit TL;DR dataset, the same models transfer
Source: openai.com