Part 1 of our miniseries on RLHF and the role humans play in making AI.
This chapter focuses more broadly than GenAI because human annotators, and those who provide various other types of feedback for AI systems, do so not only for GenAI but also for most forms of advanced AI, such as self-driving vehicles, medical imaging, and translation. Such labor tends to be more common for AI that works with images, but as we’ll see, it’s also necessary for text outputs.
Data Annotation – What It Is
AI requires large amounts of data, but data alone is insufficient to produce a useful output. It also requires a vast army of humans to guide the AI by sorting, tagging/annotating, and grading AI inputs and outputs. Often, this is done voluntarily, such as when you caption an image you upload. Other times, AI companies must pay for the services. For example, a self-driving car company may have a human look at an endless stream of images and videos to tag every sign, person, vehicle, animal, tree, fire hydrant, crosswalk, stoplight, and more.
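To make that labeling work concrete, below is a minimal, hypothetical sketch of what a single annotated frame from a driving dataset might look like. The field names and label classes are assumptions chosen for illustration, not any vendor’s actual schema.

```python
# Hypothetical example of a single human-annotated frame from a driving dataset.
# Field names and label classes are illustrative only, not a real vendor schema.
annotated_frame = {
    "image_id": "frame_000123.jpg",
    "labels": [
        {"class": "stop_sign",  "bbox": [412, 88, 460, 136]},   # x1, y1, x2, y2 in pixels
        {"class": "pedestrian", "bbox": [150, 210, 190, 320]},
        {"class": "crosswalk",  "bbox": [0, 340, 640, 420]},
    ],
    "annotator_id": "worker_7841",  # the human who drew and verified the boxes
}
```

Each such record is one tiny contribution; a perception model only becomes useful after humans produce millions of them.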
Instagram and TikTok algorithms aren’t all-knowing software marvels powered by wizard-like code. Rather, the algorithms are constantly updated with input from labelers around the world. The labelers work to identify types and brands of clothing, sports equipment, jewelry, and more, so the algorithms can recognize those items and present them fluidly to users.
When OpenAI says its model can identify items in a refrigerator, that is not the same as the model learning what a milk jug is by looking at one. It learned about milk jugs by being trained on millions of annotations made by humans who will likely never be able to comfortably afford the $20-a-month subscription to GPT-4.
Absent massive advances in AI, annotation will be necessary as long as humans continue to invent and innovate. Every new word, item, and trend will require a human to explain to the machine, in explicit terms and many, many times over, what is going on (either intentionally, as with paid annotators, or unintentionally, as when you caption your photos).
Companies, like those creating self-driving vehicles, are also incentivized to continuously seek out rare instances for annotation because it’s often the overlooked situations that will cause the problems. Few people will encounter a person sprinting across a highway or an airplane making an emergency landing on a road, but some will. If the vehicle’s software has never seen such a thing before, it may fail to react appropriately, and that could lead to disastrous consequences, such as the Uber car that killed a woman in Arizona or the Cruise car that dragged a person some 20 feet.
GenAI-Specific
While GenAI still requires much of the same approach to annotation as any other AI system, there are some differences. With GenAI, humans assess both the inputs and the outputs. For example, the worker may ask GenAI a question, and GenAI will produce multiple outputs. The worker must then select which output is best based on criteria provided by the data labeling firm.
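As a rough illustration, each of those comparisons can be stored as a simple preference record like the one below. The field names and criteria are hypothetical, not any labeling firm’s actual format.

```python
# Hypothetical record produced by one comparison task: a worker picks the best
# of several model outputs against criteria supplied by the labeling firm.
preference_record = {
    "prompt": "Explain what a 401(k) is in one paragraph.",
    "responses": {
        "A": "A 401(k) is an employer-sponsored retirement account that ...",
        "B": "lol just look it up",
    },
    "chosen": "A",                                        # the output the worker judged best
    "criteria": ["helpfulness", "tone", "harmlessness"],  # provided by the labeling firm
}
```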
While the statistical power of raw LLMs is impressive, they must still undergo the aforementioned process, known as reinforcement learning from human feedback (RLHF), to refine their outputs and make them suitable for public consumption.
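Under the hood, a common way those comparisons become a training signal is a pairwise loss on a reward model, which scores responses so that the human-preferred one outranks the rejected one. The sketch below shows that generic technique in PyTorch; it is an illustration of the standard approach, not any particular company’s implementation.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry style) loss: push the reward model to score the
    human-preferred response higher than the rejected one."""
    # -log(sigmoid(r_chosen - r_rejected)), averaged over the batch
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy example: scores a reward model might assign to three chosen/rejected pairs
chosen_scores = torch.tensor([1.2, 0.3, 0.8])
rejected_scores = torch.tensor([0.4, 0.5, -0.1])
print(reward_model_loss(chosen_scores, rejected_scores))  # smaller when chosen outranks rejected
```

The trained reward model then stands in for the human raters, grading the LLM’s outputs during the reinforcement learning step.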
Each round helps tweak the model so that it’s more helpful, less snobby, less offensive, less likely to share harmful outputs, and so on. Notably, the model could also be trained to provide more harmful outputs if that is what the GenAI developers tell the annotation vendors they want.
In other words, models do not intuitively know what types of outputs humans prefer. They must be trained on those preferences by humans. And the process is more convoluted than simply clicking A or B. As The Verge notes:
“...ChatGPT seems so human because it was trained by an AI that was mimicking humans who were rating an AI that was mimicking humans who were pretending to be a better version of an AI that was trained on human writing.”
But RLHF is not a panacea for wrangling LLMs. It cannot, for example, make a model provide only trustworthy outputs. It can make a model’s outputs sound confident and helpful, but it cannot ensure they are accurate. The underlying models are trained to produce outputs that sound highly plausible based on the statistical relationships between tokens, not outputs that are factually correct.
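A toy illustration of that point, using invented probabilities rather than a real model: the model simply favors the most probable continuation, and nothing in that step checks whether the continuation is true.

```python
# Toy illustration with invented probabilities (not a real model): a language
# model favors the statistically most likely continuation; nothing in that step
# checks whether the continuation is factually correct.
next_token_probs = {
    "Sydney": 0.55,     # plausible-sounding but wrong
    "Canberra": 0.35,   # correct
    "Melbourne": 0.10,  # plausible-sounding but wrong
}
prompt = "The capital of Australia is"
prediction = max(next_token_probs, key=next_token_probs.get)
print(prompt, prediction)  # prints "Sydney": likely, confident, and incorrect
```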
The following students from the University of Texas at Austin contributed to the editing and writing of the content of LEAI: Carter E. Moxley, Brian Villamar, Ananya Venkataramaiah, Parth Mehta, Lou Kahn, Vishal Rachpaudi, Chibudom Okereke, Isaac Lerma, Colton Clements, Catalina Mollai, Thaddeus Kvietok, Maria Carmona, Mikayla Francisco, Aaliyah Mcfarlin