The Consensus Machine

By SaintPixel, 02 June 2026

When you ask a generator for an image, or an assistant like ChatGPT for an answer, you are querying a system trained on vast amounts of data, then tuned, in a second step, to our preferences. This second education has a name, reinforcement learning from human feedback (RLHF), and a cost we rarely examine: it takes a situated human judgment, averages it, and freezes it into a measure. This text describes what that freezing does to representations, what sets it apart from the older factories of taste, and why the problem is not the tool but what we point it at.

A generative model first learns on its own. It ingests billions of images or sentences until it can produce what most resembles what it has seen. But that raw competence is not enough: left to itself, the model answers beside the point, without tact, without timing. So a second phase is added. Human raters rank its outputs, best to worst; a second model, the reward model, learns to predict those rankings; then the generator is tuned to aim for the highest score. This is the step we call RLHF, and it is what makes these systems so polite, so helpful, so strangely in agreement with us. For images the technical detail differs, and the pipeline is not always the one used for text: you filter, you fine-tune, you select as much as you reward. The analogy is structural, not a technical identity. But the principle holds: a situated preference becomes a signal to optimize.
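The middle step above, a reward model learning to predict the raters' rankings, can be made concrete in a few lines. What follows is a minimal sketch, not the method of any particular lab: it assumes a Bradley-Terry model of pairwise preference (a common choice, not one the text names), and the function `fit_rewards`, the three items, and the list `prefs` are hypothetical, invented for illustration.

```python
import math

def fit_rewards(n_items, preferences, steps=500, lr=0.1):
    """Fit one scalar reward per item from (winner, loser) pairs.

    Gradient ascent on the Bradley-Terry log-likelihood, where
    P(winner beats loser) = sigmoid(r[winner] - r[loser]).
    """
    r = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for w, l in preferences:
            p_w = 1.0 / (1.0 + math.exp(-(r[w] - r[l])))
            # Push the winner up and the loser down, in proportion
            # to how surprised the current scores were by this verdict.
            grad[w] += 1.0 - p_w
            grad[l] -= 1.0 - p_w
        r = [ri + lr * gi for ri, gi in zip(r, grad)]
    return r

# Hypothetical rater verdicts over three outputs: (preferred, rejected).
prefs = [(2, 0), (2, 1), (1, 0), (2, 0), (1, 0), (0, 1), (2, 1), (1, 2)]
rewards = fit_rewards(3, prefs)

# "Tuning the generator" is reduced here to its barest form:
# aim for whatever the fitted score ranks highest.
best = max(range(3), key=lambda i: rewards[i])
```

Note what the sketch keeps and what it drops: the disagreements in `prefs` (output 1 once beat output 2, output 0 once beat output 1) survive only as a slightly smaller gap between scores. The verdicts go in as a list of situated choices; they come out as one number per output.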

We then say that “the human” has been injected into it. The phrase flatters, and it misleads. What is injected is not the human in general, but a particular judgment, issued at one moment, by a handful of annotators, then averaged and hardened into a function to maximize. A living gaze enters the machine; it comes back out as a target.

None of this is fate: a model can be tuned, and could be tuned otherwise. But the setting that prevails today produces a precise effect, one worth naming.

Two forces, not one, pull the output toward the center.

The first belongs to pretraining. A model produces, first, what it has seen most often: the probable, the smooth, the already-seen. It is the slope anyone who has handled these tools knows. It is against that slope that I introduce my own graphic work, made by hand: material the model has not learned to recognize.

The second force comes from alignment. The score does not pull toward the probable but toward the preferred. And the preferred is not some outside of culture: it is its densest center, because whoever evaluates has learned to see before evaluating. They reward what they recognize. Not the purely known, since boredom repels as much as chaos, but the deviation they can still bring back to the known.

The effect is already measurable, on two planes. By default, what these models give us to see is narrow, smoothed, culturally centered: that is the effect of the data. More insidious is the second plane: asked for something ugly, sad, or deliberately botched, an aligned model returns smooth beauty all the same. It does not merely prefer; it sets a threshold the instruction does not cross, and what exceeds that threshold becomes less probable, and so less visible. This is the frozen gaze: a gaze caught in a protocol, averaged, hardened into a measure, to the point where the once-living share of the viewer, their attention, their hesitation, no longer answers the image but moves to the side of its production.

One could object, rightly, that averaging taste is nothing new: the Salon jury, the market, the academy always did it. True. But three traits set this gaze apart from all that came before, and it is their conjunction that is unprecedented.

First, the jury had an opinion; the model has a score. And a score is something you optimize: by dint of optimizing the measure, you end up pursuing it for its own sake, and no longer the taste it was meant to approximate.

Second, the jury did not rewrite the eye of the painters who came after. The model can: its most rewarded outputs are also its most circulated, and if they enter the data the next generation trains on, they steer it. The loop closes, and we know that a model retrained on its own productions sees its margins fade: rarity disappears, round after round, and does not return on its own.
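The closing of the loop can be shown with a toy. This is a deliberately crude sketch under stated assumptions: the "model" is nothing more than the empirical distribution of its training corpus, each generation is trained only on samples drawn from the previous one, and the corpus, the function name `retrain_on_own_outputs`, and the seed are all hypothetical. It is not the experimental setup of the collapse literature, only its simplest mechanism.

```python
import random

def retrain_on_own_outputs(corpus, generations, seed=0):
    """Count distinct items surviving each round of self-training.

    Each round, the next corpus is sampled (with replacement) from the
    current one, so an item that draws zero samples is gone for good.
    """
    rng = random.Random(seed)
    history = [len(set(corpus))]
    for _ in range(generations):
        corpus = [rng.choice(corpus) for _ in corpus]
        history.append(len(set(corpus)))
    return history

# Hypothetical corpus: one dominant form and ten rare ones.
corpus = ["common"] * 90 + [f"rare_{i}" for i in range(10)]
history = retrain_on_own_outputs(corpus, generations=50)
```

Whatever the seed, `history` can only go down or stay flat: nothing in the loop can reinvent an item once it has dropped out. The rare forms are the first at risk, since each has only one copy to be drawn from.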

Third, the jury sat somewhere, on a date, in a room where one could object. The model decides in silence, over millions of outputs, as a default setting. Code is law: no place, no date, no one to address.

None of these traits, taken alone, is entirely new. But no jury, no market ever combined the three at this scale, or this automatically: a taste made computable, looped back on itself, and imposed by default.

This is where the promise of these tools turns over. They are presented in a generous language: creation for everyone, the hand given back to all. And the promise is not a lie: you learn with them, you search, you try things. But their engine speaks another language. A century ago two ideas of the public stood opposed: for Lippmann, the public cannot follow the complexity of the world, so its opinions must be shaped for it; for Dewey, the public cultivates itself, through circulation and uptake. The promise of these tools is Dewey’s; their setting leans toward Lippmann: manufacture consent, shape in advance what will be received. The problem, then, is not that a score exists. It is knowing who sets it, and on what.

For nothing requires tuning the score to the mean. You can aim it at the deviation, at dissonance, at the rare voice.

But beware the false remedy. Replacing everyone’s average with the verdict of a few (experts, critics, artists) unfreezes nothing: you install another frozen gaze, better dressed. And summing the tastes of a crowd is still manufacturing an average. The problem was never who judges. It is that we reduce a thousand gazes to a single number.

Tuning otherwise is not electing better judges. It is making room for dissensus. An average score is a consensus machine: it seeks the point where the greatest number agrees, and takes that point for the truth. Dissensus begins where we refuse to reduce disagreement to that point. Keep the spread, the edges, the voices that do not fall at the center. A model can stay divided, hold several tastes at once, lean toward the margin rather than the middle.
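What an average erases is easy to show with arithmetic. The sketch below is only an illustration: the two image names and the four ratings for each are invented, and standard deviation stands in for "spread" as one possible measure among others.

```python
from statistics import mean, pstdev

# Hypothetical ratings from four viewers, on a 0-10 scale.
ratings = {
    "smooth_portrait": [7, 7, 7, 7],    # everyone mildly agrees
    "rough_collage":   [10, 10, 4, 4],  # two love it, two do not
}

# Keep the spread next to the mean instead of the mean alone.
summary = {name: (mean(r), pstdev(r)) for name, r in ratings.items()}
```

Both images average 7. A score built on the mean alone cannot tell them apart; only the second number, 0 for the portrait and 3 for the collage, records that one of them divided its viewers. Keeping that second number is the smallest possible version of leaving the gaze its spread.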

This is where the artist’s gesture recovers its meaning. To reintroduce the ill-seen, the unpreferred, the deliberately illegible is not to “resist the machine,” nor to claim to judge better than others. It is to put back into the loop what no average would have kept, to keep alive the deviation the setting erases in silence.

The real question is not whether to keep the human in the loop, nor even which ones. It is whether we crush them into a number or leave them their spread. A frozen gaze impoverishes. A gaze kept alive, divided, responsible for its own threshold, can still open. None of this is a law of technology: the technology to do otherwise already exists. What is missing is a decision. And since code is law, setting what a machine rewards is not a matter for engineers: it is a political act. What is left to decide is who decides.

References

The text does not cite within its body; here is the apparatus, grouped by what it supports.

Preference alignment (RLHF). Ouyang et al., Training Language Models to Follow Instructions with Human Feedback (InstructGPT), arXiv:2203.02155 (2022).
Diversity reduction, homogenization. Kirk et al., Understanding the Effects of RLHF on LLM Generalisation and Diversity, ICLR 2024 (arXiv:2310.06452). Image models: Bianchi et al., Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale, ACM FAccT 2023 (arXiv:2211.03759); Luccioni et al., Stable Bias: Evaluating Societal Representations in Diffusion Models, NeurIPS 2023 (arXiv:2303.11408); Guo et al., Position: Universal Aesthetic Alignment Narrows Artistic Expression, arXiv:2512.11883 (2025, rev. 2026; position paper).
The measure optimized until it is emptied (Goodhart’s law). Gao, Schulman & Hilton, Scaling Laws for Reward Model Overoptimization, ICML 2023 (arXiv:2210.10760); Skalse et al., Defining and Characterizing Reward Hacking, NeurIPS 2022 (arXiv:2209.13085); Coste et al., Reward Model Ensembles Help Mitigate Overoptimization, ICLR 2024 (arXiv:2310.02743).
The recursive loop (collapse under retraining). Shumailov et al., AI Models Collapse When Trained on Recursively Generated Data, Nature 631, 755-759 (2024).
Tuning otherwise (pluralistic alignment). Sorensen et al., Position: A Roadmap to Pluralistic Alignment, ICML 2024 (arXiv:2402.05070); Kirk et al., The PRISM Alignment Dataset, NeurIPS 2024 (arXiv:2404.16019).
Public, shaping, consent. Walter Lippmann, Public Opinion (1922); John Dewey, The Public and Its Problems (1927). On “code is law”: Lawrence Lessig, Code and Other Laws of Cyberspace (1999).
