When it comes to predicting people’s preferences, it pays to consider “the power of three”
In his 1927 paper, “A law of comparative judgment,” the American psychologist L. L. Thurstone proposed that when people select one option among multiple alternatives, they are picking the one that has the highest value to them, even though they cannot assign a particular number to that choice.
Thurstone was a pioneer of “psychometrics” — a field built upon the premise that mental processes, which we cannot see, can nevertheless be measured and quantified. His 1927 paper laid the groundwork for what are now called random utility models, which provide a mathematical framework for describing human preferences — information that can be relied upon, in turn, to make predictions about various hypothetical situations.
Random utility models (RUMs) are so named because they assess the “utility,” or benefit, that can be obtained from a given choice — such as deciding which book to read first among the stack of novels you brought back from the library. “These models are inherently random,” explains Gabriele Farina, an assistant professor in MIT’s Department of Electrical Engineering and Computer Science (EECS) and principal investigator at the Laboratory for Information and Decision Systems (LIDS), “because people are different. Everyone has their own preferences, and even those preferences can vary from time to time.” For example, someone who normally picks coffee over tea in the morning, and prefers tea after dinner, may, upon occasion, mix up that order entirely.
RUMs, to be sure, are frequently used within government and industry in situations of far greater consequence than the selection of a hot (or iced) beverage. The models routinely facilitate predictions regarding what people will elect to do in so-called counterfactual (“what-if”) scenarios such as: How will they get to work or school if a major thoroughfare is shut down for construction? What routes and modes of transport will they take? Or, if a city suddenly receives a windfall of $20 million, how should those funds be disbursed to maximize the common good?
Given that RUMs have been with us for almost 100 years, growing in sophistication over time, one might imagine that, at this stage, there would be little room for improvement. That, however, is not the case.
A paper presented in April at the International Conference on Learning Representations in Rio de Janeiro, Brazil, uncovered basic facts that show there is much more to be gleaned from these models than had traditionally been supposed. The paper was authored by Yeshwanth Cherapanamjeri, a former MIT postdoc now based at Nanyang Technological University in Singapore; Farina, also core faculty in MIT’s Operations Research Center (ORC); Constantinos Daskalakis, the Avanessians Professor of Computer Science at MIT and a member of MIT's Computer Science and Artificial Intelligence Laboratory; and Sobhan Mohammadpour, an MIT PhD student in computer science based at LIDS and EECS.
The group’s findings stem, in part, from a deficiency in the way RUMs are commonly estimated in practice, which has persisted since the days of Thurstone. The data upon which the models are estimated have been largely drawn from so-called pairwise-comparisons: In a choice between items A and B — whether it pertains to movies on Netflix, competing products on Amazon.com, news stories posted on Google, and so forth — which one would you pick? One reason this approach has been so pervasive, explains Daskalakis, is that “assigning a precise numerical score, such as 4.37, to the benefit you get from a single item is very hard. Whereas comparing two things, and deciding which one you like better, is cognitively much easier to do.” But therein lies the rub, he adds. “With this way of assessing people’s preferences, looking at just two things at a time, it is impossible to find correlations between the numerous choices.”
The standard way of applying RUMs assumes that the utilities derived from A and B are independent, but they may, in fact, be linked, and that would be important to know. If someone campaigning for elective office finds out that a potential voter favors gun control, for instance, there is a reasonable chance that same person also favors government-sponsored child care. Similarly, a fan of independent movies might also be partial to foreign films, but less enthusiastic about Hollywood action blockbusters. “If a digital platform has a blind eye to the existence of such correlations, it will not be able to estimate preferences very accurately,” Daskalakis notes. “And if Netflix regularly shows you an assortment of movies you don’t care about, you might sign off and cancel your subscription.”
The MIT team proved that it is impossible to get information about correlations from two-way comparisons alone. Correlations can be discerned, however, when large numbers of people rate three alternatives in their order of preference. The same information can also be obtained from a combination of best-of-three and best-of-two choices. In practice, Mohammadpour explains, “you would get a bunch of people to rank three items. You could then utilize the method we developed for merging those individual results into one big model that can provide us with the big picture.”
Their research effort, according to Farina, is focused on the computational side of RUMs, devising algorithms that can extract preference information and figuring out how much data is needed to do so or, equivalently, how many experiments need to be run. The good news, he says, is that efficient algorithms are, indeed, possible for this purpose. The requisite number of experiments does not grow exponentially with the number of items in the catalog or database that’s under review.
“This paper provides a crucial breakthrough,” comments Emma Frejinger, a computer scientist at the University of Montreal. “It mathematically proves why traditional data collection fails and demonstrates that simply asking users for their best-of-three [choices] unlocks the ability to accurately train these powerful models. This finding provides a highly practical roadmap for collecting better data to drive more accurate optimizations.”
“Building utility models is going to remain a very active area,” Daskalakis insists. “Just as RUMs have been critical to the internet economy since the late 1990s, they are, and will remain to be, critical to the alignment of AI models going forward.” More importantly, he adds, “RUMs play a central role in the commercial viability and usefulness of large language models [LLMs].” During the training period, people are typically asked to rank the various candidate outputs of these LLMs, from which the models can gain a better sense as to the kind of text — in terms of tone, style, and content — that is preferred.
Given that we’re constantly “besieged with a vast sea of options in so many different domains,” Daskalakis says, “you cannot possibly ask people to communicate all their personal preferences for all possible scenarios. So what you can do instead is build a model that predicts what people think about the different possible outcomes. And you have to keep improving and updating your model in an iterative process until, hopefully, you can make good predictions.”
📌 Kaynak: MIT News - AI