Recommendation systems that model taste

There is a particular kind of recommendation engine, recognisable on contact, that is in the business of finding the next thing you will click on. It is good at the job. The thing it finds is usually loud, often gimmicky, occasionally something you wish you had not seen, and most of the time something whose appeal will not survive the moment in which you encountered it. This kind of recommendation engine is everywhere, and there is a kind of customer for whom it is entirely adequate. The boutique end of independent retail is, almost without exception, not that kind of customer.

The honest description of what these engines model is engagement, not taste. The two are correlated; they are not the same; the conditions under which they diverge are exactly the conditions where boutique retail lives. A customer browsing a design boutique on a Sunday afternoon, with time, intent and discretion, is producing engagement signals that look superficially similar to those of the same customer doom-scrolling at midnight. The first is taste. The second is habit. A model that treats them as equivalent training data will, over time, become bad at modelling the first because the second is much louder.

Engagement is a proxy, and a bad one

Click-throughs and time-on-page are correlated with taste only when the product is genuinely good. For everything else, they reward outrage, gimmick, novelty, the bright thumbnail, the misleading copy, the well-placed glance-bait. None of these correlate with what we actually mean when we talk about taste; many of them anti-correlate. A recommendation engine that optimises for engagement will, predictably and verifiably, recommend more of what generates engagement — and the resulting distribution of outputs drifts toward the noisy end of the catalogue. We have measured this drift on a number of boutiques whose previous engagement-optimised recommendations had quietly hollowed out the right tail of their catalogue. The drift is real, it is consistent, and it is corrosive to a curated shop.

A working definition of taste

Taste is the quality of someone's decision when they have time, context and no incentive to perform. Engagement is what they do when none of those conditions hold. That is the working definition we propose; it has the advantage of being short, operational, and immediately suggestive of what a training signal would have to look like in order to respect it.

It is also worth saying what taste is not. Taste is not preference revealed by purchase alone — purchases under time pressure, social pressure, or constrained alternatives are not high-signal taste data. Taste is not what the customer says publicly — public statements about taste are about the audience as much as about the object. Taste is not consistent across categories — most people's taste in tableware does not predict their taste in books. None of this is novel; what is novel is taking it seriously when designing the training pipeline.

Modelling taste

Trove Recommend trains on three signals, each chosen because it correlates better with post-purchase satisfaction than raw click-streams do. The first is curator-labelled reference sets: the buyer's expert pairings and triples, edge cases included, as well as the combinations she categorically refuses to put together. The second is post-purchase satisfaction itself, captured through a one-question follow-up two weeks after delivery: "would you recommend this piece to a friend with the same taste?" Categorical, low-friction, high-signal. The third is explicit affinity statements: wishlists, save-for-later collections, requests to the buyer. Scarce but precise.
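One way to picture the second signal is as a per-item rate of "yes" answers to the follow-up question. The sketch below is illustrative only and is not taken from the Trove Recommend pipeline: the FollowUp record, the satisfaction_rates function and the pseudo-count smoothing are all assumptions, chosen because a categorical answer collected from a small audience needs some protection against one-answer extremes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FollowUp:
    """One answer to the two-week follow-up question (hypothetical record shape)."""
    item_id: str
    would_recommend: bool


def satisfaction_rates(answers, prior_yes=1.0, prior_no=1.0):
    """Return a smoothed 'yes' rate per item.

    prior_yes and prior_no are illustrative pseudo-counts, so an item with a
    single answer is not scored as exactly 0.0 or 1.0.
    """
    yes = defaultdict(int)
    total = defaultdict(int)
    for answer in answers:
        total[answer.item_id] += 1
        if answer.would_recommend:
            yes[answer.item_id] += 1
    return {
        item: (yes[item] + prior_yes) / (total[item] + prior_yes + prior_no)
        for item in total
    }


if __name__ == "__main__":
    sample = [
        FollowUp("vase", True),
        FollowUp("vase", True),
        FollowUp("vase", False),
        FollowUp("lamp", False),
    ]
    print(satisfaction_rates(sample))
```

The smoothing is a design choice, not a requirement: with very few answers per piece, a raw ratio would swing between extremes on a single response.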

What we deliberately under-weight is the click log. Not zero. Clicks contain information; they tell us what catches the eye, which is sometimes useful and occasionally important. But the model is conditioned to treat click signal as a hypothesis, not a conclusion. A piece that is clicked frequently but generates poor post-purchase satisfaction is downweighted; a piece that is rarely clicked but reliably loved by the small audience that finds it is amplified.
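The idea of treating clicks as a hypothesis rather than a conclusion can be sketched as a blend in which click signal dominates only until satisfaction evidence arrives. This is a minimal illustration under stated assumptions, not the model's actual scoring: taste_score, half_trust and click_floor are hypothetical names and values, and both inputs are assumed to be normalised to the range 0 to 1.

```python
def taste_score(click_rate, satisfaction, n_answers, half_trust=20, click_floor=0.1):
    """Blend a click rate with a satisfaction rate.

    click_rate and satisfaction are assumed to lie in [0, 1].
    n_answers is how many follow-up answers back the satisfaction rate.
    half_trust (illustrative) is the number of answers at which satisfaction
    is trusted halfway; click_floor (illustrative) is the share of the score
    that clicks keep however much evidence accumulates, so it is never zero.
    """
    if not (0.0 <= click_rate <= 1.0 and 0.0 <= satisfaction <= 1.0):
        raise ValueError("click_rate and satisfaction must be in [0, 1]")
    if n_answers < 0:
        raise ValueError("n_answers must be non-negative")
    confidence = n_answers / (n_answers + half_trust)
    click_share = 1.0 - confidence * (1.0 - click_floor)
    return click_share * click_rate + (1.0 - click_share) * satisfaction


if __name__ == "__main__":
    loud = taste_score(click_rate=0.9, satisfaction=0.2, n_answers=80)
    quiet = taste_score(click_rate=0.1, satisfaction=0.95, n_answers=80)
    print(f"loud={loud:.3f} quiet={quiet:.3f}")
```

With no follow-up answers the score is simply the click rate; as answers accumulate, a frequently clicked but poorly rated piece sinks and a rarely clicked but well-loved piece rises.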

Operational impact

The metrics move in directions that, for an operator with a Monday-morning dashboard habit, can be alarming. Click-through rate drops, modestly, in the first month. Average session length drops, slightly, in the first two months. These are the metrics the conventional recommendation industry optimises for most heavily; their decline can feel, briefly, like the system is failing. It is not. What is happening is that the model is producing outputs that the customer either acts on or skips, rather than outputs the customer reflexively clicks and then bounces from.

Other metrics move in the right direction, and they move further. Units per transaction rises. Average order value rises. Return rate drops, sometimes by seven or eight percentage points across the relevant categories, because the customer is buying things they wanted, not things they were nudged toward. Repeat purchase rate at ninety days rises. Twelve-month lifetime value rises substantially. None of these are vanity metrics; all of them are what an operator running a boutique into its second decade actually cares about.
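For operators who want to track these figures themselves, three of them fall out of ordinary order records. The sketch below assumes a hypothetical Order shape and defines return rate as returned units divided by units sold; other definitions (by order, by value) are equally reasonable and would give different numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """A single transaction (hypothetical record shape)."""
    customer_id: str
    units: int
    value: float
    units_returned: int = 0


def operator_metrics(orders):
    """Compute units per transaction, average order value and return rate.

    Return rate here is returned units / units sold, which is one of several
    possible definitions and is an assumption of this sketch.
    """
    if not orders:
        raise ValueError("at least one order is required")
    n_orders = len(orders)
    units_sold = sum(o.units for o in orders)
    if units_sold <= 0:
        raise ValueError("orders must contain at least one unit")
    return {
        "units_per_transaction": units_sold / n_orders,
        "average_order_value": sum(o.value for o in orders) / n_orders,
        "return_rate": sum(o.units_returned for o in orders) / units_sold,
    }


if __name__ == "__main__":
    sample = [
        Order("a", units=2, value=100.0, units_returned=1),
        Order("b", units=1, value=50.0),
        Order("a", units=3, value=150.0),
    ]
    print(operator_metrics(sample))
```

Comparing the return rate before and after a change as a difference of the two rates gives the percentage-point figure, which is a different quantity from a relative percentage change.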

What you give up

The dopamine charts. Engagement metrics go down before they go up, and operators who have been trained, over the past decade, to read engagement metrics as the proxy for business health will find the first six weeks of an atelier slot disorienting. We say so up front during the application call. The boutiques that can stomach the early dip see the curve invert by month four; the ones that panic and revert to engagement-optimised recommendations see neither curve, because we politely ask them to leave the programme. The discipline is not optional, and we have learned that half-discipline produces the worst of both worlds.

What you get

A recommendation system that behaves the way a good colleague would behave. It remembers context. It explains its reasoning when asked. It is willing to be quietly wrong in service of being usefully right. It does not gamify the customer. It does not gamify the operator. And — over the year that matters — it produces a boutique that customers return to with a different kind of confidence than the one that engagement-optimised commerce ever produced. That confidence is the asset; it is what we are in the business of building.

The Taste-Modelling Whitepaper · v2.1

The full methodology (training signals, embedding architecture, evaluation discipline, calibration against post-purchase satisfaction) is published in the Library. The whitepaper is read, not skimmed; it assumes the reader has done a recommendation-systems integration before. Pilot data is anonymised but specific.