And, Yet, The Soup Moves

The stupid coda to a stupid story

James Heathers
Sep 11, 2020

September ’20: I wrote this god knows how long ago and never published it. Presumably after this dreadful, glib article was published, which was … mid ’19, I believe.

As I’ve said before, this happens frequently. I have about a paperback’s worth of unpublished blog posts from last year alone.

I write primarily to relieve pressure, when I am compelled. Catharsis often comes from simply writing down whatever is pressing on your temples. In the leeside of such indulgences, a judgement must be made — should I assert that people spend their time reading this?

Often, they should not. Really. I know my limitations.

Reasons to mothball something differ — some pieces are deemed too unpleasant to exist and too much of an arseache to edit into something palatable for human consumption, some are finished then instantly bore me, some are contemporaneous then abandoned for a period of time sufficient to make them desperately outdated…

… but the most common reason is because I judge the piece to be kicking a dead horse. I have a baseline of distaste for pile-ons and pitchforks. People are smart, mobs are dumb. I’d often rather not join ones I agree with entirely. That was the central reason here.

And one day in January, only some months but also a lifetime ago, in the middle of my hard-won holiday, falling from the clouds with all the tone-deaf jangling of a bag of copper pots flung from a low-orbit satellite, came this:

And I said to myself ‘Sheeeit I should really get That One Blog Post out when I have time’.

Finally, the time. Frankly, only because ‘soup’ came up in a conversation. So I have unretired this piece. It may be of some value to someone, because it includes a fresh series of observations about a never-before-thugged-up paper.

Which is, of course, dreadful.

NOTE: some of the numbers below are probably out of date. I can however assure you they were accurate when they were written down.



I was perfectly happy to forget about Former Professor Brian Wansink.

It’s done. The points have been made, the arguments argued, the stones thrown. I’m not relitigating the literal years of work it took to provoke a proper discussion. If you’re new to this whole issue, the whole donnybrook can be summed up with this:

*Mic drop*

I would add only two things to that.

(1) that statement was nine months ago, I don’t know how long their ‘ongoing review’ takes, but given the length of the above investigation (and the conveniently unmentioned fact that this was the SECOND investigation, and that the initial investigation concluded that everything was fine, thank you very much), I expect a full accounting around 2037.

(2) Wansink has FORTY entries in the Retraction Watch database. 18 of them are retractions. There are only 46 entries total for Cornell. And it’s a colossal research university with more spare money than a Russian oil baron’s thick son. Some other chancer there should have had the money to fail spectacularly at scientific honesty by now.

Of course, sleeping dogs don’t always lie.

They often get up, bark, and start happily crashing through their surroundings — breaking the china, widdling on the carpet, and biting the postman.

So I had a tremendous feeling of deep internal fatigue (not an unfamiliar experience for me, I’m afraid) when I read this: a partial rehabilitation of all of this silliness, under the overriding principle of ‘hey, no hard feelings!’

Sorry. Hard feelings are part of hard tasks.

I don’t have time to organise everything in a partially coherent narrative, partly because this annoys me too deeply to do so, partly because that takes the kind of time that you can put aside only when you’re paid to write.

So, I’ve left alone the ridiculous description of the p-values, and the sense of false balance, and the excuse that “hey I’m helping people” is a good enough reason to produce research too inaccurate to exist over a period of decades.

Let’s just talk about the evidence.

A famous experiment, yes, and an inexplicably flawed one. It has so many inconsistencies it could moonlight as Nixon’s legal defense.

I must admit: when it comes to the ongoing prosecution of this paper, I never did anything. I just left it, assuming that it might sort itself out.

Usually, this never happens. But I suppose I thought that given:

[A] the proactive leadership shown by the JAMA group might act as an example (they demanded an explanation and a full data-driven accounting for all Wansink’s work published under their masthead — and upon not receiving it, retracted the lot… this is a rare and beautiful Bird of Paradise in the integrity world), and

[B] the fact that this is his most recognisable study, and

[C] that the problems with this study, which are substantial and ridiculous, were pointed out, by me, at length, in public, 2 years ago

… that I might not have to follow the usual routine of chasing down a recalcitrant editor for 9 months.

As I said: I was done with this.

Of course, absolutely nothing happened and the study remains intact. Personally, I’m at the point where a deep sense of abiding cynicism floods through me, toes to haircut, at the thought of trying to stamp yet another coda to this sorry six-sided Sisyphean shitfight. More so now that the dust has so firmly settled and it all seems passé.

Why is easy to explain: the process of error detection itself is quite engaging. Those of you familiar with talks I’ve given will recognise the following: I love puzzles.

At any given point in time, my phone has Words With Friends or Scrabble, a word puzzle game, a binary puzzle game, KenKen, Kakuro, Rullo (a cross-number puzzle like a magic square that I’m quite taken with), and other assorted silliness. I could beat my family at Trivial Pursuit when I was about 8. I love pub trivia (it combines pubs and trivia), even though I think Cardi B is what an accountant wears on Tuesday.

However, having solved the ‘puzzle’ that any given paper represents, spending a year trying to convince a hostile third party that you have a solution markedly reduces the level of engagement you experience.

This is why the classical, and most effective, response to hard questions about research accuracy is to be totally silent and non-cooperative. When you stretch every process to its time limit and refuse to engage, you prevent other people in the process from being collegial — they are incapable of ‘involving you in the process’, because you don’t acknowledge it exists. More on this some other time.

But that’s all old news.

Let’s get to the new stuff.

Shall we do a brand-new never-before-analysed Wansink paper?


This caught my attention. I don’t know this paper. Uncle Brian wrote so many. But let’s take a quick look.


So, people ate from 0 (no crackers) to 400 calories, and this sub-group of overweight people (n=15) ate a mean of 383 cals, with an SD of 159?

THIS IS IMPOSSIBLE, unless they were eating other people’s crackers. The maximum possible SD is ~103. Thank you, SPRITE.

It’s not totally unrealistic to imagine that people will pinch foodstuffs off others (I mean, try not finishing your Old Fashioned in front of me and not hearing ‘are you going to drink that?’) but accounting for that would have required tracking *who ate what*, where they were sitting, and probably a video to record some kind of … cracker transfer ratio.

This is never mentioned, and it’s implied that the calorie intake was calculated just by # of crackers per person after the experiment (“After watching the 22‐min show, participants completed questionnaires, and the remaining crackers were counted to calculate their caloric intake.”)

So something’s wrong.
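For anyone who wants to poke at the ceiling themselves, here is a rough sketch in Python. It is not SPRITE, just a back-of-the-envelope bound, and it assumes only what is stated above: 15 people, every value between 0 and 400, mean of 383. The function name `max_sample_sd` is made up for illustration.

```python
import statistics

def max_sample_sd(n, mean, lo, hi):
    """Largest possible sample SD for n values in [lo, hi] with the given mean.

    Hypothetical helper for illustration; assumes lo <= mean <= hi and n >= 2.
    """
    total = mean * n
    # Spread is maximised by shoving values to the two extremes:
    # k values at the ceiling, the rest at the floor, and at most one
    # in-between value so that the total still comes out right.
    k = min(n, int((total - n * lo) // (hi - lo)))
    values = [hi] * k + [lo] * (n - k)
    if k < n:
        values[k] = total - k * hi - (n - k - 1) * lo
    return statistics.stdev(values), values

# Numbers from the text: n = 15, mean = 383, hard limits 0 and 400.
sd, values = max_sample_sd(15, 383, 0, 400)
print(values)
print(round(sd, 1), "vs. reported SD of 159")
```

Under these particular assumptions the ceiling lands below the reported 159 with plenty of room to spare. It will not necessarily match the figure from a SPRITE run, which works from its own granularity and rounding inputs, but the conclusion does not change.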

You don’t even need basic error detection tools to determine this; just look at the graph provided. This is mean (*SEM*) for the package vs. BMI group for all four groups. As there is a hard limit at 400 kcal, is there nothing suspicious about the large packet BMI > 25 group??

Look at that error bar there, big as Yeltsin’s liver, sitting up well above the margin of possibility. I will bet anyone a large sum of money that other errors are present, because the above took about NINETY SECONDS.

Even the descriptive statistics feel odd, but I’m not running a layered SPRITE (i.e. hacking together realistic solutions for the BMI, height, and mass) because I want to scream at this point.

Do you have any idea how crummy this is, and how easy it is to get it right?

And in case you think I’m relitigating old points, I invite you to inspect Tim’s heroically cultivated list of FIFTY papers with mistakes and inconsistencies here — this paper isn’t on it.

Guess that puts us at 51.

Now, let’s look at some of the more amusing bits and pieces within the article that provoked this.


If you missed it the first time, it’s here.



You know what? This is perfectly correct.

It’s also a vanishingly uncommon scenario, because (a) journals hate retracting anything under any circumstances, and (b) it’s really small beer on the scale of academic sins. But it IS absolutely possible.

The only problem is: that’s not what happened here.

And additionally, if that IS what happened, it’s because the ‘original coding sheets’ were requested SUBSEQUENT TO SERIOUS CONCERNS BEING RAISED ABOUT THE DATA.

Let me draw a tortured analogy.

Imagine a truck loaded with radioactive waste shows up at a containment site. The dosimeters at the gate are going nuts (RADIATION DETECTED! BEEP BLOOP!), and the cladding on the truck appears to be stuck on with duct tape. The waste in the back, instead of being organised into lined drums, is stored in a series of used plastic bags from a local hardware store, and the driver has one eye, and he is driving while reading a copy of Horse And Hound, in which he has drawn pants on all the horses in Sharpie.


Not wanting to get into an altercation with the driver, the gate guard asks for the driver’s heavy vehicle license — he’s allowed to ask for whatever he wants, it’s in the job description of guarding something.

Normally, it would never be an issue. It’s a company truck, with company plates, arriving at a pre-arranged time.

“Don’t have it with me” grunts the driver, and just keeps scribbling on the horses, adding weskits and spats to their existing modesty-appropriate outfits.

“Well”, says the now-sweating guard, who is now seriously worried about his personal rad count, “You can’t come in. I need to see your driver’s license.”

“BLRAGH!” yells the driver, adding the buttons to the front of a particularly ill-fitting waistcoat attached to a particularly immodest stallion.

“No license, no entry” says the guard, slamming down the safety boom and running off to take two long, separate, careful decontamination showers. When he’s as clean as he can be, he writes in the logbook: “Delivery rejected — no driver’s license.”

Tortured analogy or not, the point is: WHY you ask for bona fides is determined by other factors, and not random. The fact that you left your license in your other overalls is immaterial in and of itself. Your toxicity is hanging out.

Anyway. Horse pants jokes over.



I have never seen this argument made anywhere before.

Imagine an architect saying “my buildings in Shanghai are just some of the more than 200 I have built which haven’t fallen down into a shower of glass, steel, and body parts”. Would you eat at the rooftop restaurant this cat designed?

You wouldn’t even walk past the front door, in case you were hit by falling sheets of cladding.

If you write about science, this sentence is the most extraordinary abrogation of basic responsibility. There is no credence given to the idea that generating 18 retractions and 22 corrections* might provide some kind of generalised unreliability to the research in question. There is no unlucky binary, where you did 200 excellent studies and then unaccountably also managed 40 terrible ones.

One final quote before I lose it completely…



If this is the case, why did you spend 30-odd years trying to publish in ‘great journals’, winning awards, and garnering citations?

Do you have any idea how much time and money was spent by thousands upon thousands of researchers giving these ideas credence because they were scientifically accurate and not because you thought they might work for someone?

You do NOT get to negate the enormous body of total flaming stinking fucking rubbish produced here by saying ‘I meant well and it helps people’.

This is literally the academic equivalent of saying ‘I had my fingers crossed!’

Why didn’t you just do market research for Nabisco?

Go away.

(*) (Or is it more than that now? How does that count the paper that was retracted twice? I don’t even know how many.)