Preprints Aren’t The Problem — WE Are The Problem

The COVID Files #2

The above was the first thing I stumbled over which predicted the present donnybrook over the now-notorious preprint on seroprevalence in Santa Clara County.

It feels like it started in about 2017. It was three weeks ago.

That tweetlethread is about as good an example of serious and detailed ‘preprint / community peer review’ as you’ll ever see, and the sort of thing that provides the activation energy for my suspicions. Before I saw it, I was operating under the fairly straightforward assumption that this preprint just had regular old problems, and probably soluble ones.

Two problems, specifically:

(1) the population in the study is self-selected, and

(2) it’s very very hard to estimate population prevalence of something rare without a disgustingly accurate test — even a tiny false positive rate will throw a spanner in your gears.
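To put rough numbers on problem (2), here is a minimal sketch of the standard correction for an imperfect test (usually called the Rogan-Gladen estimator). Every figure in it is hypothetical and chosen only for illustration: the raw positive rate, the sensitivity, and the specificities are my assumptions, not values from the preprint.

```python
def apparent_prevalence(true_prev, sensitivity, specificity):
    """Fraction of tests expected to come back positive."""
    true_positives = true_prev * sensitivity
    false_positives = (1 - true_prev) * (1 - specificity)
    return true_positives + false_positives


def corrected_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen style correction, clamped to the range [0, 1]."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("test is no better than a coin flip")
    estimate = (apparent - (1 - specificity)) / denom
    return min(1.0, max(0.0, estimate))


# Hypothetical numbers: 1.5% of samples test positive, sensitivity is 80%.
raw_positive_rate = 0.015
assumed_sensitivity = 0.80

for assumed_specificity in (1.0, 0.995, 0.99, 0.985):
    estimate = corrected_prevalence(
        raw_positive_rate, assumed_sensitivity, assumed_specificity
    )
    print(f"specificity {assumed_specificity:.3f} -> prevalence {estimate:.4f}")
```

Under these made-up inputs, moving the assumed specificity from 100% down to 98.5% takes the corrected estimate from a bit under 2% all the way to zero. When the thing you are counting is rare, a false positive rate the same size as your raw positive rate can account for every positive you saw.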

Those are regular problems. They’re a little less common than the garden-variety ones (multiple comparisons, for instance, or pointlessly dichotomising variables) but they get around.

When you add “the maths is wrong and/or unreported and/or uninspected” to them… that makes it worse. And, as of the other night, throwing “the analytical quality of the diagnostic test used was something the authors either didn’t care about, or actively ignored while it was being investigated by their own team” into the mix makes it worse still.

But: it’s almost boring by now. Isn’t our new world fun? It just happened, and it’s passé. So my two cents are useless copper on a huge pile of loose change at this point, and I don’t want to re-litigate that paper.

What I do want to do: cast this episode as just one entry in an increasingly long ledger of high-profile and dramatically undercooked scientific papers (and it really is dealer’s choice for which one you pick at this point) that have made a good few people question the role of preprints in scientific life — a position which, as someone who wrote an article literally called Why I Love Preprints, I find totally understandable and justified.

Because this is now a serious problem.

Essentially, I’d hope this can serve as a reminder that the idea of releasing work previous to ‘formal’ publication isn’t the problem — it’s us. The problem is the user, not the tool.

Preprints had, past tense, a well-understood role in scientific life. They had a generally well-observed set of conventions around what it should mean to ‘preprint something’. We had, past tense, an understanding.

Then, of course, the Plague.

Taking Responsibility

What I said a while back about the role that researchers need to play after they release a preprint stands up well enough:

(1) Give realistic statements of the limitations within the paper. You should be well aware of what can go wrong with research. This should be trivial to include.
(2) When you make public comments, attempt to give perspective. Prepare for this because you know the interest will be immediate and extreme. Make a decision about what is and isn’t responsible to represent simply to the media and general public.
(3) Deliberately engage experts in the appropriate areas to assess your information publicly. If the work needs to be read by experts, find them yourself. In particular, put your work in front of people who might disagree with it.
(4) Admit that criticism of your work exists and then engage with it. You release a paper in advance of formal publication for DISCUSSION. Well, if that’s the case, get into the weeds and discuss it.
(5) Update your preprint! It isn’t published yet. You can do whatever you like to it.

Especially in the middle of a large crisis.

Without getting mired in the details, the sort of immediate and reactive response described above was always supposed to be part of the pre-publication process.

A preprint was supposed to be a vehicle for discussion, not something you could publicise as hard and immediately as possible. In this Santa Clara business, if I’ve understood the timeline correctly, the authors released a preprint in advance of waiting for crucial analytical certainty, and then immediately jumped to press conferences and media appearances. They’re hardly alone in that respect. The preprint-followed-by-immediate-formal-demand-for-attention is a disgusting new normal.

You can find all of these authors trying to ride two horses with one arse, in that there’s a hugely disingenuous response to criticism: “oh, yes, those crucial errors, it’s good that people are pointing them out now, and we’re happy to update the manuscript, because it’s a preprint. Give us a week or two. Look, Mum, I’m doing Open Science!”

In a normal world, that would be OK. In the overheated environment of the present moment, the horse hasn’t so much bolted as hitchhiked interstate under an assumed identity to pursue a career as a mule. The eventual correction is more akin to those you see in a newspaper after a scandalous story is quietly corrected for crucial details a week later.

A week later:

Regret, perhaps, but you already sold the newspaper.

Regardless of whether or not I have some archaic or utopian idea about responsibility here, I can find zero historical precedent for formal academic research that says pre-publication documents should form an immediate pathway to press conferences, blanket claims, and enormous media coverage. That was never on the leaflet, neither the strong self-promotion nor the colossal attention. That isn’t science, it’s your ego wearing a small hat with SCIENCE! written on it.

Of course, it isn’t simply this timeline-to-publication business that’s experiencing upheaval right now. We’re also seeing:

(1) immediate and strong push-back, within hours or days, against inflated claims by a critical community of scientists, doctors, and statisticians who all very much have a terminal case of the shits with this present state of affairs;

(2) as above, the details of (1) not mattering much after a news cycle or two have been missed — coverage begins even before someone can open the supplementary material and start checking the receipts, and that coverage immediately descends into tiresome and dangerous political expedience;

(3) journals offering extraordinarily rapid peer review (sometimes under what smell to me like dubious circumstances);

(4) a few instances of even the basic release of study details being hidden, just headlines in the absence of the slightest shred of confirmation that the work even exists;

(5) partial, incomplete or absent datasets and code accompanying all of the above.

[A side note while I’m here, and one I cannot overemphasize — if you want to be believed in a hurry, the easiest way is to release your god. damned. data.

If you went to the trouble of getting rapid ethics to do these studies, the excuse of ‘well, back in the day we didn’t anticipate releasing the data so you can’t have it’ is moribund. Figure out how to anonymise it, talk to your ethics committee, get your participants the right kind of waiver, figure out how to simulate it, whatever you like… and release it.

And, if harsh questions are asked and there’s any fidgy-widgy business about why you somehow can’t retrieve the right numbers … expect overwhelming criticism. Seriously, prepare for it. Buy a pair of brown trousers now for when the backlash makes you shit yourself.

The data is a lot more valuable than what you write about it, and if you don’t pony it up when it becomes important, people will try to extract it from you with forceps.

Listen to Hungry Santa. He is wise and terrible.]

I’m actually inclined to be a little bit generous with some of this collective derailing, as a kind of category mistake — science isn’t designed to run this fast. We’re all Scotty, well aware that the engine shouldn’t go at 104% by definition, and that at any point the cowling is going to fly off, but the demented demands of Captain Corona-Kirk have been issued.

Under normal circumstances, scientists unused to rapid publishing might think that updating a manuscript in a week or two is incredibly rapid (and by almost all of publication history, it is).

But, seriously, read the room. Right now, releasing information in a preprint (or even just knocking out a press conference to announce how clever you are) is the shot heard around the world. In many ways, any new information starts a clock on an immediate and very detailed and very loud examination of what you did and also the immediate entry of your ideas into public life, with all the component distortions and vested interests that process comes with.

I estimate you’ve got about 2 news cycles now before your ‘tenuous unpublished result’ is basically a meme. That is, about 16 hours.

If I wrote a preprint about The Plague, and I’ve declined two opportunities to do so, I’d set aside the next week for checking the damned thing in real time, recutting the analysis in real time, talking to journalists about updates in real time. I’d place myself at the dead center of that narrative. I’d be ready with my own megaphone in case I heard my words coming out backwards through someone else’s.

It sounds exhausting, to be honest. It makes you want to say All I Care About Is Science, and jeté back to the old ivory tower.

But the rules of the game are changing as we play it.

This isn’t the fast science of a few months ago, it’s some kind of blinding hyper-science. It exists and then it cuts straight to the heart of public discourse and public policy.

And I don’t think people who cut their teeth in a culture which could stomach waiting six months for peer review are quite ready for it. To hyper-connectivity, we’ve added hyper-attention and high stakes. It’s like learning poker from the maths, and then suddenly facing Daniel Negreanu talking smack while someone pokes a live TV camera in your grill.

I have some compassion for the collective authors of hasty preprints. It’s hard not to. What we, the assorted miserable critics of the internet, might think is the inability of some authors to react quickly enough to questions about their work could be the byproduct of them already working a 12-hour day to keep a lid on the Pandora’s Box they managed to kick open. Everything boiling hot. Handling all your own press. Getting shotgunned with additional responsibilities.

And then someone like me shows up.

“Why did you publish that? I sent you an email two days ago! You can’t add, you mad div. You are clearly wrong! AND I hate your tie!”

It’s all going to sound a bit thin if that’s one email out of 140.

In short, the Plague manages to set the need for information as soon as possible against much higher stakes for accuracy, completeness and humility than usual. Any consequential paper being released early is a high-wire act, balancing a lot of attention against the potential consequences of a rather unpleasant plummet. And the chances of any given paper becoming consequential are MUCH higher than usual.

All you are left with right now is hoping that the authors are right. Hoping that the hype turns out to be real, or even semi-solid, instead of useless or actively misleading.

But hope is, and has always been, a mug’s game. This great diseased rush, towards the limelight, the attention, the positioning, the money… we can see that a new set of levers has appeared, and people are trying to grab them even though they’re red hot.

Obviously, time is of the essence: we need information, a solution, and a resolution. Science will do all that, eventually. But it will bring along with it the Four Horsemen, and also a desperate and thirsty horde of grubby grandstanders, who want attention in the same way I want a beer right now after writing this.

I wish I could finish with something self-righteous, like ‘I thought we were better than this’, but it would be a black lie. I didn’t think that. But having presupposed it doesn’t make me delighted to see a dilution in the accepted standards of evidence, and a serious warp in the fairly straight continuum we agreed could be walked with pre-publication research. I’d rather be wrong.

So, no grand sign-off here. If science had a collective face right now, I would flick it in the nose.