The Joy Of Cooking The Books

James Heathers
Mar 23, 2018


On the oddest paper ever thugged.

Note: some of the below is covered in Helen Rosner’s column in The New Yorker. She’s a better writer than me… but this has graphs. And swearies.

EDIT: Dec. 5th — This paper is now retracted.

Have you ever spent an hour trying to figure out average poultry yield from a chicken in the 1930s?

Me neither. Until the other day.

Here’s the story: in the wake of several reports about problems in the research of Prof. Brian Wansink, a new and totally unexpected angle arose.

Context: the Joy of Cooking is a cookbook. It is also an American institution: a piece of living culinary history, updated continuously since the 1930s and now in its 8th edition. Because it is a single, consistently curated, very large text, it is largely responsible for defining what many dishes ARE.

To give you an idea: Daniel Gritzer is the managing culinary director of SeriousEats.com, far and away the most useful food website on the internet.

(Sorry, literally all the other ones, it’s true. If I need tips before starting a recipe, I check it first.)

In his list last year of must-have cookbooks, many of the usual suspects are there (Larousse, Child, Pépin, McGee, Peterson)… and so is Joy. I don’t know its sales figures, but consider this: a million people in this country will wake up tomorrow morning and make waffles using ‘the classic recipe’, and guess where that’s from.

However, you don’t get to be a dog without fleas.

Oh dear.

In all the previous data thuggin’ and number muggin’, we never looked at this letter. Just not enough time in the day, I expect. Anyway, there isn’t much content to it overall, being only a few hundred words long, and there also aren’t many numbers we could have checked.

That’s not to say this letter was insignificant. It (Wansink and Payne, 2009, Annals of Internal Medicine) used successive editions of Joy as a window into America’s descent into the Seventh Circle of Dante’s Caloric Hell. The conclusion was, well, let’s not put words into anyone’s mouth:

Naturally, this has a nice splashy hook, and it was the genesis of the infographic that accompanied it: “I have 44% more calories per serving than you!”

So, you know, lesson learned, ‘warp some for later’… yes, ‘warp’. Presumably Mr. Sulu is involved.

“Warp some for later? Oh myyyyyy.”

There are a few reasons this matters.

  • This letter was published in a real journal, not one of the often-forgettable journals we’ve seen in this sorry saga, like The Lesser Spotted Journal of Marketing Up A Hallway By A River Near A Creek which will publish any old dreck. Annals is not run by a cabal of dim white men in cheap suits who think calculus is a Futurama character. Instead, it’s one of the best known and most highly cited journals in general medicine.
  • This got props. It’s a neat result, and it’s well focused. Have conclusion, will travel. Naturally, news cycles (bad pun coming) ate it up. It went ‘virally big time’.
  • The target is a real actual publication. The Joy of Cooking is an ongoing project, not a publication whose author is long dead or forgotten. They have circulation and sales figures and distributors and a reputation which they want to protect.

Instead of crying over spilt milk (heh), Joy did what I’d recommend anyone do in this situation, though few people get around to it: they did the sums themselves.

Then they sent the sums to us, and now I’ve checked them. Out of the 36 recipe versions involved (18 recipes × 2 editions), I found ONE inconsistency between Joy’s dead reckoning and mine, and I’m not even sure about that one. As far as I can tell, they curated the numbers carefully, with an accuracy I find pleasing and appropriate (and remember, my favourite euphemism for myself is ‘accuracy fetishist’).

I should add here: I’ve checked these numbers entirely by myself without any further input from Joy. This needed to be impartial.

Spatulas to attention, ladies and gentlemen, this ain’t pretty.

Errors in Conception

Tiny sample size

The letter says that only 18 recipes have been carried through from 1936 to the present edition of the book, so they were used for analysis.

I don’t know if this is true. I doubt it, but I don’t know. Either way, it is a totally unnecessary criterion for inclusion. The difference between any two of the seven available editions is relevant. Smushing the analysis down with this arbitrary criterion leaves an astronomical amount of data on the table.

It’s also a lazy way to construct a dataset, because Joy sent me more than 100 recipes that carried through from 1936 to 2006. I don’t have the weeks required to analyse all of them, let alone fill in the intermediate versions from the editions published in between, but I’d hazard a guess that the recipes involved were NOT chosen at random.
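A rough way to see how much the 1936-to-present rule throws away is to count comparisons. This is a minimal sketch that assumes, purely for illustration, that each recipe is recorded as the set of editions it appears in; the edition labels and function names are hypothetical, not anything taken from the letter or from Joy.

```python
from itertools import combinations

# Hypothetical labels for the seven editions (not real edition names).
ALL_EDITIONS = ["E1", "E2", "E3", "E4", "E5", "E6", "E7"]

def edition_pairs(editions_present):
    """Every unordered pair of editions in which a recipe appears."""
    return list(combinations(sorted(editions_present), 2))

def spans_endpoints(editions_present, first="E1", last="E7"):
    """The letter's inclusion rule: keep a recipe only if it is in both the first and last edition."""
    return first in editions_present and last in editions_present

full = set(ALL_EDITIONS)              # recipe present in all seven editions
dropped_late = set(ALL_EDITIONS[:6])  # hypothetical recipe missing from the last edition

print(len(edition_pairs(full)), len(edition_pairs(dropped_late)))  # 21 15
print(spans_endpoints(dropped_late))  # False: excluded despite 15 usable comparisons
```

A recipe that appears in six of the seven editions still supports 15 edition-to-edition comparisons, but the endpoint rule discards it entirely.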

We’ll come back to this later, because it’s absolutely center-stage waving-a-flag crucial.

Serving sizes

If I make a Nutella sandwich fried in butter and dusted with extra sugar, is it obesogenic? Well, not if you only eat 1/16th of it. Serving size is everything in determining how many calories are in something.

Example: a summer salad for four people has more calories than a quarter of a Mars Bar. Obviously they’re not even slightly equivalent in other ways — one you can eat without even thinking about it, the other would be a green odyssey where you’d get cramps in your fork hand, the kind of thing Matt Stonie would attempt.

So serving size matters a great deal. The only problem was: a lot of recipes don’t have one.

And, somehow, the letter managed to calculate them.

Here’s 1936 Gumbo.

This does not list a serving size, just the yield (which is more useful than a serving size, because not everyone eats the same amount!): “about 14 cupfuls”.

To produce this analysis, serving sizes just… well, appeared.

There is no other word for it. Serving size was neither listed nor implied for just over HALF the recipes, because either:

(a) no serving size was listed, and the yield was given either as a whole (e.g. ‘makes one pie’) or as a volume (e.g. ‘makes ten cups’). This covers gumbo, chowder, pie, and cake.

(b) the recipe made multiple units of something (e.g. ‘makes 48 biscuits’). This covers biscuits, muffins, brownies, cookies, cornbread, and pudding.

Not cool. All you have to do is magically assume that the modern version has fewer servings (hence more cals/serving), and there’s your result right there.
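To see why inventing serving sizes matters so much: calories per serving is just total calories divided by however many servings you decide the yield contains. In this minimal sketch only the “about 14 cupfuls” yield comes from the 1936 gumbo recipe; the total calorie figure and the portion sizes are hypothetical placeholders, and the function names are my own.

```python
# Only the 14-cup yield comes from the 1936 gumbo recipe; everything else is hypothetical.
GUMBO_YIELD_CUPS = 14
HYPOTHETICAL_TOTAL_CAL = 2800  # placeholder, not a real calorie count

def servings_from_yield(total_cups, cups_per_serving):
    """Servings implied by a volumetric yield and an assumed portion size."""
    if cups_per_serving <= 0:
        raise ValueError("portion size must be positive")
    return total_cups / cups_per_serving

def calories_per_serving(total_calories, servings):
    """Per-serving calories: the total is fixed, only the divisor is a choice."""
    if servings <= 0:
        raise ValueError("servings must be positive")
    return total_calories / servings

for portion in (1.0, 1.5, 2.0):
    n = servings_from_yield(GUMBO_YIELD_CUPS, portion)
    kcal = calories_per_serving(HYPOTHETICAL_TOTAL_CAL, n)
    print(f"{portion} cups/serving -> {n:.1f} servings -> {kcal:.0f} cal/serving")
```

Same pot, same ingredients: doubling the assumed portion from 1 cup to 2 cups doubles the per-serving figure, from 200 to 400 cal.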

Things change!

Picture this in your shiny mind — “roast beef”.

What does it look like? The answer is: any way you like.

On the left is a chateaubriand (a small piece of center-cut beef tenderloin).

On the right is, well, it’s a whole goddamn roast steer.

Both of these are roast beef; describing either one as such would be no problem. However, it’s pretty obvious that while one is a pound of lean beef, the other is about 300 pounds of skin-on, fatty-as-hell-or-it’ll-dry-out beef (note: that’s a guess; it looks like a fairly small steer).

Still all well and good. UNTIL you come along and make a claim that one is in any way equivalent to the other just because they have the same name.

Did that happen here? YES.

Let’s stick with the recipe from above: gumbo. And for context we’ll turn to — where else? — SeriousEats.

All bets are indeed off. 1936 gumbo is a chicken stew, thickened with okra (no flour and no filé). This is the historic version: have a look at the article linked above and compare its 19th-century recipe to the 1936 Joy recipe; they’re very similar. In the recipe above, it’s described as a soup. It literally says “Season the soup with: Salt, Paprika.”

2006 gumbo, however, is a completely different kettle of chicken (you didn’t think I’d make it through this post without puns, did you?)

It’s a recipe you’d probably recognise as Louisiana gumbo, thickened with a brown roux (flour cooked in oil) and flavoured with andouille (a fatty, heavily seasoned, smoked pork sausage). It’s definitely not a soup; it’s closer to a stew, and it’s almost always served over rice, not on its own in a bowl.

(What happened in the meantime that prompted the change? I’d be speculating, but I’d say it was things like Prudhomme’s Louisiana Kitchen. Prudhomme was an enormously popular chef who did more than anyone else to bring Southern food to the world in the 70s and 80s. His 1983 recipe for gumbo is very, very similar to the 2006 Joy version).

Basically, the ’36 and ’06 recipes share a name and little else. Pretending they’re the same and then decrying the increase in caloric content is either dissembling or a massive oversight.

Now, we could stop at this point. We have a sample that was, for whatever reason, deliberately underpowered. We have serving sizes that appear out of thin air. And we have the failure to appreciate that not all recipes are created equal. As far as I’m concerned, this study is already invalid in its conception.

But let’s punch the numbers anyway.

Errors in Calculation

The letter says total calorie content increased in 14 out of 18 recipes. I got the same. Will wonders never cease.

The letter says total calorie content increased from 2124 cal in 1936 to 3052 cal in 2006.

This is a 44% increase.

I got 1987 cal to 2403 cal over the same period (note: these figures were mugged up by Joy in the official USDA-approved software. I checked them manually, and they were correct.)

This is a 21% increase.
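Both percentages are plain relative changes, so they are easy to check yourself. A minimal sketch (the function name is my own):

```python
def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100

letter = percent_change(2124, 3052)      # totals reported in the letter
recomputed = percent_change(1987, 2403)  # totals from Joy's figures, checked by hand
print(f"letter: {letter:.1f}%, recomputed: {recomputed:.1f}%")
# letter: 43.7%, recomputed: 20.9%
```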

With a simple paired t-test, difference over time wallows around p=0.052, and that’s including the Great Gumbo Disaster outlined above. In the absence of that datapoint, it’s p=0.095.
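For anyone who wants to rerun this kind of check, here is a minimal standard-library sketch of the paired t statistic, plus a helper for dropping one pair (as with the gumbo). The numbers in it are toy values, not the recipe data, and the function names are my own. For a p-value, SciPy’s `scipy.stats.ttest_rel(after, before)` runs the same paired test.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched before/after values."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [a - b for b, a in zip(before, after)]
    sd = statistics.stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1

def drop_pair(values, index):
    """Copy of a list with one element removed, e.g. to re-test without one recipe."""
    return values[:index] + values[index + 1:]

# Toy numbers only; these are NOT the recipe data.
before = [100, 200, 300, 400]
after = [150, 210, 330, 600]
print(paired_t(before, after))
print(paired_t(drop_pair(before, 3), drop_pair(after, 3)))
```

With only 18 pairs, a single recipe can move the result noticeably, which is why the with-gumbo and without-gumbo p-values differ.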

As the same software was used to make the same estimations, I have absolutely no idea how these numbers are so far apart. Some of it is most likely due to the assumptions inherent in things like “6 to 8 servings” (I would hazard a guess that, wherever possible, the letter used whatever maximized the result), and to the optional additions to any recipe, which may also have been added ‘strategically’. I’ll reserve my judgement on the ‘complete inability to handle data responsibly’ aspect.
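Here is how much room a range like “6 to 8 servings” leaves. This minimal sketch assumes, hypothetically, that both editions of a recipe list the same range and the same total calories; splitting the old version into the most servings and the new version into the fewest manufactures an increase out of nothing. The numbers and function names are illustrative only.

```python
def per_serving_range(total_calories, low_servings, high_servings):
    """(min, max) calories per serving for a recipe that says 'low to high servings'."""
    return total_calories / high_servings, total_calories / low_servings

def cherry_picked_change(old_total, new_total, low, high):
    """Percent change if the old recipe gets the most servings and the new one the fewest."""
    old_min, _ = per_serving_range(old_total, low, high)
    _, new_max = per_serving_range(new_total, low, high)
    return (new_max - old_min) / old_min * 100

# Hypothetical: identical recipes, 2400 cal total, "6 to 8 servings" in both editions.
print(f"{cherry_picked_change(2400, 2400, 6, 8):.1f}%")  # 33.3%
```

Nothing about the food changed, yet the per-serving figure “rose” by a third.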

The letter says the calories per serving increased from 268 cal to 437 cal, and that included 17 out of 18 recipes. For this, I got nothing at all, because I’m not making up the goddamn serving sizes; that is not how science works. If we already have assumptions or guesstimations built into the recipes, adding another layer of assumptions isn’t going to help.

And now we get to the real meat of the proceedings: all the stuff that went wrong.

Problems

  • Everything really was superbly heterogeneous, and I’m not sure how well simple means and SDs capture this grab-bag of changes. Here’s a graph of everything that doesn’t rely on bogus serving sizes: [a] the change in total calories over time, [b] the percentage change in total calories over time, and [c] the change in calories per gram (assuming, foolishly, no change in density over time, which is insane, primarily because of fluid loss). The Great Gumbo Disaster is in red.
  • Panel [b] is probably the one to look at. You can see some very substantial increases in overall calories, but only for a few recipes. One is chowder, which replaces water with milk (+86%). Another is chicken a la king, which is skewed by the fact that cream is optional in the ’36 recipe (and therefore not counted) but its equivalent is mandatory (and therefore counted) in the ’06 recipe (+134%). The result in question hinges on this kind of slipperiness.
  • If any number or calculation was unclear at any point, I did what I’d do if I were running the study myself: steel-man the case against the hypothesis of interest. That is, I used the lower estimates for all recipes where possible, for a couple of reasons: (1) you’re supposed to guess conservatively, not prop your result up when you have the freedom to do so, and (2) the lower estimates seemed to be more consistent between recipes. In a full-size study, though, there are obviously better ways of doing this. One would be entering the data as a range and then sampling from it at random. Another would be working out what the ‘typical’ version of a recipe consisted of by consulting other sources. So it goes.
  • Some of the serving sizes are meaningless, like saying a 4-egg omelet has 4 servings… you touch my omelet, I’ll jab you with a fork.
  • There are a few places where you CAN see exactly what was supposed to be the thesis of this letter bubbling through: for instance, the 2-inch muffins of the 30s morphed into the monster muffins we expect these days. However, this DOESN’T have a bearing on serving size, because how many tiny muffins are in a serving? My gran used to make butterfly cakes out of tiny 2" muffins (using a recipe from Joy, actually!), and because she was a generous woman and I was a tubby little bastard, I used to get 4. Is that ‘a serving’? (Nostalgia.)
  • Calculating total calories from recipes is a disaster, because a dozen measurement issues get in the way. The conversion from volume to mass. The thermic effect of food. Changes in the brining of meat. Changes in the fat content of meat (did you know that 80/20 mince has about SEVENTY PERCENT more calories than 93/7 mince?) Retrogradation of starch. The conversion of fiber into pyruvate. It’s an accuracy apocalypse before you start the formal adding-up process.
  • Calorie DENSITY is also a disaster. Take the chili recipes. In the 30s, you cook the dish covered, with a fairly small amount of fluid; in 2006, you add watery ingredients AND a whole liter of fluid, but then cook it uncovered for an hour. The chicken a la king is the same: the ’06 version adds a lot of fluid and then cooks it down. A lot. But how much? What is the calorie density of the remaining food? No idea.
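The “enter the data as a range, then sample from it at random” idea from the list above could look something like this minimal sketch. The recipes and ranges are hypothetical, and the uniform distribution is an assumption for illustration; a real study would need to justify (and ideally pre-register) how each range is built.

```python
import random
import statistics

def simulate_total_change(pairs, n_draws=10_000, seed=1):
    """Monte Carlo percent change in summed calories across recipes.

    pairs: one ((old_lo, old_hi), (new_lo, new_hi)) tuple of calorie ranges per recipe.
    Every draw picks each recipe's calories uniformly within its range (an assumption).
    Returns (median, 2.5th percentile, 97.5th percentile) of the percent change.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    changes = []
    for _draw in range(n_draws):
        old = sum(rng.uniform(lo, hi) for (lo, hi), _new in pairs)
        new = sum(rng.uniform(lo, hi) for _old, (lo, hi) in pairs)
        changes.append((new - old) / old * 100)
    cuts = statistics.quantiles(changes, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.median(changes), cuts[0], cuts[-1]

# Hypothetical ranges, e.g. an optional ingredient counted or left out.
toy = [((1800, 2000), (2300, 2600)), ((900, 1400), (1000, 1100))]
median, low, high = simulate_total_change(toy)
print(f"median {median:.1f}%, 95% interval {low:.1f}% to {high:.1f}%")
```

Instead of one number built on one set of guesses, you get a spread that shows how much the conclusion depends on those guesses.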

At the end of the day, even with every last detail ready and waiting for us in the cookbooks, we still need to make an alarming number of assumptions to populate things like calorie content and density.
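To make the density problem concrete: calories per gram of the finished dish depend on how much water cooks off, which the recipe doesn’t tell you. This minimal sketch uses a hypothetical pot and assumes evaporation removes only water, so total calories stay fixed while the mass shrinks; the function name is my own.

```python
def calorie_density(total_calories, raw_mass_g, water_lost_g):
    """kcal per gram of the finished dish, assuming only water leaves the pot."""
    cooked_mass = raw_mass_g - water_lost_g
    if cooked_mass <= 0:
        raise ValueError("water loss must be smaller than the raw mass")
    return total_calories / cooked_mass

# Hypothetical pot: 3000 kcal in 3000 g of raw ingredients (liquid included).
for lost in (0, 500, 1000):
    print(f"{lost:>4} g water lost -> {calorie_density(3000, 3000, lost):.2f} kcal/g")
```

Cooking off a third of the mass raises the density from 1.00 to 1.50 kcal/g, and an uncovered hour on the stove could plausibly land anywhere in that range.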

I didn’t anticipate this before I did the analysis above. I thought it would be quite easy, with all the data pre-collected, pre-collated, and perfectly preserved in the public record. That assumption was not borne out at all. It’s much more tenuous than that, and I think any competent researcher handling a similar problem would come to the same conclusion.

The only way around this is to:

(a) have a pre-registered, no-funny-business data gathering and analysis plan, and

(b) analyse more than 18 recipes!

This is the most puzzling aspect of all. With 4500 recipes in the book, and a long-standing focus across multiple editions on basic family-centric recipes, this sample size is ridiculously small. Simply comparing every recipe shared by the 1936 and 2006 editions would have been a far more appropriate way to address the hypothesis of interest.

Conclusions

This isn’t a bad idea for a paper. And to me it seems the paper isn’t even so bogus that it should be retracted.

But that’s not a compliment.

Don’t confuse “this isn’t so bad that it should be road-sawed out of the scientific record altogether” with being good. It’s very far from good. It’s just not demonstrably, mathematically wrong.

I wonder how indicative of the burgeoning obesity crisis it would even be if today’s recipes had more calories. The 1936 edition of the book was sandwiched between the Great Depression and the Recession of ’37. I don’t think they had Costco back then. A great deal has changed between people then and people now; for instance, modernity has literally allowed people to grow taller. Presumably they require more calories, and presumably that is not a health problem.

Lessons learned? If you can’t add, and attention to detail just isn’t your thing, don’t mess with a cookbook.
