SPRITE Interlude: The Umbrella Graph — Connecting GRIM and SPRITE. Also, Brunch Sucks.

James Heathers
Apr 15, 2017 · 6 min read

Consider any aggregate mean from a scale containing discrete measurement points used in clinical or behavioural research.

Now THAT’S an opening sentence. Take that, Hemingway.

“That was SERIOUSLY inelegant, James… Rum?”

There’s an obnoxious number of scales in use in the social sciences and associated fields. Perhaps thousands, perhaps tens of thousands.

They address a constellation of capacities, opinions, feelings, tasks or dispositions. And each scale comes with a common set of internal and external features — generally a cute acronym, a series of questions, a factorial structure, subscales, a published validation, a scoring system and so on.

Let’s just choose a few at random:

DASS, the Depression Anxiety Stress Scales,[1] is made up of 42 self-report items to be completed over five to ten minutes, each reflecting a negative emotional symptom.[2] Each of these is rated on a four-point Likert scale of frequency or severity of the participants’ experiences over the last week with the intention of emphasising states over traits.

Work Readiness Scale (WRS)… A 64-item, four-factor solution was obtained and psychometric analyses indicated high reliability for all factors. The final four factors were labelled personal characteristics, organisational acumen, work competence, and social intelligence.

YRBSS (Youth Risk Behavior Surveillance System) — This questionnaire addresses six types of health risk behaviors, one of which is risky sexual behaviors that contribute to sexually transmitted infections (including HIV) and unintended pregnancy. … This is an 89-item questionnaire that directly addresses health-related behaviors. Nine of its items are about risky sexual behavior.

Scales. Clear? Clear.

All scales are constrained. They have upper and lower bounds.

All scales are granular. They invite measurement in whole numbers.

In other words, there will be a series of restrictions on what values the overall scale can and cannot take. If we plot every possible mean and standard deviation pair (x and y respectively), we meet the umbrella graph.

Pretty.

It’s a handy visual representation which can explain a few things about how we do anomaly detection. Some of you may have gotten an email from me with one of these.

Remember: x-axis = MEAN, y-axis = SD.

Imagine a single-item scale from 1 to 9, which we give to n people.

What’s the maximum mean? 9, if everyone puts 9. “Do Americans like brunch?” 9’s all round. Apparently, this entire country, every man jack of them, can’t wait to pay $13 for eggs.

Minimum? 1, if everyone puts one. “Do Americans like it when you point out brunch is just a) breakfast if you’re lazy, b) a pit of hopeless drunks, c) an effete scam to get you to pay far too much for prosecco and orange juice?” A uniform row of angry 1’s. Apparently, crimes against the mimosa = some kind of shifty foreign treason on my part.

Those are obvious. And brunch sucks.

So, what’s the maximum SD? When the numbers are maximally far from the mean, obviously — so, an equal number of 1s and 9s in this case. Thus, that sample SD will be very slightly higher than 4 (the n − 1 in the denominator nudges it up, less and less as n grows). Note that the population SD will be exactly 4.
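If you want to check that, two lines of Python will do it (a throwaway sketch, assuming n = 10 on our 1-to-9 item):

```python
# Half 1s, half 9s on a 1-to-9 item: the most spread you can get around a mean of 5.
import statistics

answers = [1] * 5 + [9] * 5
print(statistics.pstdev(answers))  # population SD: exactly 4.0
print(statistics.stdev(answers))   # sample SD: ~4.22, a shade over 4, creeping back toward 4 as n grows
```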

And the minimum SD? Why, 0 of course. There are lots of places that’s possible — any time we have answers which are identical, and there’s no variance at all, i.e. all 1s, all 2s, all 3s, and so on.

Now, let’s look at an actual umbrella.

Imagine that but exactly straight-on. The ribs (the metal spines which make up the supporting structure of the umbrella) describe the green graph points over the top of the umbrella; the scalloped canopy edges (the fabric bits) describe the red points along the bottom, which end in tips (I don’t need to tell you what tips are).

The blue points between them are the canopy panels, the rain keepy-offy parts. I have included only a few of these possible blue points — an actual image would have every possible answer set to choose blue points from (that’s n + 8 choose 8 of them for a 9-point item), which is totally unwieldy to graph and would look blobby and awful.
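If you’d like to draw your own umbrella, here’s a rough sketch (mine, not anything official) that brute-forces every distinct answer set for the 1-to-9 item with n = 10 and plots each set’s mean against its SD; the umbrella outline falls straight out of it:

```python
# Brute-force umbrella: every distinct set of 10 answers on a 1-to-9 item,
# plotted as (mean, sample SD). The top of the cloud is the green boundary,
# the scalloped bottom is the red boundary, everything in between is blue.
from itertools import combinations_with_replacement
from statistics import mean, stdev

import matplotlib.pyplot as plt

SCALE = range(1, 10)   # response options 1..9
N = 10                 # respondents

points = [(mean(s), stdev(s)) for s in combinations_with_replacement(SCALE, N)]
xs, ys = zip(*points)

plt.scatter(xs, ys, s=2)
plt.xlabel("mean")
plt.ylabel("SD")
plt.show()
```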

And here is the money:

GRIM inconsistencies are points inside the graph whose x-value (the mean) cannot actually be produced by any set of n whole-number answers.

GRIMMER inconsistencies are points inside the graph whose y-value (the SD) cannot actually be produced.

SPRITE anomalies are one of two kinds:

  1. absolute (i.e. they are points outside the red or green bounds of the graph, where no solution can exist)
  2. relative (i.e. they are points inside but close to the bounds of the graph, which therefore must come from really manky looking distributions).

There are some differences, of course.

  • GRIM and GRIMMER have sample size restrictions, SPRITE doesn’t.
  • GRIM and GRIMMER are absolute — they record absolute inconsistencies. SPRITE generally points out relative inconsistencies, in that it identifies weird distributions which give unusual mean/SD values.
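To make the GRIM side of that concrete, here’s roughly what the check boils down to for a single-item, whole-number scale (a minimal sketch; the function name and the two-decimal default are mine, not anything official):

```python
def grim_consistent(reported_mean, n, decimals=2):
    """Can a mean reported to `decimals` places arise from n whole-number answers?

    With n integer responses, the true mean has to be (some integer) / n,
    so we take the nearest such value and see whether it rounds back to
    the reported figure.
    """
    nearest_total = round(reported_mean * n)   # closest achievable sum of responses
    nearest_mean = nearest_total / n
    return round(nearest_mean, decimals) == round(reported_mean, decimals)

print(grim_consistent(3.48, 25))  # True: 87 / 25 = 3.48 on the nose
print(grim_consistent(3.47, 25))  # False: no 25 whole numbers average to 3.47 at two decimals
```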

It’s the relative SPRITE anomalies that have the potential to be a little difficult to understand, because unlike the others, these need to be interpreted. Let’s look at a few unusual results on an umbrella graph, and interpret them.

That yellow diamond is a published set of values (yes, a real one). It’s quite close to the green border, meaning it is approaching the absolute limit for the amount of variability it can contain. If we hit it with SPRITE and make some sample distributions:

… then they look insane.

Remember the ‘horns of no confidence’? That’s a wild amount of variability. Something has gone awry, unless it’s a collective answer to a very polarising question.

Alternatively:

Again, the yellow diamond is a published set of values (yes, a real one). It’s awfully close to the bottom, which means it has almost zero variability. SPRITE happily returns the only two possible distributions:

… and they’re extremely well confined.

It might be a reasonable distribution if it was the answer to a question like “is (some vague and mostly pleasant thing) nice?”, to which everyone would say “yeah, I guess so, mostly”.

But it’s a huge problem if it’s the answer set to a question with an obvious or extreme answer, like “is it bad to mug old ladies in the street?” or “should you set your neighbour’s cat on fire?” or “how much would you like a big bag of free money?”

Clear? Clear.
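And if you want to poke at the SPRITE side yourself, here is a much-simplified sketch of the general recipe (my own toy version, not the published routine; the function name, scale bounds and iteration budget are all made up for illustration): start from an answer set with the right mean, then repeatedly nudge one response up and another down (which leaves the mean alone) whenever that drags the SD toward the reported value.

```python
import random
from statistics import stdev

def sprite_sketch(target_mean, target_sd, n, lo=1, hi=9, decimals=2, tries=50000):
    """Hunt for n whole-number answers on a lo-to-hi scale matching a reported mean and SD."""
    # Start from a mean-consistent set: the nearest achievable total, spread as evenly as possible.
    total = round(target_mean * n)
    values = [total // n] * n
    for i in range(total - (total // n) * n):   # hand out the leftover points one at a time
        values[i] += 1

    for _ in range(tries):
        if round(stdev(values), decimals) == round(target_sd, decimals):
            return sorted(values)               # found a distribution that reproduces both figures
        i, j = random.sample(range(n), 2)
        if values[i] < hi and values[j] > lo:
            candidate = values[:]
            candidate[i] += 1                   # one response up...
            candidate[j] -= 1                   # ...one response down: mean untouched
            if abs(stdev(candidate) - target_sd) < abs(stdev(values) - target_sd):
                values = candidate              # keep it only if the SD moved the right way
    return None                                 # nothing found within the budget

# e.g. a 1-to-9 item, n = 20, reported mean 5.35 and SD 2.50
print(sprite_sketch(5.35, 2.50, 20))
```

Run it a few times and you get different answer sets, which is exactly the point: SPRITE shows you the family of distributions a mean/SD pair could plausibly have come from, and lets you judge whether any of them look sane.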

It’s all very simple, at the end of the day. But simple is no restriction to effective.

And because I am amazing, I have managed to find an image combining GRIM and SPRITE.
