I’ve been watching Strictly Come Dancing since 2011, when Harry Judd, the drummer from McFly, beat Jason Donovan and Waterloo Road’s Chelsee Healey to the Glitterball trophy. For those unfamiliar, Strictly is a BBC television show that started in 2004, as an example of the “get a famous person to do something they’re not famous for” genre of programming that was big back then—I think it was one of the first, and it was certainly one of the few still going.[1] If you’re from beyond these shores, you may know it by the name of the international franchise, Dancing With the Stars.
A year or so ago I had a question I wanted to answer about Strictly. I can’t remember now what it was—probably something along the lines of “What is the highest score for each of the dance styles?”. Anyway, this is, it turns out, quite hard to answer. The information is all there: Wikipedia has incredibly detailed reporting of every episode of the show throughout its entire history, for one thing. There’s also a site called StrictlyDB, which is built around a database of all the dances. But I couldn’t see any way to access the underlying dataset for StrictlyDB to answer a question not already answered by their (admittedly, pretty detailed) “Statistics” section. As for Wikipedia, that’s stored as something called “wikitext”, which is a structured way of writing page content; it underpins every page on the site, and the software for Wikipedia then renders it as HTML so you can read it in your web browser. Either way, though, the information is essentially just stored as text, so to be able to do any stats on it you’d need to scrape the articles for every series, and then process them into (essentially) a massive table that can be analysed. And who could be bothered with that?
Ahem.
The code I used to do this is online. Please don’t judge what I’ve written too harshly: I am largely self-taught in R, the language I used, so there is probably some pretty ropey stuff in there. Less excusably, I have put almost no code comments in it to explain what any of it does, so I guess you can judge that. In my defence, this code will only ever have one user, who—in theory—knows what it all does.[2] It only has one user because of what it generates: an R package, which actually is fully documented, and which anyone who’s interested can use to do their own analysis.
Fortunately, “anyone who’s interested” includes me.
Judge scores
One of the things you can do with this is look at how different judges have scored. You could just do a straight average, which gives you the following ranking:
| Rank | Judge | Average score |
|---|---|---|
| 1 | Anton Du Beke | 8.06 |
| 2 | Motsi Mabuse | 7.88 |
| 3 | Shirley Ballas | 7.72 |
| 4 | Len Goodman | 7.67 |
| 5 | Alesha Dixon | 7.65 |
| 6 | Bruno Tonioli | 7.63 |
| 7 | Darcey Bussell | 7.59 |
| 8 | Arlene Phillips | 7.05 |
| 9 | Craig Revel Horwood | 6.74 |
You might be a bit surprised by this ranking. Shirley Ballas certainly doesn’t seem like she’s particularly generous, whereas I remember tabloid fuss during Alesha Dixon’s time on the show about her being too generous. And was Bruno Tonioli really in the meaner half of the table?
The reason for these surprising results[3] is that there’s been a definite trend of grade inflation across the 23 series of the show, meaning that judges who were only in earlier series might seem to be “meaner” than they really are.

So let’s now split that out by judge and series. Each judge gets a line covering only the series in which they were a main member of the judging panel. A couple of the judges[4] were guest judges before they joined the main panel; their scores from those weeks are not included.
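For anyone wanting to reproduce that kind of split, here is a minimal sketch in Python rather than R. It assumes a hypothetical long-format table with one row per judge’s score for one dance, plus a hypothetical `guest` flag marking guest-judge appearances; none of these field names are the “strictly” package’s real columns, and the scores are made up.

```python
from collections import defaultdict
from statistics import mean

# Made-up example rows, one per judge's score for one dance. The field names
# are hypothetical and are not the columns of the "strictly" package.
scores = [
    {"series": 1, "judge": "Judge A", "score": 6, "guest": False},
    {"series": 1, "judge": "Judge A", "score": 8, "guest": False},
    {"series": 1, "judge": "Judge B", "score": 7, "guest": False},
    {"series": 2, "judge": "Judge A", "score": 9, "guest": False},
    {"series": 2, "judge": "Judge C", "score": 10, "guest": True},
]


def judge_series_means(rows):
    """Average score per (judge, series), ignoring guest-judge appearances."""
    groups = defaultdict(list)
    for row in rows:
        if row["guest"]:
            continue  # skip any score given while sitting in as a guest judge
        groups[(row["judge"], row["series"])].append(row["score"])
    return {key: mean(values) for key, values in groups.items()}


means = judge_series_means(scores)
```

Plotting each judge’s values against the series number then gives one line per judge, starting only once they joined the main panel.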

The thing I find interesting about this graph is that, while Arlene Phillips was on the show, she seems to have been something of a “midpoint” judge, consistently scoring about half a point below Len Goodman and Bruno Tonioli in the series-wide averages, and about half a point above Craig. What changed in series 7 wasn’t that Alesha Dixon was particularly generous, but that she was more in line with the two judges to her left[5] on the panel, and so that midpoint was gone. And that’s not unique to her, either—from that point onwards, it’s hard to draw distinctions between the three non-Craig judges, whoever is occupying the chairs.
But the average doesn’t tell the full story. I mean, nothing tells the full story unless you read through all 23 Wikipedia articles that I scraped this data from. But still, the average tells very little of the story, when you think about it. What about outliers? After all, one thing that makes a judge feel particularly “mean” (or “generous”) is when they are out of line with the rest of the panel, regardless of the numerical value of the score. How often have the judges risked undermarking by giving a score below every other judge’s? And how often have they gone the other way?
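One way to count those outliers, sketched in Python under two assumptions: each dance is stored as a mapping from judge to score (a hypothetical layout), and a tie with any other judge does not count as being “below all the others” (or above them).

```python
from collections import Counter

# Made-up panel scores: one dict per dance, mapping judge -> score.
dances = [
    {"Judge A": 4, "Judge B": 7, "Judge C": 7},
    {"Judge A": 6, "Judge B": 6, "Judge C": 8},
    {"Judge A": 9, "Judge B": 8, "Judge C": 8},
]


def outlier_counts(dances):
    """Count, per judge, dances where they were strictly lowest or strictly highest."""
    lowest, highest = Counter(), Counter()
    for panel in dances:
        for judge, score in panel.items():
            others = [s for j, s in panel.items() if j != judge]
            if not others:
                continue  # a lone judge can't be out of line with anyone
            if score < min(others):
                lowest[judge] += 1
            elif score > max(others):
                highest[judge] += 1
    return lowest, highest


lowest, highest = outlier_counts(dances)
```

Dividing each judge’s count by the number of dances they scored turns this into a share, which is the kind of figure behind a “more than half the time” statement.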

This ranking is maybe less surprising, although I hadn’t realised that Craig has given a lower score than all other judges on the panel more than half the time. That said, while it is hard to see, there are some occasions on which Craig was more generous than the other judges. To be precise, there are nine:
- Aled and Lilia’s foxtrot, in week 4 of series 2;
- Claire and Brendan’s rumba, in week 2 of series 4;
- Gabby and James’s quickstep, in week 2 of series 5;
- Lisa and Robin’s Viennese waltz, in week 2 of series 10;
- Mark and Karen’s rumba, in week 12 of series 12;
- Anastacia and Brendan’s cha-cha-cha, in week 1 of series 14;
- Kaye and Kai’s tango, in week 1 of series 20;
- Angela R. and Kai’s Charleston, in week 6 of series 21; and
- Amber and Nikita’s samba, in week 2 of series 23.
Dances
Now that we’ve started considering different kinds of dance, which tends to get the highest score, and which the lowest? You might think the lowest would be the samba, or the rumba, right? They have a reputation for being challenging, after all. Well, you’d be surprised. This graph shows, for each series, the percentage difference between the average score for that dance and the average score for all dances.
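A sketch of that per-series comparison, assuming hypothetical rows of (series, dance style, total judges’ score) with made-up values, and reading “percentage difference” as the style’s average minus the series average, divided by the series average. Working in percentages would also help if the size of the panel (and so the maximum possible total) differs between series.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (series, dance style, total score from the panel).
rows = [
    (1, "Cha-cha-cha", 20),
    (1, "Waltz", 30),
    (1, "Waltz", 28),
    (2, "Cha-cha-cha", 24),
    (2, "Samba", 36),
]


def style_vs_series_average(rows):
    """% difference between each style's average and its series' overall average."""
    by_series = defaultdict(list)
    by_style = defaultdict(list)
    for series, style, total in rows:
        by_series[series].append(total)
        by_style[(series, style)].append(total)
    result = {}
    for (series, style), totals in by_style.items():
        overall = mean(by_series[series])
        result[(series, style)] = 100 * (mean(totals) - overall) / overall
    return result


result = style_vs_series_average(rows)
```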

Based on this, the rumba and the samba aren’t easy, but they’re not hard either, scoring fairly close to the average. The hardest dance, in fact, appears to be the cha-cha-cha, while the easiest seems to be the, um, showdance, the dance that is the culmination of a Strictly star’s “journey”. Hang on a second…
We might have controlled for series this way, but we’ve not controlled for what point we’re at within the series. The showdance only occurs in the final, when only the best dancers are left, and have been (hopefully) getting better as the weeks go on. Meanwhile, it turns out that the cha-cha-cha tends to skew earlier in the series. So we’re going to have to control for that as well.
What we’ll do instead is look at the percentage difference between each individual dance and that week’s average, and take the average of that. This is far from perfect: for a start, some earlier series had episodes where there were only two possible dance types, and everyone did either one or the other. But anything harder would require effort, and I reserve that for researching defunct board game companies.
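The per-week version can be sketched the same way: work out each week’s average total, express every dance as a percentage difference from its own week’s average, and then average those differences per style. The rows are again hypothetical (series, week, style, total), with invented scores.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (series, week, dance style, total score from the panel).
rows = [
    (1, 1, "Cha-cha-cha", 20),
    (1, 1, "Waltz", 24),
    (1, 10, "Showdance", 36),
    (1, 10, "Cha-cha-cha", 32),
]


def style_vs_week_average(rows):
    """Mean % difference between each dance and its own week's average, per style."""
    week_totals = defaultdict(list)
    for series, week, _style, total in rows:
        week_totals[(series, week)].append(total)
    week_avg = {key: mean(totals) for key, totals in week_totals.items()}

    diffs = defaultdict(list)
    for series, week, style, total in rows:
        avg = week_avg[(series, week)]
        diffs[style].append(100 * (total - avg) / avg)
    return {style: mean(values) for style, values in diffs.items()}


relative = style_vs_week_average(rows)
```

In this toy data the showdance no longer comes out on top, because it is only compared with the other dances from the final rather than with the whole series.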

Well, colour me shocked that the couple’s choice gets the highest relative scores.[6] The samba looks to be the hardest, but it’s honestly hard to tell, and it was scored relatively generously about a decade or so ago. Beyond that, this graph isn’t very illuminating, so let’s move on.
Back to the judges
I think we were on safer ground when we were comparing the judges. Their scores, as you may know, don’t matter in the final. The main reason is that, in the final alone, the public vote is the sole determinant of the winner. But another reason that sometimes comes up is that a judge gives top marks to everything, so they might as well be replaced by a “10” paddle on a lever. That seems to have happened a lot recently, but is it truly a modern phenomenon? Well…

Before series 15, this never happened. From that series on, it’s happened with at least one judge more often than it hasn’t, and in three of those series Craig was the only judge to bother shaking the dust off any of the other paddles. (Credit also to Darcey Bussell, the only other judge on the panel at any point since this started who never did it.) Incidentally, it’s purely a creature of high-scoring finals: no judge has given the same score to every dance in any week other than the last one of a series, and no number other than 10 has ever been a judge’s only choice.
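Spotting a “10 paddle on a lever” is a grouping check: for each judge in each week, were all the scores they gave identical? Here is a rough sketch with made-up rows of (series, week, judge, score) and a hypothetical `min_dances` threshold, so that a judge who only scored a single dance doesn’t count trivially.

```python
from collections import defaultdict

# Made-up rows: (series, week, judge, score), one row per dance the judge scored.
rows = [
    (1, 12, "Judge A", 10), (1, 12, "Judge A", 10), (1, 12, "Judge A", 10),
    (1, 12, "Judge B", 9), (1, 12, "Judge B", 10), (1, 12, "Judge B", 10),
    (1, 3, "Judge A", 7),
]


def single_paddle_weeks(rows, min_dances=2):
    """Map (series, week, judge) -> score for weeks where the judge never varied."""
    given = defaultdict(list)
    for series, week, judge, score in rows:
        given[(series, week, judge)].append(score)
    return {
        key: scores[0]
        for key, scores in given.items()
        # Require enough dances, and exactly one distinct score among them.
        if len(scores) >= min_dances and len(set(scores)) == 1
    }


flagged = single_paddle_weeks(rows)
```

Checking whether every flagged week is a final, and whether every flagged score is 10, then answers the two “incidentally” questions above.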
Music
To conclude, let’s have a look at the music used to accompany the dances—what’s the most overplayed Strictly song? This analysis should be treated with caution: I have only very lightly cleaned the music data scraped from Wikipedia, so sometimes the same song may be recorded slightly differently on Wikipedia.[7] With that caveat, here are the top three most-used songs:
| Rank | Title | Original artist | No. uses |
|---|---|---|---|
| 1 | España cañí | Pascual Marquina Narro | 7 |
| 2= | I’m Still Standing | Elton John | 6 |
| 2= | When Doves Cry | Prince | 6 |
There are then 18 songs that have been used exactly five times, so I thought I’d stop there.
If, like me, your first thought is “‘España cañí’ can’t be the most-played song, I’ve never heard of it!”—you almost certainly have, in fact, heard it. It’s also known as “the Spanish Gypsy Dance”, which is a very rough translation of the Spanish name, and indeed it appears in the dataset an eighth time under that name. It is the almost stereotypical paso doble music: Wikipedia tells me that “among Latin dancers, it is known as ‘the pasodoble song’”, which is unsourced but sounds plausible. Indeed, seven of the eight times it has been used on Strictly it has accompanied a couple dancing the paso, with the other time being the ill-advised “Paso Doble-thon” in series 15.
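Even light cleaning of the titles would fold some of those variants together. Here is a sketch of that sort of normalisation: Unicode normalisation, lower-casing, straightening curly apostrophes, and a hand-maintained alias table (hypothetical here, holding just the one alias mentioned above) that maps alternative titles onto one canonical name.

```python
import unicodedata
from collections import Counter

# Hypothetical alias table: alternative title -> canonical title (both lower-case).
ALIASES = {
    "the spanish gypsy dance": "españa cañí",
}


def normalise_title(title):
    """Fold trivially different spellings of a song title into one key."""
    # Compose accented characters consistently (e.g. "ñ" as one code point).
    key = unicodedata.normalize("NFC", title)
    # Ignore case and stray surrounding whitespace.
    key = key.strip().lower()
    # Treat curly and straight apostrophes as the same character.
    key = key.replace("\u2019", "'")
    return ALIASES.get(key, key)


titles = [
    "España cañí",
    "The Spanish Gypsy Dance",
    "I\u2019m Still Standing",
    "I'm still standing ",
]
counts = Counter(normalise_title(t) for t in titles)
```

The alias table is the part that can’t be automated: no amount of string cleaning will tell you that a translated title is the same song.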
“When Doves Cry” clearly has tango vibes, having been used four times for a tango and twice for an, er, Argentine tango. Much more varied is “I’m Still Standing”—it skews jive (three uses), but has also been used once each for a salsa, a quickstep, and a showdance.[8]
As a final table, let’s see the most-credited original artists. Let’s have a top ten this time—or rather, because of ties, a top 11. The same caveats apply, but, well…
| Rank | Original artist | No. credits |
|---|---|---|
| 1 | Frank Sinatra | 51 |
| 2 | Michael Bublé | 31 |
| 3 | Whitney Houston | 29 |
| 4 | Gloria Estefan | 21 |
| 5= | Aretha Franklin | 19 |
| 5= | Queen | 19 |
| 5= | Stevie Wonder | 19 |
| 8= | Barry Manilow | 17 |
| 8= | Elvis Presley | 17 |
| 8= | Michael Jackson | 17 |
| 8= | Tom Jones | 17 |
… data issues or not, I don’t see anything knocking Ol’ Blue Eyes off the top of this chart. And, of course, if he can make it there, he’ll make it anywhere.
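Turning raw counts into a table like the one above needs “=” ranking: tied entries share a rank, the following rank skips accordingly (1, 2, 2, 4, …), and a cut-off at ten keeps everyone tied for the last place inside it, which is how a top ten becomes a top 11. A sketch of that logic, with ties listed alphabetically and invented counts in the usage:

```python
def competition_ranks(counts, top_n):
    """Rank (name, count) pairs, highest count first, with shared "=" ranks for ties.

    Uses standard competition ranking (1, 2, 2, 4, ...) and keeps every entry
    tied with the top_n-th one, so the result can be longer than top_n.
    """
    # Highest count first; ties broken alphabetically so the output is stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    table = []
    for i, (name, count) in enumerate(ordered):
        # Past the cut-off, only keep entries tied with the last place inside it.
        if i >= top_n and count != ordered[top_n - 1][1]:
            break
        # Rank = 1 + number of entries with a strictly higher count.
        rank = 1 + sum(1 for _, other in ordered if other > count)
        tied = sum(1 for _, other in ordered if other == count) > 1
        table.append((f"{rank}=" if tied else str(rank), name, count))
    return table


table = competition_ranks({"A": 5, "B": 4, "C": 3, "D": 3, "E": 2}, top_n=3)
```

Here asking for a top three returns four rows, because C and D share third place.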
If you want to see the code I used to make this blogpost, that’s also on GitHub. Also, because Wikipedia is licensed under the Creative Commons Attribution-ShareAlike 4.0 International licence, so is the “strictly” package I made, and consequently so too is this post.
1. British television in the next few years brought us Dancing on Ice (ice dance), Soapstar Superstar (singing), Soapstar Superchef (cooking), Celebrity Scissorhands (hairdressing), Celebrity Wrestling (er, wrestling, but apparently not really), and Spelling Bee, which was going to be called “Celebrity Spelling Bee” until someone apparently thought that might put people off. Oh, yeah, and a short-lived programme hardly anyone watched called Celebrity Love Island.
2. The “in theory” explains why this is not a good defence.
3. Well, I was surprised.
4. Darcey Bussell and Anton Du Beke, to be precise.
5. The viewer’s right.
6. For those unfamiliar, the “couple’s choice” is so named because it originally allowed the couple to choose one of three modern dance styles (contemporary, street/commercial, and theatre/jazz), and was preceded by a heartstring-tugging VT explaining how the dance was reflective of the couple’s backstory. The distinction between the three dance types has gone, but it’s usually still identifiable as one of them, and of course the heartstring-tugging VT has gone nowhere. It usually gets a high score, which may be because the judges are less familiar with the dance style and so tend to be generous, and may be because it comes across as heartless to mark down a performance in tribute to someone’s dead gran.
7. I did consider doing more thorough data cleansing on this. But that was when I thought I was sitting on a train for three-and-a-half hours today, when in fact two of those hours were spent standing or squashed into a corner because of some lovely people who refused to move from our reserved seats. Your loss is their gain, I guess.
8. And again for another quickstep, but on that occasion Wikipedia credits it as being “from Rocketman” because it was Movie Week.