This post was originally posted on my attempt at a science communication blog, Probably Interesting. I moved it over here in March 2026 so I could have everything I’ve blogged about in one place. It also explains why this wasn’t posted on a Sunday.
CN: coronavirus
Work wasn’t going well today. So I found myself drawn to the web, and this particular article from the Guardian caught my eye: “K number: what is the coronavirus metric that could be crucial as lockdown eases?”¹ It’s linked from their homepage. I’d heard of the R-number, but not the k-number—and nor had one of my Stats friends—so, given that my field is statistics (ish), I thought I’d investigate.
“k-number”, it turns out, is pretty hard to Google, because lots of fields use the letter k to stand for a number. (Normally k is used for a whole number—an “integer”—but here it can take any value >0, which added to the confusion.) I did track down other news sources describing the k-number, including the Times,² the Daily Mail,³ and a general-interest piece from the prestigious journal Science,⁴ which illustrates it, for reasons best known to their subeditors, with a delightful picture of some animal carcasses in a factory. But all of them gave the same incomplete explanation.
These sources tell us that “k” is a measure of variation in how a disease spreads, in contrast to how R measures the average number of people each person infects. If k is low, it means that some people are spreading the disease a lot (“super spreaders”), and others not very much at all. If k is high, transmission is pretty uniform.
This is nothing particularly new: the mean and variance of data are the workhorses of statistics, measuring the average and the spread of the data respectively. But from the examples given, “k” didn’t seem to be the variance, or calculated from it. It was puzzling, especially as the sources also implied that this was a common technique: Science even described it as something “scientists use” in general, without specifying which ones.
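Here is a small illustration with invented numbers. There are two groups of ten cases, and each group infects ten people in total, so both have the same average number of secondary cases (the quantity R describes). Only the variance tells them apart. Dividing the variance by the mean gives the ratio sometimes called the dispersion index.

```python
from statistics import mean, pvariance

# Invented numbers of people infected by each of ten cases.
# Both groups infect ten people in total, so their averages are identical.
uniform = [1, 1, 1, 2, 1, 0, 1, 1, 1, 1]
clustered = [0, 0, 0, 0, 0, 0, 0, 0, 2, 8]

for name, counts in [("uniform", uniform), ("clustered", clustered)]:
    m = mean(counts)        # the average: what R summarises
    v = pvariance(counts)   # the spread around that average
    print(f"{name:>9}: mean = {m}, variance = {v}, variance/mean = {v / m}")
```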
Science did describe “k” as something called the “dispersion factor”, which I had heard of. Well, sort of: I’d heard of something called the “dispersion index” or the “Fano factor” or various other names, though never quite the exact combination used in the article. But it wasn’t clear if that was what was meant.
The one thing the articles had in common was the name of a scientist called Adam Kucharski, who seemed to be quoted in all of them. That suggested that it was something to do with his research and, lo and behold, the Science article linked to a recent preprint by him and his co-authors.⁵ They didn’t directly explain what it was, but I did manage to deduce that it wasn’t the thing I’d heard of. Instead, they cited another paper for it,⁶ which *did* explain it, if only in the supplementary material. I’m going to try to explain where it comes from.
The key thing here is that Kucharski et al are modelling the number of people each person spreads the disease to using something called a “negative binomial” distribution. I’m not going to go into detail as to what it is, but one of its parameters is the k-value; the other is conventionally a probability, but you can take it to be the mean if you like: remember, that’s the R-value that we all know and love.
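To see the two parameters in action, here is a small sketch. It assumes the mean/dispersion form of the negative binomial, where the conventional probability parameter is p = k/(k + R). The probability that a case infects exactly x others is then Γ(x + k)/(Γ(k) x!) · p^k (1 − p)^x. Summing that numerically confirms that the mean is R and the variance is R + R^2/k, so for a fixed R, a smaller k means more spread. The values R = 2.5 and k = 0.2 are purely illustrative and are not estimates from the papers. `nb_pmf` and `nb_moments` are hypothetical helper names.

```python
import math


def nb_pmf(x: int, R: float, k: float) -> float:
    """P(a case infects exactly x others) under a negative binomial
    with mean R and dispersion k (k > 0)."""
    p = k / (k + R)  # the conventional "probability" parameter
    log_prob = (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
                + k * math.log(p) + x * math.log(1 - p))
    return math.exp(log_prob)


def nb_moments(R: float, k: float, x_max: int = 5000) -> tuple[float, float]:
    """Mean and variance found by summing the pmf numerically
    (the tail beyond x_max is negligible for these values)."""
    probs = [nb_pmf(x, R, k) for x in range(x_max + 1)]
    m = sum(x * q for x, q in enumerate(probs))
    v = sum((x - m) ** 2 * q for x, q in enumerate(probs))
    return m, v


R, k = 2.5, 0.2  # illustrative values only
nb_mean, nb_var = nb_moments(R, k)
print(f"mean     ~ {nb_mean:.4f}  (R = {R})")
print(f"variance ~ {nb_var:.4f}  (R + R**2/k = {R + R**2 / k})")
```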
The Guardian article—actually, Kucharski in the Guardian—said that “Once K is below one, you have got the potential for super-spreading”. This suggests that, as for R, 1 is a threshold number for k, either side of which behaviour changes significantly. (I’ve actually written something about why 1 is a threshold for R, which will be in a future post.)
If I’m honest, though, I don’t see it. A k of 1 corresponds to a person spreading the disease according to the “geometric distribution”, which goes something like this: the disease tries to spread from the first person to one other person. If it succeeds, it tries again, with the same probability of success, and keeps trying. But once it fails, it stops transmitting altogether.
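Here is a quick simulation sketch showing that the "keep trying until the first failure" story really is the k = 1 case. It assumes each attempt succeeds with probability q = R/(1 + R), which is the value that makes the average number of secondary cases equal to R. The simulated frequencies should then match (1 − q)q^x, which is what the negative binomial with mean R gives when k = 1. The helper name `secondary_cases` and the value R = 2 are illustrative assumptions.

```python
import random


def secondary_cases(rng: random.Random, q: float) -> int:
    """'Keep trying until the first failure': each attempt to infect
    one more person succeeds with probability q."""
    count = 0
    while rng.random() < q:
        count += 1
    return count


R = 2.0                 # illustrative mean number of secondary cases
q = R / (1 + R)         # success probability that makes the mean equal R
rng = random.Random(1)  # fixed seed so the run is reproducible
samples = [secondary_cases(rng, q) for _ in range(200_000)]

sample_mean = sum(samples) / len(samples)
print(f"sample mean ~ {sample_mean:.3f} (target R = {R})")
for x in range(5):
    simulated = samples.count(x) / len(samples)
    exact = (1 - q) * q ** x  # negative binomial pmf with k = 1 and mean R
    print(f"P(X={x}): simulated {simulated:.4f}, exact {exact:.4f}")
```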
There doesn’t seem to be anything all that significant about that case to me: there certainly isn’t a dramatic change in behaviour either side of it, as there is with R. It also doesn’t seem particularly plausible as a way anything could spread, but that isn’t my area, so I might be wrong here.
Not that that really matters: negative binomial modelling is a perfectly valid technique as far as I know, so I’d say the issue is in the reporting. (I certainly take issue with reporting statistical models as if they’re some sort of black magic, without even attempting to give the reader pointers as to how they might make sense.) We can also say something about changes in estimates of k: a larger k means less variation around the same average, so if it increases, for instance, super-spreader events are becoming less important to the dynamics of the outbreak.
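A quick way to check whether anything dramatic happens at k = 1 is to hold R fixed and tabulate two quantities over a range of k. Both have standard closed forms under the negative binomial with mean R and dispersion k:

- the probability that a case infects nobody, (k/(k + R))^k
- the variance, R + R^2/k

In this sketch, with an illustrative R = 2, both quantities change smoothly and steadily as k passes through 1. The variance column also shows why a rising k means super-spreading matters less.

```python
R = 2.0  # illustrative mean, held fixed while k varies
k_values = [0.25, 0.5, 0.8, 1.0, 1.25, 2.0, 4.0, 10.0]

rows = []
for k in k_values:
    p_zero = (k / (k + R)) ** k   # negative binomial P(infects nobody)
    variance = R + R ** 2 / k      # variance for mean R and dispersion k
    rows.append((k, p_zero, variance))

print(f"{'k':>6} {'P(X=0)':>8} {'variance':>9}")
for k, p_zero, variance in rows:
    print(f"{k:>6} {p_zero:>8.3f} {variance:>9.2f}")
```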
But the reason I don’t think the k-value is going to become some huge thing we should all start following like investors follow the stock market is that it’s something peculiar to this particular way of modelling the spread. If you don’t use the negative binomial, the k-value is not a sensible thing to think about—it’s just one way of adjusting probabilities to approximately fit your data. And the key is “approximate”: the negative binomial distribution could be a good model for spread, but it’s not going to be anywhere near exact.
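The method of moments makes the idea of k as "a way of adjusting probabilities to fit your data" concrete. You pick k so that the model's variance, R + R^2/k, equals the sample variance. That gives k = m^2/(s^2 − m) for sample mean m and sample variance s^2. This textbook technique is swapped in purely for illustration. It is not what the papers cited here do: the Endo et al. preprint, going by its title, works from outbreak sizes. The counts below are invented, and `moment_estimate_k` is a hypothetical name. When the data are no more variable than their mean, no positive k fits at all, which is a reminder that the parameter only means something inside the negative binomial model.

```python
from statistics import mean, variance
from typing import Optional


def moment_estimate_k(counts: list[int]) -> Optional[float]:
    """Method-of-moments estimate of the negative binomial dispersion k.

    Matches the sample variance to R + R**2 / k, giving k = m**2 / (s2 - m).
    Returns None when the data are not over-dispersed (s2 <= m), because
    then no finite positive k fits.
    """
    m = mean(counts)
    s2 = variance(counts)  # sample variance (n - 1 denominator)
    if s2 <= m:
        return None
    return m ** 2 / (s2 - m)


# Invented secondary-case counts, for illustration only.
clustered = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 15, 0, 0, 1, 0, 6]
even = [2, 1, 3, 2, 2, 1, 2, 3, 2, 2]

print(moment_estimate_k(clustered))
print(moment_estimate_k(even))
```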
By contrast, the R is something natural: it’s just an average. It’s tricky to calculate because there is no direct data on who has passed the virus on to whom, especially as the contact-tracing system is still in its early days. But at heart it reflects something fundamental about the virus.
So while the k might be good for modelling the number of cases, and its changes over time might tell us something about the spread, it’s still more important to be able to tell your Rs from your elbow.
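Here is a toy version of "it's just an average". Given who-infected-whom records (the data that is mostly missing in practice), R could naively be estimated as the average number of onward infections per case. Cases who infected nobody have to be counted as zeros. The records here are invented. The sketch also ignores complications such as recent cases who have not yet finished transmitting, which would drag this naive average down.

```python
from collections import Counter

# Invented contact-tracing records: (infector, infectee) pairs.
transmissions = [("A", "B"), ("A", "C"), ("B", "D"),
                 ("C", "E"), ("C", "F"), ("C", "G")]
cases = {"A", "B", "C", "D", "E", "F", "G"}

# Count onward infections per infector; a Counter returns 0 for cases
# that never appear as an infector, so they are included as zeros.
offspring = Counter(infector for infector, _ in transmissions)
counts = [offspring[case] for case in sorted(cases)]

R_naive = sum(counts) / len(counts)
print(counts, R_naive)
```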
¹ Davis, Nicola (1st June 2020). “K number: what is the coronavirus metric that could be crucial as lockdown eases?”. The Guardian. Retrieved 1st June 2020.
² Conradi, Peter (31st May 2020). “ ‘K’ number pins down how the coronavirus spreads”. The Times. Retrieved 1st June 2020.
³ Chalmers, Vanessa (31st May 2020). “After R, now K is key: New rate measuring variation in spread of infection becomes crucial to fighting second wave of covid by tracking superspreader events which could reignite illness”. Mail Online. Retrieved 1st June 2020.
⁴ Kupferschmidt, Kai (19th May 2020). “Why do some COVID-19 patients infect many others, whereas most don’t spread the virus at all?”. Science. Retrieved 1st June 2020.
⁵ Endo, Akira; Centre for the Mathematical Modelling of Infectious Diseases COVID-19 Working Group; Abbott, Sam; Kucharski, Adam J.; Funk, Sebastian (9th April 2020). “Estimating the overdispersion in COVID-19 transmission using outbreak sizes outside China” (preprint). Wellcome Open Research. 5:67. (doi:10.12688/wellcomeopenres.15842.1)
⁶ Blumberg, Seth; Funk, Sebastian; Pulliam, Juliet R. C. (October 2014). “Detecting Differential Transmissibilities That Affect the Size of Self-Limited Outbreaks”. PLoS Pathogens. 10(10):e1004452. (doi:10.1371/journal.ppat.1004452)

