This post was originally published on my attempt at a science communication blog, Probably Interesting. I moved it over here in March 2026 so I could have everything I’ve blogged about in one place, which also explains why this wasn’t posted on a Sunday.
I should have watched University Challenge before last week’s post. If I had, I could have noted that Paul Erdős, one of the mathematicians for whom the Erdős–Rényi model is named, made an appearance in the quarter-final match between Imperial College London and the University of Warwick, in which Imperial’s team faced three bonus questions about him. In the final question they had to complete the Erdős quote: “A mathematician is a machine for turning ____ into theorems.” The answer, as correctly determined by the team, is “coffee”, and, as Paxman said, he was “quoting another Hungarian mathematician”… that mathematician being Alfréd Rényi, the other person for whom the model is named.
The other thing I forgot to do last week, in bad academic practice, was offer a source for any of what I was saying. That’s partly because this is material that’s sufficiently old that it’s entered the realm of “stuff that appears in textbooks” rather than “stuff that appears in papers”, and I was taught it in a lecture course during undergrad (given by Oliver Riordan in the year I took it). But if you want this referenced to a published source, a good place to look is Chapter 4 of Random Graphs and Complex Networks, a textbook by Remco van der Hofstad, which is available in PDF form on the author’s website.
So I said I’d explain in part II of this series why we get a phase transition. We’re going to take a short digression in order to explain this, and talk about “exploring” the graph, using something called a “breadth-first search”. The way this works is as follows: every vertex can be in one of three states, namely “sleeping”, “awake”, or “dead”, and we start with the vertex labelled “1” awake, and the rest sleeping. We will also have a “current vertex” at each step, which for the first step will be the one that’s awake.
On each step, we explore the graph by looking at sleeping vertices that are adjacent to the current vertex—that is, joined to the current vertex directly by an edge. It’s possible there are none, but if there are, we wake these up; then, whether we found any adjacent vertices or not, we kill the current vertex. We then need to choose a new current vertex, for which there are three options:
- If any vertices are awake, we choose as the current vertex the one that was discovered longest ago, taking the lowest-labelled if there’s a tie.
- If all vertices are dead or sleeping, but they’re not all dead, we awaken the lowest-labelled sleeping vertex, and make that the new current one.
- If all vertices are dead, we stop.
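For the concretely minded, here’s a minimal Python sketch of this exploration (the function name and the adjacency-set representation are my own choices, not part of the original definition): it builds a G(n, p) graph by flipping one independent coin per possible edge, runs the sleeping/awake/dead process exactly as described, and returns the component sizes in the order the components are discovered.

```python
import random

def explore_components(n, p, seed=0):
    """Breadth-first exploration of an Erdos-Renyi G(n, p) graph,
    returning component sizes in order of discovery."""
    rng = random.Random(seed)
    # Flip an independent coin for each of the n-choose-2 possible edges.
    adj = {v: set() for v in range(1, n + 1)}
    for u in range(1, n + 1):
        for v in range(u + 1, n + 1):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)

    SLEEPING, AWAKE, DEAD = 0, 1, 2
    state = {v: SLEEPING for v in range(1, n + 1)}
    state[1] = AWAKE
    queue = [1]              # awake vertices, discovered longest ago first
    sizes, current_size = [], 0

    while True:
        if queue:
            current = queue.pop(0)        # option 1: oldest awake vertex
        else:
            if current_size:
                sizes.append(current_size)  # a whole component is complete
                current_size = 0
            sleeping = [v for v in state if state[v] == SLEEPING]
            if not sleeping:              # option 3: all dead, so stop
                break
            current = min(sleeping)       # option 2: lowest-labelled sleeper
            state[current] = AWAKE
        # Wake every sleeping neighbour, then kill the current vertex.
        for v in sorted(adj[current]):
            if state[v] == SLEEPING:
                state[v] = AWAKE
                queue.append(v)
        state[current] = DEAD
        current_size += 1

    return sizes
```

For instance, with p = 0 every vertex is its own component, so `explore_components(10, 0.0)` returns ten components of size 1, and the sizes always sum to n because each vertex is killed exactly once.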
A video might help here, so I made one. You can watch it on YouTube, because GDPR makes it technically quite difficult to embed YouTube videos in blogs.
There are a few things to notice here:
- Vertices transition from sleeping, to being awake, to being dead, in that order. Once dead, they stay dead (as you might expect!).
- We always explore edges next to a vertex that’s awake; since we kill each vertex once we’ve explored from it, and it then stays dead, each vertex is the current one exactly once. So the number of steps equals the number of vertices in the graph.
- Going from the start of the process, we feel our way through the graph from vertex 1, waking up as many vertices as we can, and then killing the one we’ve explored. That means that we will eventually wake up every vertex in that component.
- Crucially, we wake up every vertex in that component before we run out of vertices that are awake to select as the current one. This means that the first point at which we run out of awake vertices, and we have to select option 2 instead of option 1 for our new current vertex, is the point at which we’ve explored the whole component. Since we kill one vertex per step, the number of steps it takes to get there is the size of that component.
- We then proceed to explore another component, and repeat until the whole graph is explored.
So this process encodes some information about the component sizes of the graph. We can apply this process to a deterministic (i.e. non-random) graph if we like, but we’re going to apply it to random graphs: specifically, for now, Erdős–Rényi ones. Then, because the graph is random, the search is random too. This is useful, and will crop up again later, when we come to consider the critical regime of the graph, at the point of the phase transition. Just before we use this to explain the phase transition itself, let’s take a minute to see why it’s called the “breadth-first” search.
It comes from the point in option 1 where we said that we choose the awake vertex discovered longest ago as our current one: this means that we explore all the neighbours of the initial vertex—a broad sweep, potentially—before moving on to any vertices further away. (An alternate option is to make the awake vertex that was discovered most-recently our new current one: this causes us to explore as far away from the start-point as we can before backtracking, giving a “depth-first search”.)
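The two searches differ only in which awake vertex we pick next. With Python’s `collections.deque` holding a hypothetical list of awake vertices (oldest-discovered first), the contrast is:

```python
from collections import deque

# A hypothetical list of awake vertices, discovered longest ago first.
awake = deque([3, 5, 8])

# Breadth-first: take the vertex discovered longest ago (queue behaviour).
bfs_next = awake.popleft()   # takes 3

# Depth-first: take the vertex discovered most recently (stack behaviour).
dfs_next = awake.pop()       # takes 8
```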
Okay, so it was useful to say that we started exploring from vertex 1 here because it gave us a nice complete definition. But we could start an exploration from any vertex of the graph. In the Erdős–Rényi model, the vertices are all, in a sense, equivalent (formally, they’re “exchangeable”)—there’s nothing special about any of them, since we did an independent coin flip for every single possible edge, with the same probability. This is going to be the useful idea for working out why the phase transition happens.
Now let’s ask the probabilistic question: what’s the distribution of the number of edges we find at each step? On the first step, assuming we’re dealing with G(n, λ/n), it’s a binomial distribution, with parameters n − 1 and λ/n (denoted Bin(n − 1, λ/n), because that’s essentially defined as “the distribution of the number of coin flips that come up heads, when you flip a coin n − 1 times and the probability of heads each time is λ/n”). On the second step, because we only discover edges between the current vertex and sleeping vertices, it depends on how many vertices we woke up in the first step, and we have to discount the starting vertex too. But if λ/n is small, this will probably have a negligible effect, so the distribution is again approximately Bin(n − 1, λ/n); and, since all the edges are independent, this is (up to the same approximation) independent of the first step. And so it goes until a relatively large proportion of the vertices have been explored.
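To check the first-step claim numerically, here’s a small sketch (the function name and parameters are mine; it assumes the edge probability is λ/n, so the mean degree is roughly λ): it repeatedly flips one coin per potential edge out of the starting vertex and averages the number of heads, which should come out close to the binomial mean (n − 1) · λ/n ≈ λ.

```python
import random

def first_step_wakes(n, lam, trials=5000, seed=0):
    """Simulate the first exploration step in G(n, lam/n): count how
    many of the other n - 1 vertices are adjacent to the start vertex.
    Each count is a Bin(n - 1, lam/n) sample; return the average."""
    rng = random.Random(seed)
    p = lam / n
    total = 0
    for _ in range(trials):
        # One independent coin flip per potential edge out of the vertex.
        total += sum(1 for _ in range(n - 1) if rng.random() < p)
    return total / trials

# The empirical mean should be close to (n - 1) * lam / n, i.e. about lam.
mean = first_step_wakes(n=500, lam=2.0)
```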
So, at least in the initial stages of an exploration—remember, starting from an arbitrary vertex—we can imagine a sort of “family tree” of vertices, where, if the family tree dies out, we’ve reached a point where we have to take option 2 above, and our component is complete. In our case, each vertex has a number of descendants (vertices we discover by exploring it) that is approximately independent of the others and identically distributed. If we had this structure exactly rather than approximately, it would be a Galton–Watson process—and, conveniently, I’ve already written about those.
The crucial thing was that their evolution depended only on the mean number of descendants of each individual. Less than 1, and the tree dies out, often pretty quickly. And look: in the subcritical case (the one with the small components), the average number of descendants of each vertex was approximately

(n − 1) · λ/n ≈ λ,

where, because this is the subcritical regime, λ was less than one. So this suggests that, starting the exploration from any vertex, it will quickly die out—i.e., that there are only small components.
In the case where the mean number of descendants was bigger than 1—which corresponds to our supercritical case, for the same reason—the Galton–Watson process can still die out, but the probability that it does so is less than one: it has a positive probability of growing forever. When we consider starting our exploration of the random graph from each vertex, the case of dying out corresponds to that vertex’s being in a small component. The case of growing forever doesn’t literally mean that the vertex is in an infinitely large component, because that’s impossible in a finite graph; when we’ve discovered enough vertices, the approximation eventually breaks down. But it only starts to break down once we’ve explored “a lot” of the graph,¹ so this case corresponds to that vertex being in the giant. Indeed, it turns out that the proportion of the vertices that belong to the giant is approximately equal to the probability that the Galton–Watson process lives forever, which, when we consider starting explorations from different vertices, makes some intuitive sense.
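Here’s a rough simulation of that dichotomy (all names are mine; it approximates the Bin(n − 1, λ/n) offspring distribution by its large-n limit, a Poisson(λ) distribution, and declares a tree to have “survived” once its population passes a cap, since a comfortably large population is then overwhelmingly likely to keep growing):

```python
import math
import random

def sample_poisson(lam, rng):
    """Sample from Poisson(lam) by Knuth's method: multiply uniform
    random numbers together until the product drops below exp(-lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def survival_fraction(lam, trials=1000, cap=300, seed=0):
    """Estimate the probability that a Galton-Watson tree with
    Poisson(lam) offspring grows forever, treating a generation of
    `cap` individuals as survival."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        alive = 1                      # start from a single individual
        while 0 < alive < cap:
            alive = sum(sample_poisson(lam, rng) for _ in range(alive))
        if alive >= cap:
            survived += 1
    return survived / trials
```

For λ = 0.8 the estimated survival probability is essentially zero, while for λ = 2 it comes out near 0.8, which, if the story above is right, should also be roughly the proportion of vertices in the giant.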
This was all a “hand-wavy” argument, and was therefore non-rigorous. Mathematicians really want proofs, though these sorts of discussions are useful for getting an understanding of what’s going on, even if you have to prove it some other way. It turns out, though, that in this case you can turn the intuitive argument into a rigorous proof. I’m not going to do that here, but van der Hofstad’s book has all the details.
And we haven’t even begun to touch on what happens when λ = 1, the critical regime. That will have to wait for next time, along with an example of the sort of thing that mathematicians consider beautiful: fractals.
¹ I’m trying to avoid formal mathematical statements about limits here, but I acknowledge that a statement like this could annoy mathematicians. For you: formally, since we’re considering the limit n → ∞ here, the approximation breaks down in the limit only after a number of steps proportional to n has elapsed.

