This post was originally published on my attempt at a science communication blog, Probably Interesting. I moved it over here in March 2026 so I could have everything I’ve blogged about in one place, which also explains why this wasn’t posted on a Sunday. Confusingly, the next paragraph is also in italics, but that paragraph was in the original. It was just in italics there too.
This is the second of a pair of posts answering the question: what is the probability of drawing the same set of tiles twice in Scrabble, making the same permitted word? This is probably not a question you’d ever thought to ask, but I did when it happened to me. If you haven’t already, you might want to read the first part.
Last time, we established the probability of drawing a particular word, STAINED, twice in succession at the start of the game. But, of course, we don’t care which particular word it was. If the probability for each word were the same, we could just multiply by the number of words. If only it were that simple.
The obvious way in which the probability of the words can vary is that different tiles appear different numbers of times in the bag. For instance, any word containing a J, K, Q, X or Z has a probability of being drawn twice equal to 0, simply because there is only one of each of these tiles in the bag.
But there’s a more subtle way, too, which we can see if we look at another word: let’s try SCARIER. Again, let’s pick one order (we’ll work with the order of the letters in the word this time) and work out the probability of drawing the letters in that order. There are four Ses, two Cs, nine As, six Rs, nine Is and twelve Es among the 100 tiles in the bag; then we get to the last tile, where we need another R. If the first six draws went our way, we’ll have already drawn an R on the fourth, so there are only five left in the bag. Each draw also leaves one fewer tile in the bag, so the probability of drawing the word in order is given by
4/100 × 2/99 × 9/98 × 6/97 × 9/96 × 12/95 × 5/94.
The next thing we did last time was multiply by the number of orders into which we could put the tiles. Before, that was 7!, or 7×6×5×4×3×2×1, or 5040. But wait! We have two Rs now, and, given any ordering, swapping those over makes no difference. So there are actually half as many orderings this time: 2520. So our probability of getting the word SCARIER, in any order, on our first draw, is that big long multiplication above, times 2520, which works out at about 0.000 007 3. We will need to make the appropriate modifications when calculating the probability of drawing the word a second time, given that we drew it the first. To use the phrase loathed by undergraduate maths students, I leave this as an exercise for the reader. Similarly, I leave it to you to work out how we might generalise further to words with a letter occurring three or more times, or with two letters appearing twice.
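For anyone who wants to check the SCARIER arithmetic, here is a minimal sketch in Python (not the R code mentioned later in the post). It uses only the tile counts quoted above and assumes a full bag of 100 tiles; the variable names are my own.

```python
from fractions import Fraction
from math import factorial

# Tiles available at each draw, in the order S, C, A, R, I, E, R.
# The last entry is 5 because one of the six Rs has already been drawn.
tiles_available = [4, 2, 9, 6, 9, 12, 5]

# Assumption: the bag starts with 100 tiles and loses one per draw.
p_in_order = Fraction(1)
for draw_index, available in enumerate(tiles_available):
    p_in_order *= Fraction(available, 100 - draw_index)

# 7! orderings, halved because swapping the two Rs changes nothing.
orderings = factorial(7) // factorial(2)

p_any_order = p_in_order * orderings
print(orderings)           # number of distinct orderings
print(float(p_any_order))  # probability of drawing SCARIER's tiles first
```

Using exact fractions until the final step avoids any rounding along the way.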
Anyway, so we need to go over every seven-letter word, calculate the probability of drawing it twice, and add those up. There are 34,342 seven-letter words in the Collins Scrabble Words list, so this could take a long time. Fortunately, there’s an app for that. Or rather, we can write one.
I wrote some computer code, in a language called “R”, that can do this task. (If you haven’t heard of it, R is a language particularly designed for processing data; a major advantage of it over some of the alternatives is that it’s free and open-source, so no licensing fees and a large community of volunteer developers.) I’ve shared the code on a separate page, for anyone who might be interested, but the code itself isn’t important. After all, programming languages are just that—languages, with vocab and grammar that you have to learn before you can speak them. What is important is what you want to say: namely, the tasks you want the computer to do. (It isn’t even all that important exactly what instructions you give: code can be written in lots of different ways that accomplish the same thing, just as two people told to “Go to Charing Cross station via the base of Nelson’s Column” and to “Go to the Trafalgar Square lions, then to the nearest mainline station” will tread essentially the same path.)
So, we first want to load a list of allowable Scrabble words. Here I hit a slight problem, because the Scrabble word lists are copyrighted (the one used in English-language games outside the US and Canada being owned by Collins), and I don’t have the budget to pay for a licence to use them. So I loaded a substitute list, so I could at least get an estimate of the probability, and build the method. Given the actual list, I would only have to swap one line of code to perform the calculations on it: the one which tells the computer which list to use.
Whatever list we use, it will have way too many words to start with, because we’re only interested in seven-letter ones: we’ll get rid of the rest. But we’re still overcounting, because we’re interested not in words per se, but in sets of letters that can be rearranged to make a word. To go back to our old example, the tiles I drew that made STAINED can also be arranged to make six other allowable words (ten points to anyone who finds one), and we’re not seven times as likely to draw those tiles simply because we can make seven different words out of them.
So we want to get rid of duplicates. The easiest way I could think of to do this was to rearrange every word into alphabetical order, and then delete any duplicates that resulted. This may not be the quickest way, in terms of processing time, but it was certainly quick enough: it took no more than a couple of seconds to do everything I’ve just listed.
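The filtering and de-duplication steps can be sketched as follows. This is an illustration in Python rather than the original R, and both the function name `distinct_racks` and the toy word list are made up for the example.

```python
def distinct_racks(words, length=7):
    """Keep words of the given length, sort each word's letters
    alphabetically, and drop the duplicates that result."""
    racks = set()
    for word in words:
        word = word.strip().upper()
        if len(word) == length and word.isalpha():
            racks.add("".join(sorted(word)))
    return racks


# Toy list: three anagrams of one another, one other seven-letter
# word, and one word that is too short to matter.
sample = ["rescued", "secured", "seducer", "scarier", "cat"]
print(sorted(distinct_racks(sample)))
```

The three anagrams collapse into the single entry CDEERSU, so that set of tiles is counted once, not three times.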
What took the time was the next bit: running through every word on the list and calculating the probability of drawing it twice. For this, I’d stored all the letter frequencies in a table, so my code extracted the letters from the word, worked out how many there were of each in the word, looked up from the table how many there were of each in the bag, and put all the numbers into the appropriate formula. Doing this for every word took my laptop about twenty minutes, running in the background.
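Here is one way the per-word calculation might look, as a sketch in Python rather than the original R. It rests on assumptions of mine: the standard English-language tile counts, blanks ignored (though they still count towards the 100 tiles), and the second rack drawn from the 93 tiles that were not in the first. It counts combinations of tiles, which handles repeated letters automatically; the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import comb

# Assumed standard English tile counts (98 lettered tiles; the two
# blanks are ignored but still count towards the 100 in the bag).
TILES = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2,
    "I": 9, "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2,
    "Q": 1, "R": 6, "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1,
    "Y": 2, "Z": 1,
}
BAG = 100
RACK = 7


def p_same_rack_twice(word):
    """Probability of drawing this word's tiles, then drawing them again."""
    ways_first = 1
    ways_second = 1
    for letter, needed in Counter(word.upper()).items():
        in_bag = TILES[letter]
        # Ways to pick the tiles for this letter on the first draw.
        ways_first *= comb(in_bag, needed)
        # On the second draw, `needed` of them are already gone.
        ways_second *= comb(max(in_bag - needed, 0), needed)
    p_first = Fraction(ways_first, comb(BAG, RACK))
    p_second = Fraction(ways_second, comb(BAG - RACK, RACK))
    return p_first * p_second


for word in ["STAINED", "SCARIER", "JUKEBOX"]:
    print(word, float(p_same_rack_twice(word)))
```

Any word containing a single-tile letter such as J, K or X comes out as exactly zero, as it should. Summing this function over every distinct rack gives the overall probability.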
Then all the computer needed to do was add all the probabilities up and spit out the sum: another quick task, even with tens of thousands of entries. This gave our answer: the probability of drawing the same seven-letter word twice on one’s first two turns is 0.000 000 38, or about one in 2.6 million. So about seventeen times more likely than winning the lottery, but about four times less likely than dealing out five cards from a standard 52-card deck and getting a royal flush. (Something that’s also happened to me, as it happens.)
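Those comparisons are easy to sanity-check. The sketch below takes the 0.000 000 38 figure as given. The royal flush odds follow from counting five-card hands. The lottery odds are an assumption of mine, a draw of six numbers from 59, since the text doesn’t say which lottery is meant.

```python
from math import comb

p_scrabble = 0.00000038  # figure quoted in the text

# Four royal flushes (one per suit) among all five-card hands.
p_royal_flush = 4 / comb(52, 5)

# ASSUMPTION: a lottery where you must match 6 numbers drawn from 59.
p_lottery = 1 / comb(59, 6)

print(round(1 / p_scrabble))       # the "one in N" form of the answer
print(p_scrabble / p_lottery)      # times more likely than the lottery
print(p_royal_flush / p_scrabble)  # times less likely than a royal flush
```

Under that lottery assumption the ratios come out close to seventeen and four, matching the figures above.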
To test the code, I also calculated how likely you are to draw a seven-letter word on your first turn; doing that only required a small modification to the formula, and otherwise everything was the same (indeed, it was possible to run both at once). The output was 0.11, not too far off the answer of 0.13 given by someone on Reddit (the difference, presumably, being explained by the different word list).
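The single-draw version can be sketched like this, again in Python and again assuming the standard tile counts with blanks ignored. The function name and the two-entry toy list are made up for the example; a real run would sum over every distinct rack in the word list.

```python
from collections import Counter
from math import comb

# Assumed standard English tile counts, written compactly (blanks ignored).
COUNTS = ("A9 B2 C2 D4 E12 F2 G3 H2 I9 J1 K1 L4 M2 N6 O8 P2 Q1 "
          "R6 S4 T6 U4 V2 W2 X1 Y2 Z1")
TILES = {item[0]: int(item[1:]) for item in COUNTS.split()}


def p_rack_once(rack):
    """Probability that the first seven tiles drawn are exactly this rack."""
    ways = 1
    for letter, needed in Counter(rack.upper()).items():
        ways *= comb(TILES[letter], needed)
    return ways / comb(100, 7)


# Toy list of distinct racks, already sorted into alphabetical order.
toy_racks = {"ACEIRRS", "CDEERSU"}
p_any_word = sum(p_rack_once(rack) for rack in toy_racks)

print(p_rack_once("SCARIER"))  # should agree with the draw-by-draw method
print(p_any_word)
```

For SCARIER this gives about 0.000 007 3, the same answer as the draw-by-draw calculation earlier, which is a useful check that the two approaches agree.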
And what about my Scrabble game? It turned out that my opponent’s first tiles were EINOPRU, which don’t make a seven-letter word on their own… but which can be used to make the word INPOURED when combined with the D from STAINED. What’s more, I couldn’t find a place to play my second STAINED, so my incredible stroke of luck likely went unnoticed. And at time of writing I am losing 363 to 220. Well, c’est la vie, as one might say—but then I’d have to do the calculation for the French word list instead.

