Double double word score: R code

It would be irresponsible of me to put code on this site and say, “Here, run this!” without some sort of disclaimer, so… If you copy and paste code from the internet and run it, you should do so with the same care that you would take when downloading software more conventionally. While I have run this code myself, it is (in the legal jargon) provided with no warranties whatsoever; if you run it, you do so at your own risk, and I cannot accept any liability for any unintended consequences.

If you do nothing else, before you run the code, you should check that the web address included in it is still doing what it did on 10th July 2020, when I posted this: that is, providing a plain-text list of English words. The R script directly loads the content of that page, so if the page is doing something else, then at best the code will just throw up an error message, and at worst something completely unexpected could happen. In that case, before running the code you would need to change that line to load a word list from somewhere else.
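
The check described above can also be done from within R before running the main script. The following is only a minimal sketch: the helper names `looks.like.word.list` and `check.word.list.url` are hypothetical (they are not part of the script below), and the test is deliberately crude, only confirming that the first lines of the page contain nothing but letters.

```r
# Hypothetical helper (not part of the original script): returns TRUE if
# there is at least one non-empty line and every non-empty line consists
# only of the letters a-z or A-Z.
looks.like.word.list <- function(lines){
  lines <- lines[nzchar(lines)]
  length(lines) > 0 && all(grepl("^[A-Za-z]+$", lines))
}

# Hypothetical wrapper: reads the first n lines from a web address and
# applies the check above. It needs an internet connection, so it is
# only defined here, not called.
check.word.list.url <- function(address, n=1000){
  first.lines <- readLines(address, n=n, warn=FALSE)
  looks.like.word.list(first.lines)
}

looks.like.word.list(c("aa", "aah", "aahed"))   # TRUE
looks.like.word.list(c("<html>", "<body>"))     # FALSE
```

Calling `check.word.list.url()` with the address used in the script should return TRUE if the page is still serving a plain list of words. A TRUE result only means the first lines look like words; it is no substitute for opening the page and looking at it.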

install.packages(c("tidyverse", "FRACTION"))
library(dplyr)
library(FRACTION)
#Adds some extra functions that aren't included with the basic R installation

alph <- c(LETTERS[seq(1,26)], "*") #Alphabet, plus * for a blank tile
freqs <- c(9,2,2,4,12,2,3,2,9,1,1,4,2,
          6,8,2,1,6,4,6,4,2,2,1,2,1,2) #Scrabble letter frequencies
freq.tbl <- tibble(letter=alph, bag.count=freqs) #Makes frequency tibble

letters.of <- function(x){ #Function to split strings into their letters
  strsplit(x, split="")[[1]]
}
alphabetise <- function(x){ #Function to rearrange the letters of a string
                            #into alphabetical order
  paste(sort(letters.of(x)), collapse="")
}
alphabetise <- Vectorize(alphabetise)
#Throughout, "Vectorize" allows these functions to take vectors of data
#as inputs

falling.factorial <- function(x,y){ #Function to calculate falling factorial
  prod(x - seq_len(y) + 1) #x*(x-1)*...*(x-y+1); gives 1 when y is 0
}
falling.factorial <- Vectorize(falling.factorial)

final.constant <- function(n, reps){ #Technical function involved in
                                     #probability calculation.
                                     # n is the number of letters in a
                                     # word; reps is the number of times
                                     # we want to draw it
  falling.factorial(100,reps*n)/(factorial(n)^reps)
}

probab.word <- function(word, reps){ #Function that computes the probability
                                     #of drawing the tiles making up a word
                                     #a number of times equal to "reps"
  letter.list <- word %>% toupper %>% letters.of
    #Converts to vector of characters in upper-case
  dat <- letter.list %>% table %>% as.data.frame %>% as_tibble %>%
    #Probably unnecessarily convoluted code to produce a frequency tibble
    #of the number of occurrences of each letter in the word
    rename(letter=".") %>% inner_join(freq.tbl, by="letter") %>%
    #Adds letter frequencies from the bag
    mutate(num.cont = falling.factorial(bag.count, reps*Freq)) %>%
    mutate(contribution = num.cont/(factorial(Freq)^reps))
    #Technical functions providing the contribution to the product from
    #each letter
  denom <- letter.list %>% length %>% final.constant(reps=reps)
    #Named "denom" rather than "c", to avoid masking R's built-in c()
  prod(dat$contribution)/denom
  #Final calculations
}
probab.word <- Vectorize(probab.word)

num.letters <- 7 #We want seven-letter words

words_alpha <- read.csv(
  "https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt",
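  sep="", header=FALSE, stringsAsFactors=FALSE) %>%
  as_tibble() %>% rename(word=V1) %>%
  mutate(num.char = nchar(word))
  #stringsAsFactors=FALSE keeps the words as text in versions of R before 4.0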
  #Reads list of words, and counts the characters in each
words.of.length <- words_alpha %>% filter(num.char==num.letters) %>%
  mutate(alphab = alphabetise(word))
  #Takes only the words of the given number of letters, and
  #alphabetises them
alphabetised.words <- words.of.length %>% select(alphab) %>%
  distinct %>% mutate(prob.twice = probab.word(alphab,2)) %>%
  mutate(prob.once = probab.word(alphab, 1))
  #Computes the probability of obtaining each word twice, and then once,
  #using the previously-defined function (this takes a long time to run)

(p.twice <- sum(alphabetised.words$prob.twice))
1/p.twice
(p.once <- sum(alphabetised.words$prob.once))
1/p.once
#Outputs final probabilities by summing over words, as well as reciprocals
#(to present in the form "a 1 in N chance")

denom.once <- choose(100, num.letters)
num.once <- p.once*denom.once
hcf.once <- gcd(denom.once, num.once)
(num.once <- num.once/hcf.once)
(denom.once <- denom.once/hcf.once)
#For the "once" probability, returns the numerator and denominator of the
#fractional representation in simplest form
#(There's not much point doing this for "twice", because R can't store
#integers that large with sufficient precision that we can be confident
#we're getting the exact values.)
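
As a cross-check on the probability calculation, the same quantities can be written directly in terms of binomial coefficients. For a single draw, the probability is the product over letters of choose(tiles in bag, tiles needed), divided by choose(100, word length); for two draws in a row, the second draw uses what is left in the bag after the first. The sketch below is my own restatement using only base R, not the author's method, and the names `bag`, `letter.counts`, `check.once` and `check.twice` are hypothetical.

```r
# Tile counts for A-Z plus "*" (blank), copied from the script above
bag <- setNames(c(9,2,2,4,12,2,3,2,9,1,1,4,2,
                  6,8,2,1,6,4,6,4,2,2,1,2,1,2), c(LETTERS, "*"))

# Hypothetical helper: number of times each letter occurs in a word,
# as a named vector (assumes the word contains only A-Z or "*")
letter.counts <- function(word){
  tab <- table(strsplit(toupper(word), split="")[[1]])
  setNames(as.vector(tab), names(tab))
}

# Probability that one draw of tiles is exactly the letters of the word
check.once <- function(word){
  k <- letter.counts(word)
  b <- bag[names(k)]
  prod(choose(b, k)) / choose(sum(bag), sum(k))
}

# Probability that two successive draws (without replacement) are both
# exactly the letters of the word
check.twice <- function(word){
  k <- letter.counts(word)
  b <- bag[names(k)]
  n <- sum(k)
  prod(choose(b, k) * choose(b - k, k)) /
    (choose(sum(bag), n) * choose(sum(bag) - n, n))
}

check.once("ab")  # 9*2/choose(100,2) = 18/4950
check.twice("z")  # 0, because there is only one Z in the bag
```

If the script above has been run, `check.once(w)` and `probab.word(w, 1)` should agree for any word `w` up to floating-point rounding, and likewise `check.twice(w)` and `probab.word(w, 2)`.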

If you want to run this code, you’ll need to install R; details are on the R project website. I use a (free, but commercial) program called RStudio as well because it makes editing easier, but you don’t need it just to run the code.