The best strategies for Wordle, part 3 (July 2023)

By Alex Selby, 25 July 2023. Please note, this is not Wikipedia even though it looks a bit like it. It's just my blog that uses the same rendering engine (called Mediawiki).

This is an update to the March 2022 piece The best strategies for Wordle, part 2. To be honest, one of the purposes of this article is to show off this Wordle solver, which finds the best strategy based on rigorous criteria (see below) and is, as far as I know, a lot faster than any other one that does the same thing. The idea is to give it a test-drive on the newly-expanded hidden wordlist: 3158 words as of July 2023, up from 2309 before March 2023. I also hope it will prove interesting to see a complete analysis of each of the 14855 permitted guess words.

But first let's recap a few things. Wordle uses two word lists: a curated list from which a hidden (answer) word is chosen every day - let's call this the hidden list - and a much larger list of words that are permitted as guessing words when playing the game. The definition of optimal strategy used here is (as before) the strategy that minimises the average number of guesses required assuming that the hidden word is randomly chosen from the list of allowable possibilities, with each possible word equally likely. This means that to be an optimal strategy, each of your guesses has to take into account all possible subsequent guesses that might occur at a later stage of the game.
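To make the definition concrete, here is a minimal brute-force sketch in Python. It is not the solver discussed in this article (which relies on much cleverer pruning), it ignores the six-guess limit, and it is only practical for toy word lists. The function names and the five-word example are made up for illustration.

```python
from collections import Counter
from functools import lru_cache


def feedback(guess, hidden):
    """Wordle colours as a tuple: 2 = green, 1 = yellow, 0 = grey."""
    result = [0] * len(guess)
    unmatched = Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            result[i] = 2
        else:
            unmatched[h] += 1
    for i, g in enumerate(guess):
        if result[i] == 0 and unmatched[g] > 0:
            result[i] = 1
            unmatched[g] -= 1
    return tuple(result)


def best_first_guess(guesses, hidden):
    """Return (minimum total guesses over all hidden words, a best first guess)."""
    guesses = tuple(guesses)
    solved = (2,) * len(guesses[0])

    @lru_cache(maxsize=None)
    def solve(candidates):
        best_total, best_guess = float("inf"), None
        for g in guesses:
            # Split the remaining candidates by the colours this guess would show.
            parts = {}
            for h in candidates:
                parts.setdefault(feedback(g, h), []).append(h)
            if len(parts) == 1 and solved not in parts:
                continue  # this guess tells us nothing new
            # One guess is spent for every candidate, then each unsolved
            # part is played out optimally (this is the full lookahead).
            total = len(candidates)
            for pattern, part in parts.items():
                if pattern != solved:
                    total += solve(tuple(part))[0]
            if total < best_total:
                best_total, best_guess = total, g
        return best_total, best_guess

    return solve(tuple(hidden))


hidden = ("batch", "catch", "hatch", "match")
total, first = best_first_guess(hidden + ("combs",), hidden)
print(first, total / len(hidden))  # combs 2.0
```

If only the four answer words are allowed as guesses, the same function returns a total of 10 (an average of 2.5), so in this toy case the best first guess is a word that cannot itself be the answer.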

I should say that, though this is a natural definition of optimality that serves as a well-defined objective to compare different solving algorithms, and has been used by others in this way, it doesn't exactly correspond to "real-world Wordle". For one thing, in real Wordle an answer word is (apparently) never repeated, so, if you can remember what has gone before, you can rule out some words from the start. For another thing, the new words that were added in March 2023 are rarely chosen: between 27 March 2023 and 24 July 2023, a period of 120 days during which a 3158- or 3160-long list operated, the answer word was chosen from the old 2309-long list on all but four occasions (the occasions being BALSA, GUANO, KAZOO, SNAFU), whereas you'd expect around (1-2309/3158)*120 ~= 32 by chance, showing that the new words are being used more sparingly than the old words. So the analysis here is more of an academic exercise aimed at an idealised version of the game.
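The "around 32" figure can be checked with a few lines of Python. The binomial tail below is an extra illustration, not something from the analysis itself, and it assumes independent uniform draws from the 3158-word list on each of the 120 days (ignoring the no-repeat rule and the brief 3160-word period).

```python
from math import comb

old_words, all_words, days = 2309, 3158, 120

# Chance that a uniformly chosen answer is one of the newly added words.
p_new = 1 - old_words / all_words
expected_new = p_new * days
print(f"{expected_new:.1f}")  # 32.3

# Probability of seeing 4 or fewer new words in 120 independent draws.
p_at_most_4 = sum(
    comb(days, k) * p_new**k * (1 - p_new) ** (days - k) for k in range(5)
)
print(p_at_most_4 < 1e-9)  # True
```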

As mentioned, quite a few others have evaluated the best starting word and strategy in the terms above (used to be SALET, now TARSE), but as far as I know, very few (and none at all that I am aware of in hard mode) have proved that the best word is actually the best in this sense. To do so is a computationally much more intensive task, which involves showing that all the other 14854 (originally 12971) guess words are worse. This point was made here, where an earlier version of the program (operating on an earlier version of the wordlists) took a fraction of a second to come up with SALET as the likely best word, but 15 hours to demonstrate that it really was the best according to the above notion of optimality.

There is plenty of lively discussion on the internet on the subject of the best first word, but most of it, even if it's being scientific, relies on the computationally easier task of maximising a favourite one-move-ahead heuristic.

Articles with different ideas about the best starting word

A rare example of a rigorous treatment using our preferred objective of minimising the average number of guesses can be found in this paper from September 2022 where a group at MIT proved SALET is optimal for the original version of the wordlists (2315 hidden words, 12972 guess words), taking "days to solve via an efficient C++ implementation of the algorithm, parallelized across a 64-core computer". By comparison, the original version of my program arrived at the same result in about 1 day on a single core of a home desktop computer, and later versions take about 2 hours (single-core) for the same calculation. This comparison is made partly to show off, but it's also maybe somewhat informative, because there are very few published proven answers with which a like-for-like comparison can be made, and it illustrates the benefit of the techniques described in The best strategies for Wordle, part 2 (and others).

A much harder task is to go beyond finding the best word and calculate the value of each possible starting word. So instead of merely proving that the starting word QAJAQ, say (which happens to be the worst starting word), is worse than SALET, you have to determine exactly how bad it is, in other words how many guesses on average you would need to complete Wordle if you make QAJAQ your first guess (and so on for all of the other tricky words). Here there is a problem that some of these bad starting words can take a great deal longer than the better words to evaluate fully. My current program takes a day or so to evaluate all possible starting words using the previous (2309 word) hidden word list.

Still harder is when the list of possible hidden (answer) words gets bigger, which is what happened earlier in 2023 when the New York Times made a significant update to their list of possible answer words, expanding it from 2309 to 3160 words in March 2023, and then slightly adjusting it to 3158 in July 2023. This increase of only 37% in the number of possible answer words makes the task of finding the optimal strategy much more than 37% harder (maybe 10 times harder or more), so it is now getting computationally onerous to fully evaluate all of the 14855 possible starting words. So this is the challenge I wanted to test my program with. I guesstimated it would take two weeks real time on 10 cores of my home machines (i.e., 3360 home CPU-hours) to evaluate each starting word in easy mode, and longer than that in hard mode. This seemed too long to wait so I bought some Amazon EC2 time to complete these tasks (despite what I said in the past about not using external computational resources). In the end, it took 3370 and 8261 CPU-hours to complete easy and hard modes respectively on EC2 C6g instances. These are not particularly fast processors (seemingly about half the speed of one of my desktop cores, which are themselves not particularly special) but they are relatively cheap at around $0.01 per CPU-hour on the EC2 spot market and I could run 288 of them at a time. This would have finished in around two days' real time at a cost of $116 had I arranged everything optimally, but I didn't quite manage to do this. Some of my spot instances got forcibly terminated early, and some of the extreme outlier words took a long time to evaluate, which somewhat unbalanced the 288 processes, so it took a day or so longer to complete the runs.
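For anyone who wants to check the back-of-envelope figures, here they are as a short Python calculation. It uses only the numbers quoted above; the "ideal" running time assumes all 288 processes are kept perfectly busy, which is not what happened in practice.

```python
HOURS_PER_DAY = 24

# Two weeks of real time on 10 home cores.
home_cpu_hours = 14 * HOURS_PER_DAY * 10
print(home_cpu_hours)  # 3360

# EC2 totals for easy plus hard mode at about $0.01 per CPU-hour.
ec2_cpu_hours = 3370 + 8261
ec2_cost = ec2_cpu_hours * 0.01
print(f"{ec2_cost:.2f}")  # 116.31

# Real time if 288 processes were perfectly balanced.
ideal_days = ec2_cpu_hours / 288 / HOURS_PER_DAY
print(f"{ideal_days:.2f}")  # 1.68
```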

How does this analysis compare to that of the official New York Times Wordle engine, WordleBot 2.0? (If you find yourself on the wrong side of the New York Times paywall then this is an alternative article on WordleBot.) We can see from the description that WordleBot assumes that hidden words occur according to their natural frequency in English (as defined by New York Times usage) rather than assuming that all hidden words are equally likely. This is a nice variation, avoiding a hard cutoff from "normal words" to "obscure words". In terms of how it scores a word, it appears from the description in the New York Times article that it is using 3Blue1Brown's notion of information (presumably adapted to a non-uniform distribution over hidden words) and scores your attempts in terms of bits of information. This is a perfectly valid thing to do of course and will lead to good practical results, but I would contend it's not quite as nice as minimising the expected number of guesses, because it's fundamentally only looking one move ahead and applying a particular chosen function, so it does not take into account what will be best from the point of view of the whole of the rest of the game.
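For comparison, a one-move-ahead information score of this general kind can be sketched as follows. This is a guess at the idea based on the public descriptions, not WordleBot's actual code. The `weights` dictionary (hidden word to prior weight) and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def feedback(guess, hidden):
    """Wordle colours as a tuple: 2 = green, 1 = yellow, 0 = grey."""
    result = [0] * len(guess)
    unmatched = Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            result[i] = 2
        else:
            unmatched[h] += 1
    for i, g in enumerate(guess):
        if result[i] == 0 and unmatched[g] > 0:
            result[i] = 1
            unmatched[g] -= 1
    return tuple(result)


def expected_information(guess, weights):
    """Expected bits gained from `guess` under a weighted prior on hidden words."""
    total = sum(weights.values())
    mass = defaultdict(float)
    for hidden, w in weights.items():
        # Probability of each colour pattern = total prior mass of the
        # hidden words that would produce it.
        mass[feedback(guess, hidden)] += w / total
    return -sum(p * log2(p) for p in mass.values() if p > 0)


# Uniform weights over a toy list; a real prior would be non-uniform.
weights = {"batch": 1, "catch": 1, "hatch": 1, "match": 1}
print(expected_information("combs", weights))  # 2.0
```

The score only looks at how the first guess splits the candidates. It says nothing about how easy the resulting groups are to finish off, which is exactly the gap that the full lookahead closes.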

Word Lists and Results

Allowable guess words
Possible answer words
Evaluation of each guess word in easy mode
Strategy file for best word (TARSE) in easy mode
Evaluation of each guess word in hard mode (An entry of 1000000000 means you can't guarantee solving it in 6 guesses)
Strategy file for best word (TARSE) in hard mode
Top 10 first guesses for Wordle in easy mode, word lists as of 1 July 2023

Rank  First word  Average guesses required  Total guesses required over all possible hidden words
1     TARSE       3.5526                    11219
2     SALET       3.5576                    11235
3     CARET       3.5630                    11252
4     SATER       3.5659                    11261
5     TORSE       3.5671                    11265
6     CARTE       3.5681                    11268
=7    CARLE       3.5687                    11270
=7    REAST       3.5687                    11270
9     TARED       3.5693                    11272
10    TRACE       3.5703                    11275

Top 10 first guesses for Wordle in hard mode, word lists as of 1 July 2023

Rank  First word  Average guesses required  Total guesses required over all possible hidden words
1     TARSE       3.6818                    11627
2     PLAST       3.6900                    11653
=3    LEAST       3.6903                    11654
=3    REAST       3.6903                    11654
5     SLART       3.6922                    11660
6     TRAPE       3.6935                    11664
7     TARNS       3.6944                    11667
8     CLART       3.6966                    11674
9     CLAST       3.6970                    11675
10    LEAPT       3.6979                    11678

Example commands

Each command is followed by its description.

wordle -a wordlist_orig_all -h wordlist_orig_hidden -c10
    Find the best word (SALET) with proof, using original wordlists.

wordle -c10
    Find the best word (TARSE) with proof, using current wordlists.

wordle -s -c10
    Evaluate all words, using current wordlists.

cp results_easy_nyt20220830 joblist
wordle -s -c2 -j joblist &
wordle -s -c2 -j joblist &
...
wordle -s -c2 -j joblist &
    Example of parallelising the above using simple job allocation.
    The file joblist will be consumed line-by-line from the end with a locking mechanism to prevent race conditions.
    This order has the effect of evaluating worse and slower words first, which helps balance the work amongst the different cores, though may not be optimal from the point of view of the internal cache.

wordle -H -s -c10
    Evaluate all words in hard mode, using current wordlists.

wordle -w tarse -p tarse.strategy
    Calculate strategy file for the starting word TARSE.
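
The joblist idea (several processes each taking the last line of a shared file under a lock) can be illustrated with a small Python sketch. This is not how the wordle program implements it internally, which is not described here. It assumes a Unix-like system where `fcntl.flock` is available, and `pop_last_job` is a made-up name.

```python
import fcntl


def pop_last_job(path):
    """Remove and return the last line of `path`, or None if the file is empty."""
    with open(path, "r+") as f:
        # Exclusive lock: other workers block here until we are done,
        # so no two workers can take the same line.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            lines = f.read().splitlines()
            if not lines:
                return None
            job = lines.pop()
            # Rewrite the file without the line we took. This is simple
            # rather than efficient, which is fine for a few thousand jobs.
            f.seek(0)
            f.truncate()
            f.write("".join(line + "\n" for line in lines))
            f.flush()
            return job
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Each worker would call `pop_last_job("joblist")` in a loop until it returns None. Because the list is taken from the end, a file sorted from best to worst word hands out the slow, bad words first.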