The current generation of frontier LLMs can't make puzzles that get much more interesting than hot-vs-big-vs-fast. New inferences keep circling a small pool of concepts unless the prompting has a way to get the LLM into new territories of language. Puzzlemaking needs graph traversals.
I generated the 20 levels algorithmically from a huge semantic graph (100M+ edges), built from manual lexicography and millions of LLM inferences across various models. I keep exploring what can emerge from this graph. The puzzles are randomly selected; reload to see others.
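The post doesn't describe the actual generation algorithm, but one simple way to "traverse into new territory" over an adjacency-list semantic graph is a random walk that collects the distinct concepts it visits. This is a hypothetical toy sketch (the graph, the `random_walk` helper, and the step count are all illustrative, not the author's method):

```python
import random

# Toy semantic graph in adjacency-list form; the real graph has 100M+ edges.
graph = {
    "hot": ["cold", "spicy", "popular"],
    "spicy": ["hot", "pepper"],
    "popular": ["hot", "famous"],
    "cold": ["hot", "ice"],
    "famous": ["popular", "star"],
}

def random_walk(graph, start, steps, rng=random):
    # Hop to a random neighbor `steps` times, collecting each new concept
    # seen along the way; stop early at a dead end.
    node, seen = start, [start]
    for _ in range(steps):
        neighbors = graph.get(node, [])
        if not neighbors:
            break
        node = rng.choice(neighbors)
        if node not in seen:
            seen.append(node)
    return seen

print(random_walk(graph, "hot", 5))
```

A walk like this naturally drifts away from the starting concept, which is one way to escape the small pool of associations an LLM keeps circling.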
The front-end was built with Claude Code.
Maybe someday I'll make this into a mobile game, increase the complexity and peril. If you are a gamedev, feel free to dissect it and borrow any parts.
sxp•1h ago
But I think you need to work on better "opposites". E.g., "as of now" and "long ago" don't really seem to be opposites. Instead, maybe they're complex conjugates of each other: i.e., they're similar along one axis (time frame) and different along another. But I wouldn't consider those two to be opposite one another. Word2vec with a cosine similarity closer to -1 might work better than what you're using now.
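The cosine-similarity criterion the comment suggests is easy to state concretely: 1 means the vectors point the same way, 0 means orthogonal, -1 means directly opposed. A minimal sketch with NumPy, using made-up toy vectors in place of real word2vec embeddings (which would come from a trained model):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors:
    # 1 = same direction, 0 = orthogonal, -1 = opposite.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for embeddings of "as of now" and "long ago";
# real word2vec vectors would have hundreds of dimensions.
v_now = np.array([1.0, 0.2, 0.0])
v_long_ago = np.array([-1.0, 0.2, 0.0])

print(cosine_similarity(v_now, v_long_ago))
```

Worth noting that in practice you'd filter candidate pairs by this score rather than expect trained embeddings of antonyms to land exactly at -1.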