Voynich Manuscript decoded – 87.8% of tokens as bilingual Latin-Occitan

3•scott-schechter•1h ago

The Voynich Manuscript (Beinecke MS 408, Yale) has never been deciphered. I believe I've identified the cipher as a positional homophonic substitution encoding bilingual Latin-Occitan pharmaceutical text.

87.8% of 37,886 tokens decode through a 3,648-entry glossary (2.1% on random input, 42x ratio). The Latin verb NOCERE produces 18 correctly conjugated forms across five tenses. The decoded vocabulary follows Zipf's law at -0.919.
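For anyone who wants to sanity-check these two numbers independently, here is a minimal sketch of how a coverage rate and a Zipf slope could be computed. It is not taken from `decode.js`: the function names `coverage` and `zipfSlope` are hypothetical, and it assumes the -0.919 figure is the least-squares slope of log(frequency) against log(rank), which the post does not state explicitly.

```javascript
// Hypothetical helpers for illustration; not the code in decode.js.

// Fraction of tokens found in the glossary (a Set of decodable forms).
function coverage(tokens, glossary) {
  if (tokens.length === 0) return 0;
  let hits = 0;
  for (const t of tokens) {
    if (glossary.has(t)) hits++;
  }
  return hits / tokens.length;
}

// Least-squares slope of log(frequency) against log(rank).
// Assumes at least two distinct words; a slope near -1 is the classic Zipf shape.
function zipfSlope(tokens) {
  const counts = new Map();
  for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1);
  const freqs = [...counts.values()].sort((a, b) => b - a);
  const xs = freqs.map((_, i) => Math.log(i + 1)); // log rank
  const ys = freqs.map((f) => Math.log(f));        // log frequency
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  return num / den;
}

// Ratio of real coverage to random-input coverage, from the post's own figures.
const ratio = 0.878 / 0.021; // about 41.8, which the post rounds to 42x
console.log(ratio.toFixed(1));
```

The same `coverage` function would be run twice, once on the decoded tokens and once on a random token stream of the same length, so that both rates are measured against the same glossary.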

The manuscript appears to be an oil-based pharmaceutical manual. Evidence points to a Jewish artisan apothecary working in the 13th-century Montpellier medical tradition. None of the 117 Christian terms tested appears in the decoded text. The divine vocabulary distributes according to Kabbalistic sefirotic structure (permutation test: p < 0.0001, combined 1 in 160,000).
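For readers unfamiliar with permutation tests, here is a generic sketch of the technique. It is not the test used in the repository: the statistic (`meanDifference`), the labels, and the data are all made up for illustration, and the real test on the sefirotic distribution presumably uses a different statistic.

```javascript
// Generic permutation test sketch; hypothetical names and toy data throughout.

// Small seeded PRNG (mulberry32) so runs are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Shuffle the labels many times and count how often the shuffled statistic
// is at least as large as the observed one (one-sided test).
function permutationTest(values, labels, statistic, rounds, rand) {
  const observed = statistic(values, labels);
  const shuffled = labels.slice();
  let atLeast = 0;
  for (let r = 0; r < rounds; r++) {
    // Fisher-Yates shuffle of the label array
    for (let i = shuffled.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
    }
    if (statistic(values, shuffled) >= observed) atLeast++;
  }
  // Add-one estimate, so the reported p-value is never exactly zero.
  return { observed, p: (atLeast + 1) / (rounds + 1) };
}

// Toy statistic: mean of group "A" minus mean of group "B".
function meanDifference(values, labels) {
  let sumA = 0, nA = 0, sumB = 0, nB = 0;
  values.forEach((v, i) => {
    if (labels[i] === "A") { sumA += v; nA++; } else { sumB += v; nB++; }
  });
  return sumA / nA - sumB / nB;
}

const toyValues = [10, 11, 12, 13, 1, 2, 3, 4];
const toyLabels = ["A", "A", "A", "A", "B", "B", "B", "B"];
console.log(permutationTest(toyValues, toyLabels, meanDifference, 999, mulberry32(1)));
```

With this add-one estimate the smallest reportable p-value is 1 / (rounds + 1), so supporting a claim of p < 0.0001 takes at least 10,000 permutations.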

There's an unsolved layer: each page contains balanced Latin grammar (223/224 folios self-contained), but the decoded text reads as fragmented word salad. Assembling random words picked by grammatical category produces coherent pharmaceutical statements. The word substitution cipher appears to work. The page-level reading order remains unsolved.

Everything is open source. Run `node decode.js` and test it yourself. Criticism and feedback welcome.

https://github.com/scott-schechter/voynich-decoded