The Lab / Field notes / Part 1 of 4

We pointed an AI at the Vatican's last unsolved cipher

Part one: in which we are very confident.

There is a particular kind of evening, usually a Tuesday, when the paid and sensible work is done and a reasonable person would watch television. We instead open a 500-year-old cipher that has defeated everyone who has ever looked at it.

We should say this plainly: we build websites and AI systems for businesses. We are not cryptographers. Our qualifications ranged from "owns a laptop" to "has strong opinions about Bletchley Park films". The cipher did not mind either way. The cipher has seen worse.

The mark

In 1539, Cardinal Alessandro Farnese, grandson of the Pope and, while still in his teens, already one of the most powerful men in Europe, was writing to the papal ambassador in Spain. The sensitive parts were encrypted: unbroken rivers of digits, thousands of them, with small marks over some numbers. The originals sit in the Vatican Secret Archives, a real place with a name so good it sounds like it was invented by a marketing agency, possibly ours.

In 2019 a cryptanalyst called George Lasry published this cipher as the final challenge in a series on MysteryTwister, a competitive playground for codebreakers. Its difficulty is Level X, the rating reserved for problems where hope is considered poor form. Lasry deserves a sentence. He is one of the best historical codebreakers alive, works with the academic DECRYPT project, and has broken ciphers that stood for centuries. His own published algorithm cracked batch after batch of Vatican ciphers. This one it did not, and he said so in print, which tells you something about him and more about the cipher.

Seven years online. Zero solves.

What is actually in the envelope

Here is what the thing physically is. Eight handwritten pages in the Vatican's files, folios 70 to 73 of a Spanish nunciature volume, carrying roughly 6,300 digits in a careful sixteenth-century hand. No spaces, no punctuation, the odd small mark over a number. Underneath the digits sits a diplomatic letter of several hundred words in Italian, written at a moment when Europe was holding its breath: the Emperor was about to ride through his great rival's France as an honoured guest, the Pope was trying to glue together a league against the Turk, and everyone was lying to everyone about all of it, in writing, weekly.

So the letter could be war. It could be money. It could be the Pope's unguarded assessment of the two most powerful men alive. It could, in fairness, be administrative tedium of the most crushing kind, because most diplomatic mail is. But somebody in 1539 looked at these particular paragraphs and decided they must not be read by the wrong eyes at any price, and five centuries later they still have not been. That is not nothing.

And sealed envelopes from this period have form. When the same community of codebreakers cracked the lost ciphers of Mary, Queen of Scots in 2023, out came more than fifty secret letters written while she plotted from an English prison, a genuine revision of the historical record, front pages everywhere. Other solved papal ciphers have yielded instructions to ambassadors, conclave intelligence and candid talk about money. The sixteenth century encrypted its best material.

Which made it the perfect test for a question we keep being asked in saner contexts: how good are these AI models, actually? Not on a benchmark designed by people with mortgages riding on the answer. On a sealed envelope from 1539. Our lab runs Claude, lately the newest Mythos-class model, against problems that are supposed to be too hard. Here was one with five centuries of provenance and a scoreboard.

The confident phase

What we had: the challenge transcription of those digits. The knowledge that the plaintext is Italian, because some passages were left unencrypted, the sixteenth-century equivalent of CCing everyone by accident. And a 2020 paper describing how this family of ciphers usually works: several codes per letter, padding digits called nulls sprinkled in to ruin frequency analysis, special codes for names.
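For anyone who wants to see why that recipe is so unkind to frequency analysis, here is a toy version in Python. It is a sketch under loud assumptions: the key below is invented, every code is a fixed two digits, and there are no special codes for names. The real 1539 key is unknown and may be built quite differently.

```python
import random

# Hypothetical toy key, invented for illustration only.
# Each letter has several two-digit codes (homophones), so no single
# code is as frequent as the letter it stands for.
HOMOPHONES = {
    "a": ["11", "42", "73"],
    "e": ["15", "38", "60"],
    "o": ["27", "51"],
    "p": ["19"],
    "r": ["33", "84"],
}
NULLS = ["90", "05"]  # padding codes that stand for nothing at all


def encrypt(plaintext, rng, null_rate=0.2):
    """Encode each letter with a randomly chosen homophone, sprinkling in nulls."""
    out = []
    for letter in plaintext:
        if rng.random() < null_rate:
            out.append(rng.choice(NULLS))
        out.append(rng.choice(HOMOPHONES[letter]))
    return "".join(out)


def decrypt(digits):
    """Split into two-digit codes, drop the nulls, look up the rest."""
    reverse = {code: letter
               for letter, codes in HOMOPHONES.items()
               for code in codes}
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return "".join(reverse[pair] for pair in pairs if pair not in NULLS)


# Round trip: the same word can encrypt to many different digit strings,
# and all of them decrypt back to the same word.
ciphertext = encrypt("opera", random.Random(1539))
assert decrypt(ciphertext) == "opera"
```

With the key in hand, decryption is a lookup. Without it, the reader does not even know where one code ends and the next begins, which is the part the toy makes too easy.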

The first sessions went the way these things go in films. Claude built frequency tables. It built a language model from three and a half million letters of Machiavelli and Dante. It ran simulated annealing, which is a respectable optimisation method and not, whatever the name implies, a spa treatment. At one point it tested all 362,880 ways of assigning nine letters to nine codes, a sentence that took forty minutes of compute and is the only thing those forty minutes ever produced.
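For readers who have not met simulated annealing, here is a toy version. It is a generic sketch, not the code from those sessions: the bigram scorer, the cooling schedule and every function name are assumptions made for illustration, and a real attempt would use a far larger language model and know nothing about where codes begin and end.

```python
import math
import random
from collections import Counter


def bigram_scorer(reference):
    """Build a scorer: summed log-probability of a text's letter pairs."""
    counts = Counter(zip(reference, reference[1:]))
    total = sum(counts.values())
    floor = math.log(0.01 / total)  # penalty for pairs absent from the reference

    def score(text):
        return sum(
            math.log(counts[pair] / total) if pair in counts else floor
            for pair in zip(text, text[1:])
        )

    return score


def anneal(codes, alphabet, score, steps=2000, start_temp=5.0, seed=0):
    """Search for a code -> letter key that maximises score(decoded text)."""
    rng = random.Random(seed)
    symbols = sorted(set(codes))
    # Several codes may share a letter, as in a homophonic cipher.
    key = {s: rng.choice(alphabet) for s in symbols}

    def decode(k):
        return "".join(k[c] for c in codes)

    current = score(decode(key))
    best_key, best = dict(key), current
    for step in range(steps):
        temp = start_temp * (1 - step / steps) + 1e-9  # linear cooling
        symbol = rng.choice(symbols)
        old = key[symbol]
        key[symbol] = rng.choice(alphabet)  # proposal: reassign one code
        candidate = score(decode(key))
        # Keep improvements; keep a worse key with a probability that
        # shrinks as the temperature falls.
        if candidate >= current or rng.random() < math.exp((candidate - current) / temp):
            current = candidate
            if current > best:
                best_key, best = dict(key), current
        else:
            key[symbol] = old  # rejected: undo the change
    return best_key, best
```

Note what the function returns: the key with the highest score it happened to visit. That is a statement about the scorer, not about 1539.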

And the discoveries arrived. A five-digit pattern repeating 26 times, surely the word "per". Proof that digit 7 was a word separator, because removing it gave an average word length of 5.0, exactly Italian's. A best-scoring key producing "con" 28 times. The progress reports wrote themselves, which should have been the clue, because that is precisely what they were doing.

It was all wrong. Every finding. Not wrong in an interesting way. Wrong the way a horoscope is wrong: produced by a system that wanted to find something, for an audience that wanted it found. When we finally asked the boring question, "would we see this even if our theory were nonsense?", everything dissolved. The famous pattern was no more frequent than 25 patterns nobody had romanticised. The separator trick worked for any common digit. The best key produced Italian at slightly below random chance, an achievement of a kind.
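The boring question is cheap to ask in code. A minimal sketch, assuming the ciphertext is one plain string of digits; the function names are illustrative and not the scripts from the actual sessions. The idea is to run the same measurement on every candidate, not only on the favourite.

```python
from collections import Counter


def mean_segment_length(digits, separator):
    """Average length of the pieces left after splitting on one digit."""
    segments = [s for s in digits.split(separator) if s]
    return sum(len(s) for s in segments) / len(segments)


def separator_control(digits):
    """Run the 'word separator' test for every digit that occurs."""
    return {d: mean_segment_length(digits, d) for d in sorted(set(digits))}


def pattern_rank(digits, pattern):
    """Rank of a pattern among all patterns of the same length (1 = most frequent)."""
    n = len(pattern)
    counts = Counter(digits[i:i + n] for i in range(len(digits) - n + 1))
    target = counts[pattern]
    return 1 + sum(1 for c in counts.values() if c > target)
```

If `separator_control` returns much the same number for every common digit, the digit you fell for is not special. If `pattern_rank` puts the romantic pattern well down the list, neither is the pattern.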

Five rounds of computation, zero true statements. And one lesson now printed on the inside of our eyelids: an AI given an exciting hypothesis behaves exactly like a human given an exciting hypothesis, except faster, with nicer formatting, and with no capacity for looking sheepish.

We had not cracked the cipher. We had cracked the assumption that intelligence was the bottleneck. The bottleneck is discipline. This becomes a theme.

Next: a 120-year-old German book outperforms the cloud.