Show HN: I am building a map of people who lived in the Roman Empire

https://new.roman-names.com/
59•metiscus•2d ago
Driving home from work one day, I wondered how many people who lived during the Roman era we know by name. Searching around, I found lists of consuls and officials, but nothing that covered ordinary people, or even most people, like freedmen and slaves. So I ended up building a pipeline to process the more than 500k Latin inscriptions in the Epigraphic Database Clauss-Slaby https://edcs.hist.uzh.ch/en/ and extract the names of people (and attempt to cluster them, but this is a work in progress).

There are databases where classicists have done this manually for specific regions; Trismegistos https://www.trismegistos.org/ and Latin Inscriptions of the Roman Empire (LIRE) https://pure.au.dk/portal/en/publications/latin-inscriptions... are two major efforts I found. But there doesn't seem to be a project that did what I set out to do, although I have read in some places that it was believed to be possible.

I am not a classicist or a web developer, but I have Claude and Gemini and I can sort of read basic Latin, so I set to work. I used LIRE and another database as ground truth and built a pipeline to extract and process the inscriptions to recover the names. The process I developed uses a high-end LLM like Sonnet or Gemini Pro to supervise the extraction and tuning process on a regional basis until the obvious error rate is reasonable. So far, "reasonable" to me means less than 1-2% in the smaller initial samples of 100-500 and no observed systemic issues. The different regions often need different prompts, so this basically became an exercise in letting the higher-level AI tune the prompt for the lower-level AI. Measured against LIRE, the extraction produces an F1 score between 0.64 and 0.87, but take this with a grain of salt.
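For readers unfamiliar with the metric or the tuning idea, here is a minimal sketch, not the pipeline's actual code. It assumes names are compared as exact strings per inscription and that a candidate prompt is kept only when it scores higher than the current best. The names `f1_score`, `tune_prompt`, `propose`, and `score`, plus the sample names and the 0.98 target, are all hypothetical.

```python
from typing import Callable


def f1_score(predicted: set[str], truth: set[str]) -> float:
    """F1 of extracted names against ground-truth names (exact string match)."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of extracted names that are correct
    recall = true_positives / len(truth)         # share of true names that were found
    return 2 * precision * recall / (precision + recall)


def tune_prompt(
    initial_prompt: str,
    propose: Callable[[str, float], str],  # hypothetical: supervisor model rewrites the prompt
    score: Callable[[str], float],         # hypothetical: run the extractor on a sample and score it
    rounds: int = 5,
    target: float = 0.98,
) -> tuple[str, float]:
    """Keep a candidate prompt only if it scores higher than the current best."""
    best_prompt, best_score = initial_prompt, score(initial_prompt)
    for _ in range(rounds):
        if best_score >= target:
            break
        candidate = propose(best_prompt, best_score)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score


# Made-up example: two of three extracted names are right, two true names are missed.
extracted = {"Gaius Iulius Felix", "Marcus Aurelius", "Agricola"}
ground_truth = {"Gaius Iulius Felix", "Marcus Aurelius", "Lucius Valerius", "Claudia Prima"}
example_f1 = f1_score(extracted, ground_truth)  # precision 2/3, recall 1/2
```

In practice `propose` would call the supervising model and `score` would run the cheaper model over a regional sample; both are left as plain callables here so the loop itself stays easy to test.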

Once I had done a few regions, I wanted to see the work, so I threw together a pretty crude website. As I am not a web developer, it was crude in how it accessed its data. It does look cool, and I also added summarization and machine translation to each entry. I wanted to eventually get feedback from an actual team of classicists and make the website work better, so I am rewriting it as we speak. The new version is broadly functional now, with a few extra bugs but substantially improved performance compared to the old one. All entries link back to the proper sources. The old web app also linked to several additional sources where the data was present, but I haven't gotten that working again on the new one just yet. (The old web interface is still available at https://roman-names.com, but I will warn you it is clunky and not mobile friendly at all.)

Key findings so far:

AI-supervised AI extraction saved me time. I was manually tuning things for a while, and then the idea turned into a runbook: I feed in my instructions and let the big AI go with sparse oversight from me.

The extraction improved significantly (by about 10 F1 points) when I fed the model the raw text including the editorial markers, rather than a cleaned-up version of the text.
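To make the raw-versus-cleaned distinction concrete, here is a small sketch of what "cleaning" might look like. It assumes a simplified subset of Leiden-style epigraphic markers: `( )` for expanded abbreviations, `[ ]` for restored text, `{ }` for superfluous letters, and `/` for line breaks. The exact marker set in EDCS may differ, the function name `strip_markers` is hypothetical, and the inscription is a made-up example.

```python
import re


def strip_markers(raw: str) -> str:
    """Produce a 'cleaned' reading from marked-up inscription text (simplified marker set)."""
    text = re.sub(r"\{[^}]*\}", "", raw)    # drop letters the editor marked as superfluous
    text = re.sub(r"[()\[\]]", "", text)    # keep expansions and restorations, drop the brackets
    text = text.replace("/", " ")           # line breaks on the stone become spaces
    return re.sub(r"\s+", " ", text).strip()


# Made-up example in the style of a funerary inscription.
raw_text = "D(is) M(anibus) / C(aius) Iulius Fel[ix] / vix(it) ann(os) XXX"
cleaned_text = strip_markers(raw_text)
# cleaned_text == "Dis Manibus Caius Iulius Felix vixit annos XXX"
```

The cleaned string no longer shows that "C" was an abbreviated praenomen or that the end of "Felix" was restored by an editor. One plausible reading of the finding above is that those markers carry signal the model can use, though the post does not say why the raw text worked better.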

I just thought it was a cool little project and wanted to share. If you happen to work in any adjacent space and there is something I could do better, let me know.

Comments

metiscus•1d ago
I am hoping to push a few fixes to the new web interface later today, so if you looked at this and saw anything off, hopefully by COB today I will have the known issues fixed.
oezi•1h ago
Very nice. Are you using the roman roads which were also on HN a couple of weeks ago?
countrymile•55m ago
There are a lot of new roads being mapped in England. Interesting how the inscriptions on the map are often between roads, suggesting an unmapped pathway:

https://news.ycombinator.com/item?id=44622543

jdthedisciple•1h ago
Very nice idea but please, check the performance: I had to close the tab 3 seconds in because it got stuck and my CPU fan got noisy.

So I couldn't even check it out properly.

andai•1h ago
Could you elaborate on the multi-level LLM workflow? Did you set up a benchmark, and you're having an LLM mutate prompts?
jnovek•1h ago
I guess before I roll out questions and criticisms, I just want to say that this is a really cool project. I love it.

Could you make the dots smaller in the updated UI? I didn’t realize at first that you were using an actual map of Roman provinces.

My eyesight isn’t great and it would help if you used a political map rather than terrain. I’m not sure what’s out there for ancient Roman map tiles, though.

I’m not so much of an antiquity scholar AND I’m an American so my European geography isn’t perfect. It would be neat to be able to flip to a modern map, too, so I can see where things are in terms of modern landmarks.

You’re not getting a ton of comments so far, but FWIW these are the kinds of projects I come to HN for. I’ve been getting into opera lately and suddenly classical antiquity is very relevant to my interests. I’m going to keep this in my bookmarks, I’m finding the tangential historical stuff related to opera is drawing me in nearly as much as the music.

I’m also going to pass it on to an academic friend of mine who is working in an unrelated field but might find similar techniques useful.

Finally, when I first opened the map, I recognized the basic shape of the peak Roman Empire in the dots! I love when data does that kind of thing.

Thank you again for sharing this very cool project.

daviTeodoro•1h ago
Very cool! Do you plan to share the final dataset? I've been working with geographic data all my life and I'm building a Carto/Felt alternative. Do you want to have your data there? It is https://cartografo.io/ There is a price tag, but I can host this dataset for free for you. I would love to have this map there to showcase. If you are interested, send an email to davi@cartografo.io. If you just need some help improving your map, I can help you as well.
trevoragilbert•1h ago
This is very cool! For the name extraction, how are you handling false positives across such a large dataset? I’m assuming there are mentions that could be a name but are actually just a noun. For example, Agricola being the word for farmer but also a name.
ingvay7•51m ago
Love this. For people who aren’t familiar with Roman history, it would be great to have a short guided tour of how to explore the map. I filtered for 'pompeii' and it gave me 117 dots.
avyeed_desa•36m ago
Congrats on this great idea! Love it.

The ones around my place all use EDH, which also has a map feature, but not as intuitive as this! Reminds me of vici.org

aspenmartin•32m ago
This is really wonderful -- One thing that may be really cool if you have the data is to add a time-axis ability (unless I missed it) for a given location. This is such a delightful application of AI!
frereubu•27m ago
This is great. One little bit of UI feedback: the green map clusters when zoomed in quite a bit aren't very obvious on green backgrounds - they merge into the background features a bit (e.g. in the very west of Scotland).
yubblegum•23m ago
> I have Claude

And just now I am watching I, Claudius.

cwnyth•6m ago
Hi, I'm an actual classicist (PhD and all), if you wanted to throw any questions my way.
