OCR for construction documents does not work, we fixed it

https://www.getanchorgrid.com/developer/docs/endpoints/drawings-doors
46•wcisco17•2h ago
So we've built an API and trained models that detect fixtures, extract schedules, and analyze construction documents. Check us out!

More examples: - https://www.getanchorgrid.com/developer/docs/endpoints/drawi...

Main website: - https://www.getanchorgrid.com/developer

Why we did it: https://www.getanchorgrid.com/developer/docs/changelog/const...

Comments

fithisux•1h ago
Of course it is not working. PDF and images are supposed to be tamper resistant. OCR tries to reverse engineer them.
kube-system•1h ago
Since when is tamper resistance a part of PDF or any common image format?
pwagland•1h ago
PDF files can be signed, that is tamper resistance. Tamper resistance doesn't have to make any difference to the readability of the document.
kube-system•1h ago
So can any type of file -- that doesn't have any relevance to the supposed design of every file type in existence. Now, later versions of PDF do have explicit support for signatures, but what does this have to do with preventing OCR? OCR reads a file, it doesn't change the original file.
ranger_danger•1h ago
Some OCR solutions do change the original file, like OCRmyPDF. They take layers that were just images before and replace them with text layers so that you can search the document.
kube-system•53m ago
That isn't OCR, but an application of the resulting output of OCR. Again, a signature on a PDF or any type of file doesn't prevent you from reading it. (It also doesn't technically prevent you from changing it, it just enables the detection of changes to a particular file.)

There's nothing about PDFs or image formats that prevents anyone from doing OCR. The reason construction documents are difficult to OCR is that OCR models are not well trained for them, and they're very technical documents where small details are significant. It doesn't have anything to do with the file format.
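To make the point about signatures concrete: a signature lets a verifier detect that the signed bytes changed, but it neither blocks reading those bytes nor blocks editing them. The sketch is a simplified illustration, assuming an HMAC-SHA256 tag as a stand-in for a real PDF signature (which is a certificate-backed public-key signature over the file's byte ranges); the key and the sample door-schedule string are hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared key. Real PDF signatures use a private key plus a
# certificate (public-key crypto), not a shared secret; HMAC is a stand-in here.
SECRET_KEY = b"hypothetical-signing-key"


def sign(document: bytes) -> str:
    """Compute a tag over the document bytes (HMAC-SHA256 as a signature stand-in)."""
    return hmac.new(SECRET_KEY, document, hashlib.sha256).hexdigest()


def verify(document: bytes, tag: str) -> bool:
    """Return True only if the bytes still match the tag they were signed with."""
    return hmac.compare_digest(sign(document), tag)


original = b"Door D-101: 3'-0\" x 7'-0\" hollow metal"  # hypothetical sample content
tag = sign(original)

# Reading the bytes (e.g. to run OCR on a rendering of them) leaves them unchanged,
# so verification still passes.
print(verify(original, tag))  # True

# Editing the bytes is not prevented; the change is only detected at verification time.
tampered = original.replace(b"D-101", b"D-102")
print(verify(tampered, tag))  # False
```

The design point is that verification is a separate, after-the-fact check: nothing in the tag stops anyone from producing `tampered`, it just makes the mismatch visible to whoever checks.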

fithisux•41m ago
True but you can make modified copies if you reverse engineer it with OCR.
ranger_danger•1h ago
Can't one just remove the signature and re-sign it with anything else after tampering? Who verifies PDFs that hard?
kube-system•46m ago
If you're performing OCR, you're, almost by definition, disregarding the source file. The whole point of OCR is to be transformative.
fithisux•42m ago
You can't change a PDF; it is designed not to be easy to OCR.
achillesheels•1h ago
Love it! Starbucks Venti Macchiato sip

Would love to give it to an arc client. Not sure who the right person to implement this would be? Hmm…

wcisco17•1h ago
Hey OP here - Love to help if you're looking for a team to implement a solution.

https://cal.com/anchorgrid/anchorgrid-external-meeting?durat...

Iulioh•1h ago
When will this be available for 30000x8000px electrical diagrams?

I have to make a BOM and oh boy I hate my job

jsidney•1h ago
What do you hate the most?
oritron•1h ago
What software made the bitmap? Seems like a step earlier in the pipeline could help generate a BOM more easily.
Iulioh•48m ago
I'm not really sure and I don't have access to it; I just receive flat PDFs or TIFFs.

A lot of them are "archival" so I'm pretty OOL

alexeischiopu•1h ago
I’m building a similar platform, with electrical being furthest ahead - SLD, panels, lights, power, comms.

Also do doors, windows, and mechanical equipment.

DM me, and I can include you in the next preview.

Iulioh•46m ago
I work in the automotive field. I don't know if that complicates things further, but I appreciate any help!
alexeischiopu•1h ago
Good idea :)
wcisco17•1h ago
Thanks!!
vessenes•1h ago
Cool. What's pricing like?
wcisco17•1h ago
Thanks! https://www.getanchorgrid.com/developer/pricing

Let me know if you find it useful or have any questions, happy to help.

vessenes•59m ago
Thanks -- btw the Pricing link on the site pulls up a form, not that page.
Terr_•1h ago
> OCR for construction documents does not work

I'm reminded of the Xerox JBIG2 bug back in ~2013, where certain scan settings could silently replace numbers inside documents, and bad construction plans were one of the cases that led to it being discovered. [0]

It wasn't overt OCR per se; end users weren't intending to convert pixels to characters or vice versa.

[0] https://www.youtube.com/watch?v=c0O6UXrOZJo&t=6m03s

TehCorwiz•53m ago
If I recall correctly, it was an artifact of the compression algorithm.

Full context and details: https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...
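The compression artifact can be illustrated with a toy version of lossy symbol matching, the idea behind JBIG2's pattern-matching-and-substitution mode: glyphs that look "close enough" are stored once and reused, so a too-lenient match threshold can silently turn one digit into another. This is a hypothetical sketch, not the JBIG2 codec or Xerox's implementation; the 3×5 bitmaps, the pixel-mismatch metric, and the threshold values are assumptions chosen for illustration.

```python
from typing import List, Tuple

Glyph = Tuple[str, ...]  # rows of a tiny binary bitmap; '#' = black pixel

# Hypothetical 3x5 bitmaps: they differ by a single pixel (row 2, right column).
SIX: Glyph = ("###", "#..", "###", "#.#", "###")
EIGHT: Glyph = ("###", "#.#", "###", "#.#", "###")


def mismatch(a: Glyph, b: Glyph) -> int:
    """Count differing pixels between two equally sized glyph bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def compress(glyphs: List[Glyph], threshold: int) -> Tuple[List[Glyph], List[int]]:
    """Toy lossy matching: reuse a stored symbol if it is within `threshold` pixels."""
    dictionary: List[Glyph] = []
    refs: List[int] = []
    for g in glyphs:
        match = next(
            (i for i, s in enumerate(dictionary) if mismatch(g, s) <= threshold), None
        )
        if match is None:
            # No close-enough symbol yet: store this glyph as a new dictionary entry.
            dictionary.append(g)
            match = len(dictionary) - 1
        refs.append(match)
    return dictionary, refs


def decompress(dictionary: List[Glyph], refs: List[int]) -> List[Glyph]:
    """Rebuild the page by looking up each reference in the symbol dictionary."""
    return [dictionary[i] for i in refs]


page = [SIX, EIGHT]  # e.g. the digits "68" in a dimension string

strict = decompress(*compress(page, threshold=0))
lenient = decompress(*compress(page, threshold=1))

print(strict == page)          # True: both digits survive
print(lenient == [SIX, SIX])   # True: the 8 was silently replaced by a 6
```

The output still looks like a clean, plausible digit, which is why this kind of error is so hard to spot on a drawing full of dimensions.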

testUser1228•1h ago
What do you foresee being the end use case for this (or most valuable use case)?
wcisco17•57m ago
Anyone building in or for construction tech — whether that's a startup building estimating or project management software, a construction company with an internal tech team solving this themselves, or a builder looking to automate their workflow. The common thread is drawings. Every one of those groups lives and dies by their ability to extract actionable data from a PDF that was never designed to be machine-readable. We're building the layer that makes that possible so they don't have to start from scratch.
wang_li•46m ago
Why does the workflow lie at the level of a real or virtual piece of paper and not in the metadata from the applications used to create that piece of paper? Seems like a CAD tool would allow you to identify each element of the drawing, assigning metadata as required.
jsidney•37m ago
Only a small set of construction stakeholders participate in the CAD ecosystem (e.g., architects, large GCs) while a broader set of stakeholders (subcontractors, trades, smaller GCs/CMs) do not receive BIM files and work with PDFs. CAD/BIM is a wonderful aspiration but for many the reality is PDFs.
cyanydeez•30m ago
Oh you sweet summer child. These drawings are anywhere from 0 to 120 years old and might be anything from something pulled off a floppy disk from 1970 to scanned-in, coffee-ridden pieces of paper that sat in a desk folded a hundred times.

The world in which metadata is a common thing attached to any file doesn't exist, and probably never will, no matter how much you try to improve CAD work flow.

hspraggins77•54m ago
Great points raised!
ware-intel•37m ago
Your smart features look like a game changer? Nice job!
frogguy•27m ago
Looks cool! Where are you getting the data to fine-tune the CV models for element extraction? I'm worried there isn't a robust enough dataset to be able to build a detection model that will generalize to all of the slightly different standards each discipline (and each firm, for that matter) uses.

