OCR for construction documents does not work, we fixed it

https://www.getanchorgrid.com/developer/docs/endpoints/drawings-doors
27•wcisco17•1h ago

Comments

wcisco17•1h ago
So we've built an API and trained models that detect fixtures, extract schedules, and analyze construction documents. Check us out!

More examples: https://www.getanchorgrid.com/developer/docs/endpoints/drawi...

Main website: https://www.getanchorgrid.com/developer

Why we did it: https://www.getanchorgrid.com/developer/docs/changelog/const...

fithisux•1h ago
Of course it is not working. PDF and images are supposed to be tamper resistant. OCR tries to reverse engineer them.
kube-system•50m ago
Since when is tamper resistance a part of PDF or any common image format?
pwagland•33m ago
PDF files can be signed, that is tamper resistance. Tamper resistance doesn't have to make any difference to the readability of the document.
kube-system•27m ago
So can any type of file -- that doesn't have any relevance to the supposed design of every file type in existence. Now, later versions of PDF do have explicit support for signatures, but what does this have to do with preventing OCR? OCR reads a file, it doesn't change the original file.
ranger_danger•10m ago
Some OCR solutions do change the original file, like OCRmyPDF. They take layers that were just images before and replace them with text layers so that you can search the document.
kube-system•2m ago
That isn't OCR, but an application of the resulting output of OCR. Again, a signature on a PDF or any type of file doesn't prevent you from reading it.
ranger_danger•12m ago
Can't one just remove the signature and re-sign it with anything else after tampering? Who verifies PDFs that hard?
achillesheels•53m ago
Love it! Starbucks Venti Macchiato sip

Love to give it to an arc client, not sure who the right person to implement this would be? Hmm…

wcisco17•22m ago
Hey, OP here. Would love to help if you're looking for a team to implement a solution.

https://cal.com/anchorgrid/anchorgrid-external-meeting?durat...

Iulioh•45m ago
When will this be available for 30000x8000px electrical diagrams?

I have to make a BOM, and oh boy, do I hate my job.

jsidney•40m ago
What do you hate the most?
oritron•39m ago
What software made the bitmap? Seems like a step earlier in the pipeline could help generate a BOM more easily.
alexeischiopu•20m ago
I’m building a similar platform, with electrical being furthest ahead - SLD, panels, lights, power, comms.

We also do doors, windows, and mechanical equipment.

DM me, and I can include you in the next preview.

alexeischiopu•20m ago
Good idea :)
wcisco17•15m ago
Thanks!!
vessenes•19m ago
Cool. What's the pricing like?
wcisco17•16m ago
Thanks! https://www.getanchorgrid.com/developer/pricing

Let me know if you find it useful or have any questions, happy to help.

vessenes•8m ago
Thanks -- btw the Pricing link on the site pulls up a form, not that page.
Terr_•16m ago
> OCR for construction documents does not work

I'm reminded of the Xerox JBIG2 bug back in ~2013, where certain scan settings could silently replace numbers inside documents, and bad construction plans were one of the cases that led to it being discovered. [0]

It wasn't overt OCR per se; users weren't trying to convert pixels to characters.

[0] https://www.youtube.com/watch?v=c0O6UXrOZJo&t=6m03s

TehCorwiz•2m ago
If I recall correctly, it was an artifact of the compression algorithm.

Full context and details: https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...

testUser1228•13m ago
What do you foresee being the end use case for this (or most valuable use case)?
wcisco17•6m ago
Anyone building in or for construction tech — whether that's a startup building estimating or project management software, a construction company with an internal tech team solving this themselves, or a builder looking to automate their workflow. The common thread is drawings. Every one of those groups lives and dies by their ability to extract actionable data from a PDF that was never designed to be machine-readable. We're building the layer that makes that possible so they don't have to start from scratch.
hspraggins77•3m ago
Great points raised!