Transform DOCX into LLM-ready data

https://contextgem.dev/converters/docx.html
15•sergiishcherbak•9mo ago

Comments

sergiishcherbak•9mo ago
As part of work on my open-source project ContextGem, I've built a native, zero-dependency DOCX converter that transforms Word documents into LLM-ready data.

This custom-built converter processes Word XML directly, provides comprehensive content extraction, and covers what other open-source tools often miss or do not support:

- Rich paragraph and sentence metadata for enhanced context

- Misaligned tables

- Comments, footnotes, and textboxes

- Embedded images

The converted document can then be easily used in ContextGem's LLM extraction workflows.

Perfect for developers building contract intelligence applications where precision matters. The converter preserves document structure and relationships, empowering LLMs to better understand and analyze document content.

Try it, or share it with your dev team, and see the difference in your document processing pipeline!

GitHub: https://github.com/shcherbak-ai/contextgem

All DocxConverter features: https://contextgem.dev/converters/docx.html
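
For readers wondering what "directly processes Word XML" with zero dependencies can look like, here is a minimal sketch using only the Python standard library. It is not ContextGem's implementation; the function name `extract_paragraphs` is hypothetical. It relies only on two facts about the format: a DOCX file is a ZIP archive, and the main body lives in `word/document.xml` as `w:p` (paragraph) and `w:t` (text run) elements.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by w:p, w:r, w:t and friends.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(docx_file):
    """Return the text of each non-empty paragraph in the main document part.

    Hypothetical helper for illustration only. `docx_file` is a path or a
    binary file-like object. Comments, footnotes, headers and images live in
    other parts of the archive and are not read here.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    # iter() walks the whole tree, so paragraphs inside table cells are
    # included too. Paragraphs nested in textboxes are also picked up, and
    # their text is repeated in the enclosing paragraph; a real converter
    # would need to handle that case separately.
    for para in root.iter(f"{{{W_NS}}}p"):
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

Calling `extract_paragraphs("contract.docx")` (the file name is a placeholder) returns a list of strings, one per paragraph. Everything the post lists beyond plain text, such as comments, footnotes, textboxes, and images, needs additional handling on top of this.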

WalterGR•9mo ago
> zero-dependency DOCX converter

I’ve read that there are a lot of OpenXML elements that are pretty opaque. They appear to basically be XML-esque representations of binary, in-memory structs used internally by Office. (Maybe this has changed over time.)

How much OpenXML does this actually handle?

> Extracts information that other open-source tools often do not capture: misaligned tables

Could you expand on what you mean by misaligned tables? Are these tables that appear as separate ‘table nodes’ in the XML, or ones that appear as a single node but have wonky formatting?

obeavs•9mo ago
Hey! This is really awesome. Do you intend to support analysis on redlining/tracked changes? That's where it would become very useful for my use cases.

eightysixfour•9mo ago
Yes, this is the one that always gets me in the MS ecosystem. Would make a few of my workflows so much better.
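
For context on the redlining question: in WordprocessingML, tracked insertions are wrapped in `w:ins` elements and tracked deletions in `w:del` elements (with the deleted text in `w:delText`), each carrying `w:author` and `w:date` attributes. The sketch below reads them with the Python standard library only. It is an illustration of the file format, not a ContextGem feature, and the function name `extract_tracked_changes` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _w(tag):
    """Build a namespace-qualified name such as {namespace}ins."""
    return f"{{{W_NS}}}{tag}"


def extract_tracked_changes(docx_file):
    """Return tracked insertions and deletions from the main document part.

    Hypothetical helper for illustration only. Results are grouped by kind
    (all insertions first, then all deletions), not in document order.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    changes = []
    # (label, wrapper element, element that holds the text)
    specs = (("insertion", "ins", "t"), ("deletion", "del", "delText"))
    for kind, wrapper, text_tag in specs:
        for node in root.iter(_w(wrapper)):
            text = "".join(t.text or "" for t in node.iter(_w(text_tag)))
            if not text:
                # w:ins / w:del also mark changed paragraph marks and table
                # rows; those carry no run text and are skipped here.
                continue
            changes.append(
                {
                    "kind": kind,
                    "author": node.get(_w("author")),
                    "date": node.get(_w("date")),
                    "text": text,
                }
            )
    return changes
```

Each returned entry is a dict with `kind`, `author`, `date`, and `text` keys. Formatting changes, moved text, and changes inside footnotes or comments are stored elsewhere and are out of scope for this sketch.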

TiredOfLife•9mo ago
How does it compare to https://github.com/microsoft/markitdown?
