Ask HN: How should I convert Microsoft Word documents to Markdown?

4•lkrubner•3h ago
I took over a project that was built by an overseas team. They set up a data ingestion process that includes a step where LibreOffice, running in headless mode, converts Microsoft Word documents to PDFs. Later, we convert all the PDFs to Markdown. They felt it was best to convert everything to PDF first, and then convert all of the PDFs to Markdown.
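For context, a headless LibreOffice step like this usually boils down to one `soffice` call per document. A minimal Python sketch, assuming the binary is on PATH as `soffice` (some distros install it as `libreoffice`); the function names are hypothetical, not taken from the project:

```python
import subprocess
from pathlib import Path


def libreoffice_pdf_cmd(docx: Path, out_dir: Path, soffice: str = "soffice") -> list[str]:
    """Build the headless LibreOffice command that renders one .docx to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", str(out_dir), str(docx)]


def docx_to_pdf(docx: Path, out_dir: Path, soffice: str = "soffice") -> Path:
    """Run the conversion and return the expected PDF path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(libreoffice_pdf_cmd(docx, out_dir, soffice), check=True)
    # LibreOffice names the output after the input file's stem.
    return out_dir / f"{docx.stem}.pdf"
```

The PDF is only an intermediate here, so every layout decision LibreOffice makes (table borders, column flow) has to be undone again by the PDF-to-Markdown step.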

What I've noticed is that LibreOffice can produce very complex PDFs when the Word document contains:

1. tables

2. multiple columns

3. strikethrough text
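Before picking a replacement, it may help to know how many documents actually contain those three features. A .docx file is a ZIP archive whose main body lives in `word/document.xml`, where tables appear as `<w:tbl>`, a section's column count as `<w:cols w:num="2">`, and direct strikethrough as `<w:strike/>` or `<w:dstrike/>`. The sketch below is a rough regex heuristic under that assumption: it only scans the main document part (not headers, footnotes, or styles) and is not a full XML parse. `scan_docx` and `scan_document_xml` are hypothetical helper names.

```python
import re
import zipfile
from pathlib import Path

# Rough WordprocessingML markers (heuristic, not a full XML parse):
_TABLE = re.compile(rb"<w:tbl[\s>]")                    # <w:tbl>, not <w:tblPr> etc.
_COLS = re.compile(rb'<w:cols\b[^>]*\bw:num="(\d+)"')   # section column count
# <w:strike/> or <w:dstrike/>, unless explicitly switched off with w:val="0"/"false"/"off"
_STRIKE = re.compile(rb'<w:d?strike\b(?![^>]*\bw:val="(?:0|false|off)")')


def scan_document_xml(xml: bytes) -> dict[str, bool]:
    """Report which layout features appear in a document.xml payload."""
    return {
        "tables": bool(_TABLE.search(xml)),
        "multi_column": any(int(n) > 1 for n in _COLS.findall(xml)),
        "strikethrough": bool(_STRIKE.search(xml)),
    }


def scan_docx(path: Path) -> dict[str, bool]:
    """Open a .docx (a ZIP archive) and scan its main document part."""
    with zipfile.ZipFile(path) as zf:
        return scan_document_xml(zf.read("word/document.xml"))
```

Running `scan_docx` over a folder of incoming files would give a quick inventory of which documents are worth spot-checking after any conversion change.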

I am thinking we should go straight from Microsoft Word to Markdown.

What is the right software for that?

Comments

ramoz•3h ago
Pandoc might be able to do this. I found this:

https://gist.github.com/plembo/409a8d7b1bae66622dbcd26337bbb...
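A minimal sketch of the pandoc route, driving the CLI from Python. Writing GitHub-flavored Markdown (`--to=gfm`) keeps simple tables as pipe tables and strikethrough as `~~text~~`; as far as I know, tables too complex for pipe syntax (merged cells, multi-paragraph cells) come out as raw HTML, and since Markdown has no notion of page columns, multi-column text is emitted in reading order. The helper names and the output layout are illustrative assumptions:

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_cmd(docx: Path, md: Path, media_dir: Path) -> list[str]:
    """Build a pandoc command converting one .docx straight to Markdown."""
    return [
        "pandoc",
        str(docx),
        "--from=docx",
        "--to=gfm",                        # pipe tables + ~~strikethrough~~
        f"--extract-media={media_dir}",    # embedded images are written here
        "--wrap=none",                     # don't hard-wrap paragraphs
        f"--output={md}",
    ]


def docx_to_markdown(docx: Path, out_dir: Path) -> Path:
    """Convert docx to out_dir/<stem>.md, extracting images under out_dir/<stem>/."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    md = out_dir / f"{docx.stem}.md"
    subprocess.run(pandoc_cmd(docx, md, out_dir / docx.stem), check=True)
    return md
```

Giving each document its own media directory avoids image files from different documents overwriting each other.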

snailshare•3h ago
Pandoc can do this, I think.

kha1n3vol3•3h ago
Start with pandoc before reinventing the wheel.

verdverm•24m ago
Native support: https://techcommunity.microsoft.com/blog/onedriveblog/introd...

Microsoft OSS python: https://github.com/microsoft/markitdown

There seem to be many add-ons that enable this, plus pandoc, as others have suggested.
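For the markitdown route, the project README shows a `MarkItDown().convert(...)` call whose result exposes `.text_content`. A hedged batch sketch built on that, assuming the package is installed with its optional converters (`pip install 'markitdown[all]'`); `md_path_for` and `convert_all` are hypothetical helpers:

```python
from pathlib import Path


def md_path_for(docx: Path, src_dir: Path, out_dir: Path) -> Path:
    """Mirror src_dir's folder layout under out_dir, swapping .docx for .md."""
    return out_dir / docx.relative_to(src_dir).with_suffix(".md")


def convert_all(src_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every .docx under src_dir to Markdown with markitdown."""
    # Imported lazily so md_path_for works without markitdown installed.
    from markitdown import MarkItDown

    converter = MarkItDown()
    written = []
    for docx in sorted(src_dir.rglob("*.docx")):
        if docx.name.startswith("~$"):  # skip Word's temporary lock files
            continue
        md = md_path_for(docx, src_dir, out_dir)
        md.parent.mkdir(parents=True, exist_ok=True)
        md.write_text(converter.convert(str(docx)).text_content, encoding="utf-8")
        written.append(md)
    return written
```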
