I was taking a Signals and Systems course and filling notebooks with Laplace transforms and long derivations. Before finals I tried digitizing them so I could search my notes.
Everything failed.
Most OCR tools can recognize the characters, but they destroy the structure that makes math readable:
- aligned equations lose alignment
- multi-step derivations collapse into paragraphs
- numbered problems merge together
- tables flatten into plain text
So I built *Axiom*.
Instead of focusing only on transcription accuracy, it focuses on *preserving mathematical structure*.
Upload a photo of handwritten STEM notes and it returns structured Markdown with real LaTeX — keeping aligned equations, derivation steps, and problem blocks intact.
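For example, a partial-fractions derivation (this sample is made up, not actual app output) comes back as Markdown like:

```markdown
## Problem 3: Inverse Laplace Transform

$$
\begin{aligned}
F(s) &= \frac{1}{s(s+2)} \\
     &= \frac{1/2}{s} - \frac{1/2}{s+2} \\
f(t) &= \tfrac{1}{2}\left(1 - e^{-2t}\right)u(t)
\end{aligned}
$$
```

so the three steps stay visually aligned at the `=` signs instead of collapsing into a paragraph.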
Under the hood it’s basically:
image → vision model → structured Markdown + LaTeX → KaTeX render
Most of the work ended up being in *layout preservation*, not OCR.
https://www.useaxiomnotes.com/app
Happy to answer questions.
mrajatnath•2h ago
A few technical details about how this works.
Stack:
- Next.js
- Tailwind
- KaTeX for rendering
- Supabase storage
- deployed on Vercel
The pipeline is roughly:
image → vision model → Markdown + LaTeX → custom renderer
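The middle step can be sketched roughly like this (a minimal sketch, not Axiom's actual code; `transcribeNotes`, `VisionModel`, and the stand-in prompt are all made up for illustration):

```typescript
// Any vision-capable model endpoint, abstracted as a function so it can be swapped/stubbed.
type VisionModel = (imageBase64: string, systemPrompt: string) => Promise<string>;

// Stand-in for the real (~300-line) structure-preserving prompt.
const SYSTEM_PROMPT = "Transcribe literally. Preserve alignment, tables, and problem numbering.";

// image → vision model → Markdown + LaTeX string, ready for the KaTeX renderer.
async function transcribeNotes(
  imageBase64: string,
  model: VisionModel,
): Promise<string> {
  const raw = await model(imageBase64, SYSTEM_PROMPT);
  // Some models wrap their answer in a stray ```markdown fence; strip it.
  return raw
    .replace(/^```(?:markdown)?\n?/, "")
    .replace(/\n?```$/, "")
    .trim();
}
```

The model is injected as a parameter mainly so the cleanup logic can be tested without a network call.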
The tricky part isn’t OCR itself — it's preserving structure.
Examples:
• consecutive equations with aligned `=` signs need to become a single `align` block
• handwritten tables must be reconstructed from vertical alignment patterns
• numbered problems must stay separate instead of merging
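The first case can be handled in post-processing. A sketch (not Axiom's actual code; the function and its heuristics are invented here) that merges consecutive single-line display equations into one `aligned` block:

```typescript
// Merge runs of consecutive display-math lines that contain "=" into a single
// aligned block, so KaTeX lines up their equals signs.
function mergeAlignedEquations(lines: string[]): string[] {
  const out: string[] = [];
  let run: string[] = [];

  const flush = () => {
    if (run.length > 1) {
      // Anchor alignment at the first "=" of each equation.
      const body = run.map((eq) => eq.replace("=", "&=")).join(" \\\\\n");
      out.push("$$", "\\begin{aligned}", body, "\\end{aligned}", "$$");
    } else if (run.length === 1) {
      out.push("$$", run[0], "$$"); // lone equation: leave it as-is
    }
    run = [];
  };

  for (const line of lines) {
    const m = line.match(/^\$\$(.+)\$\$$/); // single-line display math
    if (m && m[1].includes("=")) {
      run.push(m[1].trim());
    } else {
      flush();
      out.push(line);
    }
  }
  flush();
  return out;
}
```

The real version would need to be smarter about equations that already contain `&` or multiple `=` signs, but the shape of the problem is the same.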
The system prompt ended up being ~300 lines mostly consisting of *negative constraints* like:
- don't simplify math
- don't merge derivation steps
- don't reorder columns
Without those rules the model constantly tries to "improve" the notes.
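In code, that section of the prompt ends up looking something like this (an illustrative excerpt I wrote for this comment, not the real prompt):

```typescript
// Excerpt of structure-preserving rules, phrased as flat negative constraints.
// The real prompt is ~300 lines; these five are representative, not verbatim.
const TRANSCRIPTION_RULES = `
You are transcribing handwritten STEM notes into Markdown with LaTeX.
- Do NOT simplify, evaluate, or "correct" any mathematics.
- Do NOT merge consecutive derivation steps into one equation.
- Do NOT reorder table columns or rows.
- Do NOT paraphrase prose; transcribe it verbatim.
- If a symbol is illegible, emit \\text{(illegible)} rather than guessing.
`.trim();
```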
One surprising lesson: prompt engineering for OCR is very different from chat prompts — you want the model to be extremely literal.
Still working on better handling for diagrams and messy annotations.
Curious if anyone here has worked on *math layout detection or document AI*.