I built a tool to redact sensitive information from PDFs: https://redactanything.com
The problem: Most "redaction" tools just draw black boxes over text. The text is still in the PDF and can be recovered with basic tools. Adobe's actual redaction works but it's manual and slow.
What this does:
- Upload a PDF (stays in your browser, never hits a server)
- AI detects PII: names, SSNs, emails, phone numbers, addresses, dates, etc.
- You review what it found and approve/reject
- Download a PDF where the text is permanently removed from the content stream
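A rough sketch of how detection plus the review step could fit together for the structured types (SSNs, emails, phone numbers). This is not the tool's actual code. Using patterns for these types is an assumption, the `Detection` type and the `detectStructured` and `applyReview` functions are made up for illustration, and the patterns only cover simple US-style formats.

```typescript
// Hypothetical shape for one detected item; offsets index into the page text.
type Detection = { kind: string; text: string; start: number; end: number };

// Simplified patterns (assumption): real data needs broader, locale-aware rules.
const STRUCTURED_PATTERNS: Array<[string, RegExp]> = [
  ["ssn", /\b\d{3}-\d{2}-\d{4}\b/],
  ["email", /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/],
  ["phone", /\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/],
];

// Scan the text with every pattern and return matches in reading order.
function detectStructured(text: string): Detection[] {
  const found: Detection[] = [];
  for (const [kind, pattern] of STRUCTURED_PATTERNS) {
    const re = new RegExp(pattern.source, "g"); // fresh regex, so no shared lastIndex
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      found.push({ kind, text: m[0], start: m.index, end: m.index + m[0].length });
    }
  }
  return found.sort((a, b) => a.start - b.start);
}

// Review step: drop the detections the user rejected (identified by list index).
function applyReview(detections: Detection[], rejected: number[]): Detection[] {
  return detections.filter((_d, i) => rejected.indexOf(i) === -1);
}

const sampleText = "Patient SSN 123-45-6789, jane.doe@example.com, 555-123-4567.";
const detections = detectStructured(sampleText);
// Suppose the user rejects the email (index 1) during review.
const toRedact = applyReview(detections, [1]);
console.log(toRedact.map((d) => d.kind + ": " + d.text));
```

Keeping character offsets on each detection is what lets a later step map an approved item back to the text runs on the page.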
Technical details:
- Frontend: React + PDF.js for rendering + pdf-lib for manipulation
- NER model: Hugging Face Transformers.js (Xenova/bert-base-NER) running client-side in the browser
- OCR for scanned docs
- The actual redaction removes text operators from the PDF content stream, not just overlays
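To show the difference between an overlay and real removal, here is a minimal sketch on a toy content stream. It assumes an uncompressed stream, literal strings shown with `Tj` only, and no nested parentheses. `extractText` and `stripText` are hypothetical helpers, not pdf-lib APIs and not the tool's actual code.

```typescript
// Matches a literal string followed by the Tj (show text) operator, e.g. "(Hello) Tj".
// Simplification (assumption): escaped characters are allowed, nested parentheses are not.
const SHOW_TEXT = /\(((?:\\.|[^\\()])*)\)\s*Tj/;

// What a basic text extractor sees: every string passed to Tj.
function extractText(stream: string): string[] {
  const out: string[] = [];
  const re = new RegExp(SHOW_TEXT.source, "g");
  let m: RegExpExecArray | null;
  while ((m = re.exec(stream)) !== null) {
    out.push(m[1]);
  }
  return out;
}

// Remove each Tj operator (and its string operand) whose text is flagged as sensitive.
function stripText(stream: string, isSensitive: (s: string) => boolean): string {
  const re = new RegExp(SHOW_TEXT.source, "g");
  return stream.replace(re, (op: string, s: string) => (isSensitive(s) ? "" : op));
}

// A page with one line of text and a filled black rectangle drawn on top of it.
const overlaid = [
  "BT /F1 12 Tf 72 720 Td (SSN: 123-45-6789) Tj ET",
  "0 0 0 rg 70 715 140 16 re f",
].join("\n");

// The overlay hides the text visually, but it is still in the stream.
const beforeStrip = extractText(overlaid);

// After stripping, the operator is gone and only the rectangle remains.
const cleaned = stripText(overlaid, (s) => /\d{3}-\d{2}-\d{4}/.test(s));
const afterStrip = extractText(cleaned);

console.log(beforeStrip, afterStrip);
```

Real PDFs also use `TJ` arrays, hex strings, and custom font encodings, and removing a `Tj` shifts any text that follows on the same line, so a production version needs a proper content stream parser rather than a regex.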
I built this because I needed to redact medical records and legal docs. Enterprise tools cost thousands, Adobe is $20/mo and manual, and free tools are sketchy. I settled on $2.99/doc as a middle ground.
Limitations I'm aware of:
- AI detection isn't perfect (that's why there's a review step)
- Doesn't handle all PDF edge cases (encrypted, malformed, etc.)
- Names in non-Western formats need work
Would appreciate feedback on the detection accuracy and any edge cases you find.