Ask HN: Best on device LLM tooling for PDFs?

4•martinald•7mo ago
I've got very used to using the "big" LLMs for analysing PDFs.

Now that llama.cpp has vision support, I tried PDFs with it locally (via LM Studio), but the results weren't as good as I'd hoped. One time it insisted it couldn't do "OCR", but gave me an example of what the data _could_ look like, which was in fact the actual data.

The other major problem is that some PDFs are actually made up of images, and it got super confused by those as well.

Given this is so new, I'm struggling to find any tools that make this easier.

Comments

raymond_goo•7mo ago
Try something like this:

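  # Notebook cell (Jupyter/Colab): lines starting with "!" are shell commands.
  !pip install pytesseract pdf2image pillow
  !apt install poppler-utils
  # Uncomment if the tesseract binary is not already installed:
  #!apt install tesseract-ocr

  from pdf2image import convert_from_path
  import pytesseract

  # Render every page of the PDF to an image at 300 DPI (needs poppler).
  pages = convert_from_path('k.pdf', dpi=300)

  # OCR each page image and label the text with its page number.
  chunks = []
  for page_num, img in enumerate(pages, start=1):
      text = pytesseract.image_to_string(img)
      chunks.append(f"\n--- Page {page_num} ---\n{text}")

  all_text = "".join(chunks)
  print(all_text)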
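
For the mixed case from the question (some PDFs have a real text layer, others are just scanned images), a variation might be to try plain text extraction first and only OCR the pages that come back empty. This is a rough sketch, not a tested pipeline: it assumes pypdf is installed alongside the packages above, and `has_text_layer`, `extract_pdf_text`, and the 20-character threshold are made-up names and values for illustration.

```python
def has_text_layer(extracted: str, min_chars: int = 20) -> bool:
    """Heuristic (assumption): a page counts as digital text if extraction
    yields at least `min_chars` non-whitespace characters."""
    return sum(not ch.isspace() for ch in extracted) >= min_chars


def extract_pdf_text(pdf_path: str, dpi: int = 300) -> list[str]:
    """Return one string per page, using OCR only where no text layer is found."""
    # Imported lazily so the heuristic above works without these packages.
    from pypdf import PdfReader
    from pdf2image import convert_from_path
    import pytesseract

    reader = PdfReader(pdf_path)
    texts = []
    for page_num, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        if not has_text_layer(text):
            # Likely a scanned page: render just this page and OCR it.
            images = convert_from_path(
                pdf_path, dpi=dpi, first_page=page_num, last_page=page_num
            )
            text = pytesseract.image_to_string(images[0])
        texts.append(text)
    return texts
```

Something like `"\n".join(extract_pdf_text("k.pdf"))` would then give you text you can paste into the local model as plain context, instead of asking the vision model to read the page itself.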
constantinum•7mo ago
Give https://pg.llmwhisperer.unstract.com/ a try.
