Ask HN: Best on device LLM tooling for PDFs?

4•martinald•7mo ago
I've got very used to using the "big" LLMs for analysing PDFs.

Now that llama.cpp has vision support, I tried PDFs with it locally (via LM Studio), but the results weren't as good as I'd hoped. One time it insisted it couldn't do "OCR" but gave me an example of what the data _could_ look like, which was in fact the actual data.

The other major problem is that some PDFs are actually made up of images, and it got very confused by those as well.

Given this is so new, I'm struggling to find any tools that make this easier.

Comments

raymond_goo•7mo ago
Try something like this (written for a notebook such as Colab, hence the `!` shell lines):

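  # Notebook shell commands: install the Python packages and system tools.
  !pip install pytesseract pdf2image pillow
  # pdf2image needs poppler to render PDF pages.
  !apt install poppler-utils
  # pytesseract needs the tesseract binary; uncomment if it isn't installed.
  #!apt install tesseract-ocr
  from pdf2image import convert_from_path
  import pytesseract

  # Render every page of the PDF to an image at 300 DPI.
  pages = convert_from_path('k.pdf', dpi=300)

  # OCR each page image and collect the text under a per-page header.
  all_text = ""
  for page_num, img in enumerate(pages, start=1):
      text = pytesseract.image_to_string(img)
      all_text += f"\n--- Page {page_num} ---\n{text}"

  print(all_text)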
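
For PDFs that mix real text pages with scanned image pages, one possible extension is to OCR only the pages that have no usable text layer. The following is a minimal sketch under stated assumptions, not a tested pipeline: `choose_text_per_page`, `embedded_texts`, `ocr_page`, and the `min_chars` threshold are all hypothetical names and values introduced here for illustration. In practice `embedded_texts` would come from whatever PDF text extractor you use, and `ocr_page` could wrap `pytesseract.image_to_string` on the matching page image.

```python
from typing import Callable, List


def choose_text_per_page(
    embedded_texts: List[str],
    ocr_page: Callable[[int], str],
    min_chars: int = 20,  # assumption: arbitrary cutoff for "has a text layer"
) -> List[str]:
    """Return one text per page, preferring the embedded text layer.

    A page whose embedded text has fewer than `min_chars` non-whitespace-trimmed
    characters is treated as image-only and passed to `ocr_page` (0-based index).
    """
    results = []
    for index, embedded in enumerate(embedded_texts):
        if len(embedded.strip()) >= min_chars:
            # The page already carries text; skip the slower OCR step.
            results.append(embedded)
        else:
            # Little or no embedded text: assume a scanned page and run OCR.
            results.append(ocr_page(index))
    return results


if __name__ == "__main__":
    # Stand-in data: page 0 has a text layer, page 1 is effectively empty.
    sample_texts = ["Invoice 2024: total amount due is listed below.", "  "]

    def fake_ocr(index: int) -> str:
        # Hypothetical stub; a real version would OCR the rendered page image.
        return f"[OCR result for page {index}]"

    for page_text in choose_text_per_page(sample_texts, fake_ocr):
        print(page_text)
```

Passing the OCR step in as a callable keeps the routing logic independent of any particular OCR or PDF library, so the same check works whether the fallback is Tesseract or a local vision model.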

constantinum•7mo ago
Give https://pg.llmwhisperer.unstract.com/ a try.

Book recommendations based on reading history

2•easywood•58m ago•3 comments

Ask HN: Resources to get better at outbound sales?

229•sieep•1w ago•65 comments

Ask HN: What did you read in 2025?

309•kwar13•2d ago•413 comments

Tell HN: Google ignores English searches and forces localized results

52•jeanlucas•1h ago•57 comments

Ask HN: What skills do you want to develop or improve in 2026?

255•meridion•3d ago•394 comments

Ask HN: How do you get visibility if you're suuuuper bad at marketing?

5•ClipNoteBook•4h ago•4 comments

Ask HN: What are you building during the holiday break?

5•linsomniac•4h ago•4 comments

Tell HN: Merry Christmas

1946•basilikum•3d ago•427 comments

Ask HN: Best Podcasts of 2025?

37•adriancooney•5h ago•45 comments

Ask HN: How are you sandboxing coding agents?

43•m-hodges•1d ago•29 comments

Ask HN: What are the best engineering blogs with real-world depth?

460•nishilpatel•5d ago•136 comments

Tell HN: I am afraid AI will take my job at some point

18•funnyfoobar•1d ago•30 comments

The Epstein files downloaded today is different compared to before

47•IDKhowTo•1d ago•8 comments

Ask HN: What was the hardest bug you tracked down in 2025?

8•varshith17•1d ago•4 comments

Ask HN: Why isn't there competition to LinkedIn yet?

59•antfie•5d ago•59 comments

Tell HN: Merry Christmas

92•franze•4d ago•57 comments

Do you know what your dev team shipped last week?

2•akhnid•23h ago•1 comments

Bloat in software is getting WAAAY out of hand

8•sdrawkcabsti•1d ago•8 comments

Looking for Decent Conversation?

101•kmstout•4d ago•16 comments

Ask HN: Anti-AI Open Source License?

40•W-Stool•6h ago•86 comments

Ask HN: How many HN'ers Celebrate Christmas vs. ?

19•gist•3d ago•35 comments

Ask HN: What is the international distribution/statistics of HN visitors?

62•KellyCriterion•3d ago•28 comments

Ask HN: Would anyone pay for a social network with no ads or data harvesting?

5•neilfd•1d ago•21 comments

Postgres for everything, does it work?

7•saisrirampur•1d ago•5 comments

Stronk.app – open-source gym lifts journal

63•apatheticonion•4d ago•29 comments

Ask HN: Good uses cases for Fabrice's microquickjs

14•fud101•3d ago•5 comments

Ask HN: What developer tool do you wish existed in 2026?

22•allenleee•1w ago•24 comments

Ask HN: My mother was scammed out of all her savings. What should I do?

135•scapbi•6d ago•66 comments

Google Cloud Run cost me $4,676 in 6 weeks with zero traff

50•creativesage•4d ago•33 comments

Ask HN: Oberon et al., vs. Rust

17•mikethe•6d ago•30 comments