Ask HN: Best on device LLM tooling for PDFs?

4•martinald•11mo ago
I've got very used to using the "big" LLMs for analysing PDFs.

Now that llama.cpp has vision support, I tried out PDFs with it locally (via LM Studio), but the results weren't as good as I'd hoped. One time it insisted it couldn't do "OCR", but gave me an example of what the data _could_ look like, which turned out to be the actual data.

The other major problem is that some PDFs are actually made up of images, and it got super confused by those as well.

Given this is so new, I'm struggling to find any tools that make this easier.

Comments

raymond_goo•11mo ago
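Try something like this:

  # Notebook (Colab-style) setup; the "!" lines are shell commands
  !pip install pytesseract pdf2image pillow
  !apt install poppler-utils
  # pytesseract needs the tesseract binary; uncomment if it isn't installed
  #!apt install tesseract-ocr
  from pdf2image import convert_from_path
  import pytesseract

  # Render each PDF page to an image (300 dpi is a common choice for OCR)
  pages = convert_from_path('k.pdf', dpi=300)

  # OCR every page and label the output with its page number
  chunks = []
  for page_num, img in enumerate(pages, start=1):
      text = pytesseract.image_to_string(img)
      chunks.append(f"\n--- Page {page_num} ---\n{text}")

  all_text = "".join(chunks)
  print(all_text)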
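
For PDFs that mix real text pages with scanned images, one option is to use the embedded text layer where it exists and only fall back to OCR on pages where it is empty. This is a rough sketch of that idea, not something tested against real PDFs: `read_text_layer` and `run_ocr` are hypothetical callables you would supply yourself (for example, wrappers around a PDF library's text extraction and around the OCR loop above), and the `min_chars` threshold of 20 is an arbitrary guess.

```python
from typing import Callable, List


def extract_pages(
    num_pages: int,
    read_text_layer: Callable[[int], str],
    run_ocr: Callable[[int], str],
    min_chars: int = 20,
) -> List[str]:
    """Return one string per page, preferring the embedded text layer.

    read_text_layer and run_ocr are hypothetical callables that take a
    zero-based page index; min_chars is an arbitrary threshold.
    """
    results = []
    for index in range(num_pages):
        text = read_text_layer(index) or ""
        # Scanned pages usually have an empty or near-empty text layer.
        if len(text.strip()) < min_chars:
            text = run_ocr(index)
        results.append(text)
    return results


# Usage with stand-in functions instead of a real PDF and OCR engine.
text_layer = ["A page with a real, selectable text layer.", "", "   "]
pages = extract_pages(
    num_pages=3,
    read_text_layer=lambda i: text_layer[i],
    run_ocr=lambda i: f"[OCR text for page {i + 1}]",
)
```

Passing the two steps in as functions keeps the fallback logic independent of whichever PDF and OCR libraries you end up using.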
constantinum•11mo ago
Give https://pg.llmwhisperer.unstract.com/ a try.
