Ask HN: Best on device LLM tooling for PDFs?

4•martinald•7mo ago
I've got very used to using the "big" LLMs for analysing PDFs.

Now that llama.cpp has vision support, I tried out PDFs with it locally (via LM Studio), but the results weren't as good as I'd hoped. One time it insisted it couldn't do "OCR", but gave me an example of what the data _could_ look like, which was the actual data.

The other major problem is that some PDFs are actually made up of images, and it got super confused by those as well.

Given this is so new, I'm struggling to find any tools that make this easier.

Comments

raymond_goo•7mo ago
Try something like this

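  # Notebook (Colab/Jupyter) setup; drop the leading "!" in a normal shell
  !pip install pytesseract pdf2image pillow
  # pdf2image needs poppler to render PDF pages
  !apt install poppler-utils
  # Uncomment if the tesseract binary is not already installed
  #!apt install tesseract-ocr

  from pdf2image import convert_from_path
  import pytesseract

  # Render every page of the PDF to an image
  pages = convert_from_path('k.pdf', dpi=300)

  # OCR page by page, labelling each page's text
  all_text = ""
  for page_num, img in enumerate(pages, start=1):
      text = pytesseract.image_to_string(img)
      all_text += f"\n--- Page {page_num} ---\n{text}"

  print(all_text)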
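
Since some PDFs mix real text pages with scanned ones, one possible refinement is to OCR only the pages that have little or no embedded text. This is just a sketch, not something from the snippet above: `text_with_ocr_fallback`, the `ocr_page` callback and the `min_chars` threshold are made-up names and values. It assumes you already have the text layer per page (e.g. from pypdf's `page.extract_text()`) and a function that OCRs a single page.

```python
from typing import Callable, List


def text_with_ocr_fallback(
    embedded_texts: List[str],
    ocr_page: Callable[[int], str],
    min_chars: int = 25,
) -> str:
    """Join per-page text, running OCR only on pages with little embedded text.

    embedded_texts: text-layer output for each page (empty for scanned pages).
    ocr_page: hypothetical callback; takes a 1-based page number, returns OCR text.
    min_chars: arbitrary cut-off (an assumption); tune it for your documents.
    """
    parts = []
    for page_num, embedded in enumerate(embedded_texts, start=1):
        cleaned = (embedded or "").strip()
        if len(cleaned) >= min_chars:
            # Enough real text on the page: trust the text layer, skip OCR
            source, text = "text layer", cleaned
        else:
            # Probably a scanned/image-only page: fall back to OCR
            source, text = "ocr", ocr_page(page_num).strip()
        parts.append(f"--- Page {page_num} ({source}) ---\n{text}")
    return "\n".join(parts)


# Possible wiring with the libraries from the snippet above (untested sketch):
#   ocr_page = lambda n: pytesseract.image_to_string(
#       convert_from_path('k.pdf', dpi=300, first_page=n, last_page=n)[0])
```

Passing the OCR step in as a callback means only the pages that need it get rendered and OCR'd, which is usually the slow part.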
constantinum•7mo ago
give https://pg.llmwhisperer.unstract.com/ a try
