Ask HN: Best on device LLM tooling for PDFs?

4•martinald•1y ago
I've got very used to using the "big" LLMs for analysing PDFs.

Now that llama.cpp has vision support, I tried out PDFs with it locally (via LM Studio), but the results weren't as good as I'd hoped. One time it insisted it couldn't do "OCR", but gave me an example of what the data _could_ look like, which was in fact the actual data.

The other major problem is that some PDFs are actually made up of images, and it got very confused by those as well.

Given this is so new, I'm struggling to find any tools that make this easier.

Comments

raymond_goo•1y ago
Try something like this:

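  # Notebook setup: the "!" prefix runs a shell command from a notebook cell
  !pip install pytesseract pdf2image pillow
  !apt install poppler-utils
  # pytesseract also needs the Tesseract binary; uncomment if it isn't installed
  #!apt install tesseract-ocr
  from pdf2image import convert_from_path
  import pytesseract

  # Render each PDF page to an image
  pages = convert_from_path('k.pdf', dpi=300)

  # OCR each page and collect the text with page separators
  page_texts = []
  for page_num, img in enumerate(pages, start=1):
      text = pytesseract.image_to_string(img)
      page_texts.append(f"\n--- Page {page_num} ---\n{text}")

  all_text = "".join(page_texts)
  print(all_text)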
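
The snippet above OCRs every page. Since the question mentions that some PDFs are really just images while others are not, one possible refinement is to use a page's embedded text when it has any and fall back to OCR only when it doesn't. The following is a rough sketch under assumptions, not something from the thread: `pypdf` is an extra dependency, the names `choose_page_text` and `pdf_to_text` are made up for illustration, and the `min_chars` threshold of 20 is an arbitrary guess that would need tuning.

```python
from typing import Callable, List


def choose_page_text(extracted: str, ocr_page: Callable[[], str], min_chars: int = 20) -> str:
    """Return the embedded text if it looks usable, otherwise fall back to OCR.

    `min_chars` is an arbitrary illustrative threshold, not a recommended value.
    """
    cleaned = (extracted or "").strip()
    if len(cleaned) >= min_chars:
        return cleaned
    # Too little embedded text: assume the page is a scanned image and OCR it.
    return ocr_page().strip()


def pdf_to_text(path: str, dpi: int = 300, min_chars: int = 20) -> str:
    """Extract text page by page, running OCR only on pages without a text layer."""
    # Third-party imports are local so choose_page_text works without them installed.
    from pypdf import PdfReader  # assumption: pypdf is not mentioned in the thread
    from pdf2image import convert_from_path
    import pytesseract

    reader = PdfReader(path)
    parts: List[str] = []
    for page_num, page in enumerate(reader.pages, start=1):
        # Bind the current page number so the fallback renders the right page.
        def ocr_page(n: int = page_num) -> str:
            images = convert_from_path(path, dpi=dpi, first_page=n, last_page=n)
            return pytesseract.image_to_string(images[0])

        text = choose_page_text(page.extract_text() or "", ocr_page, min_chars)
        parts.append(f"\n--- Page {page_num} ---\n{text}")
    return "".join(parts)
```

Calling `print(pdf_to_text('k.pdf'))` would then give output in the same page-separated format as the snippet above. Whether this actually improves results depends on the PDFs: a page with a poor or partial text layer would still pass the length check and skip OCR.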

constantinum•1y ago
Give https://pg.llmwhisperer.unstract.com/ a try.
