Ask HN: Best on-device LLM tooling for PDFs?

4•martinald•8mo ago

I've got very used to using the "big" LLMs for analysing PDFs.

Now that llama.cpp has vision support, I tried out PDFs with it locally (via LM Studio), but the results weren't as good as I'd hoped. One time it insisted it couldn't do "OCR", but gave me an example of what the data _could_ look like, which was in fact the actual data.

The other major problem is that some PDFs are actually made up of images, and it got super confused by those as well.

Given this is so new, I'm struggling to find any tools that make this easier.

Comments

raymond_goo•8mo ago

Try something like this:

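  # Notebook cell: install the Python packages and the system binaries they wrap
  !pip install pytesseract pdf2image pillow
  !apt install -y poppler-utils   # pdf2image needs Poppler to render pages
  #!apt install -y tesseract-ocr  # uncomment if the tesseract binary isn't installed yet

  from pdf2image import convert_from_path
  import pytesseract

  # Render every PDF page to an image at 300 DPI
  pages = convert_from_path('k.pdf', dpi=300)

  # OCR page by page, labelling each page in the output
  chunks = []
  for page_num, img in enumerate(pages, start=1):
      text = pytesseract.image_to_string(img)
      chunks.append(f"\n--- Page {page_num} ---\n{text}")

  all_text = "".join(chunks)
  print(all_text)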
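
Since the question mentions that some PDFs have real text while others are made up of images, one possible extension is to try the embedded text layer first and only OCR the pages that come back nearly empty. This is a rough sketch, not a tested pipeline: it assumes `pypdf` is installed alongside the packages above, and `MIN_CHARS`, `has_text_layer` and `extract_pages` are made-up names for illustration.

```python
from typing import List

# Hypothetical threshold: pages with fewer non-whitespace characters than this
# are treated as having no usable text layer.
MIN_CHARS = 20


def has_text_layer(text: str, min_chars: int = MIN_CHARS) -> bool:
    """Return True if the extracted text looks substantial enough to skip OCR."""
    return len("".join(text.split())) >= min_chars


def extract_pages(path: str, dpi: int = 300) -> List[str]:
    """Use the embedded text where present; OCR only pages without it."""
    # Third-party imports are local so has_text_layer works without them.
    from pypdf import PdfReader
    from pdf2image import convert_from_path
    import pytesseract

    reader = PdfReader(path)
    results = []
    for page_num, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        if not has_text_layer(text):
            # Render just this one page to an image and OCR it.
            images = convert_from_path(
                path, dpi=dpi, first_page=page_num, last_page=page_num
            )
            text = pytesseract.image_to_string(images[0])
        results.append(text)
    return results
```

Calling `extract_pages('k.pdf')` would return one string per page, which can then be pasted into the local model as plain text rather than relying on its vision support. The character threshold is a guess and probably needs tuning per document.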

constantinum•8mo ago

Give https://pg.llmwhisperer.unstract.com/ a try.
