frontpage.
Run 35B LLMs on Dual Pascal GPUs with QLoRA

2•rickesh_tn•2h ago
Hi HN,

  I built a system to run 35B-parameter language models on older Pascal GPUs (P100 +
  GTX 1080 Ti) using multi-GPU memory spillover.

  Problem: Most LLM inference tools (Ollama, LM Studio) are limited to a single GPU's VRAM
  (roughly 13B models at most on a 16GB GPU). If you have multiple older GPUs, the second
  one sits idle.

  Solution: Multi-GPU + CPU memory spillover with 4-bit NF4 quantization (the data type
  introduced by QLoRA). The system automatically distributes layers across
  GPU0 → GPU1 → CPU RAM, enabling 35B models on hardware that normally maxes out at 13B.
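
  To make the spillover idea concrete, here is a minimal sketch of a greedy, in-order layer placement. It is not the project's code: the function name `assign_layers`, the layer sizes, and the per-device budgets are all hypothetical, and in practice Accelerate's automatic device mapping is what decides the real placement.

```python
from typing import Dict, List


def assign_layers(layer_sizes_gb: List[float],
                  budgets_gb: Dict[str, float]) -> Dict[int, str]:
    """Place layers on devices in order, spilling to the next device when full.

    budgets_gb is an ordered mapping such as
    {"cuda:0": ..., "cuda:1": ..., "cpu": ...} (hypothetical values).
    """
    devices = list(budgets_gb.items())  # dicts preserve insertion order
    placement: Dict[int, str] = {}
    dev_idx = 0
    used = 0.0
    for i, size in enumerate(layer_sizes_gb):
        # Move on to the next device while the current one cannot hold this layer.
        while dev_idx < len(devices) and used + size > devices[dev_idx][1]:
            dev_idx += 1
            used = 0.0
        if dev_idx == len(devices):
            raise MemoryError(f"layer {i} does not fit on any remaining device")
        placement[i] = devices[dev_idx][0]
        used += size
    return placement


if __name__ == "__main__":
    # Ten equally sized layers of 1 GB each, with made-up budgets.
    plan = assign_layers([1.0] * 10, {"cuda:0": 4.0, "cuda:1": 3.0, "cpu": 8.0})
    for layer, device in plan.items():
        print(layer, device)
```

  Filling devices in order keeps consecutive layers together, so activations cross a device boundary only once per device during a forward pass. Layers that land on the CPU are the slow part, which is consistent with throughput dropping sharply for the largest model.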

  Benchmarks (P100 16GB + GTX 1080 Ti 11GB):
  - Qwen-14B: 13.7 tokens/sec (9.4GB VRAM)
  - OPT-30B: 5.4 tokens/sec (15.2GB VRAM)
  - CodeLlama-34B: 0.8 tokens/sec (16.7GB VRAM)

  Quick start:
    docker pull rickeshtn/large-model-international_release:latest
    docker run -it --rm --runtime=nvidia --gpus all --ipc=host \
      --ulimit memlock=-1 --ulimit stack=268435456 \
      -v "$(pwd)":/workspace -e HF_HOME=/workspace/model_cache \
      rickeshtn/large-model-international_release:latest \
      python /app/interactive_chat.py --model-name Qwen/Qwen2.5-14B-Instruct

  Technical details:
  - QLoRA-style 4-bit NF4 quantization (about 75% less weight memory than FP16)
  - HuggingFace Transformers + Accelerate + bitsandbytes
  - Automatic device mapping with CPU offload
  - Interactive chat with conversation persistence
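
  The 75% figure follows from going from 16 bits to 4 bits per weight. The sketch below works through that arithmetic; the helper names are hypothetical, and it counts weights only, so it ignores quantization constants, any layers left unquantized, activations, and the KV cache. That is presumably why the measured VRAM numbers above are higher than these estimates.

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 10^9 bytes), weights only."""
    return n_params_billion * bits_per_weight / 8


def reduction(bits_before: float, bits_after: float) -> float:
    """Fraction of weight memory saved when lowering the bits per weight."""
    return 1 - bits_after / bits_before


if __name__ == "__main__":
    print(weight_memory_gb(14, 16))  # 28.0 GB for a 14B model in FP16
    print(weight_memory_gb(14, 4))   # 7.0 GB for the same model at 4 bits
    print(weight_memory_gb(34, 4))   # 17.0 GB for a 34B model at 4 bits
    print(reduction(16, 4))          # 0.75
```

  By this estimate a 34B model at 4 bits needs about 17 GB for weights alone, which is more than either card holds on its own, so splitting across both GPUs (and spilling to CPU RAM when needed) is what makes it fit.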

  GitHub: https://github.com/rickeshtn/locallm-pascal
  Docker Hub: https://hub.docker.com/r/rickeshtn/large-model-international_release

  34 users are already running it. Happy to answer technical questions!

Young People Are Falling in Love with Old Technology

https://www.wsj.com/tech/personal-tech/flip-phone-digital-camera-28a118dd
1•1vuio0pswjnm7•6m ago•0 comments

Centerview Partners to face trial over junior banker's long hours

https://www.ft.com/content/550bc2e0-2869-4a5d-a677-a66f4c35a7b9
1•walterbell•8m ago•1 comments

pdoc.dev

https://pdoc.dev/
3•joshdavham•12m ago•0 comments

Company bids less than a penny per ton in biggest US coal sale in over a decade

https://apnews.com/article/trump-coal-sales-public-lands-montana-b2dbbdc81e7afbf24947b9a4b32fa417
1•c420•14m ago•0 comments

My First Contribution to Linux

https://vkoskiv.com/first-linux-patch/
1•panic•18m ago•0 comments

Alternate explanation of red shift of distant stars

1•naveen99•21m ago•1 comments

Show HN: I made a free tool that tells you the hairstyle that suit you the best

https://haircutai.app
1•pabloschz•21m ago•0 comments

GPT 1 Thinking

https://twitter.com/andrew_n_carr/status/1974625322609049803/photo/1
1•gmays•22m ago•0 comments

Swift SDK for Temporal by Apple

https://github.com/apple/swift-temporal-sdk
1•jen20•24m ago•1 comments

This Month in Redox – September 2025

https://www.redox-os.org/news/this-month-250930/
1•brson•33m ago•0 comments

The Mid-Atlantic Accent

https://literaryashland.org/?p=10803
1•J253•34m ago•0 comments

Study of 500K Medical Records Linked viral encephalitis with Alzheimer's

https://www.sciencealert.com/a-study-of-500000-medical-records-linked-viruses-with-alzheimers-aga...
3•Gaishan•37m ago•0 comments

What You Didn't Learn in Berkeley CS 188: Intro to RL

https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley
1•nsomani•44m ago•0 comments

Burbank Airport air traffic control tower unmanned on Monday evening

https://abc7.com/post/hollywood-burbank-airport-will-have-no-air-traffic-controllers-evening-faa-...
5•pRusya•45m ago•1 comments

Convergence

https://ethan.dev/convergence/
1•Beefin•58m ago•0 comments

Citadel's Griffin Calls Rush to Gold as Safer Asset 'Concerning'

https://www.bloomberg.com/news/articles/2025-10-06/citadel-s-griffin-calls-rush-to-gold-as-safer-...
4•clanky•1h ago•1 comments

Kssolv Toolbox: a visual, workflow-oriented tool for first-principles simulation

1•yliu7949•1h ago•0 comments

Synthetic Bootstrapped Pretraining

https://arxiv.org/abs/2509.15248
1•PaulHoule•1h ago•0 comments

Seeing Like a Software Company

https://www.seangoedecke.com/seeing-like-a-software-company/
1•Townley•1h ago•0 comments

AI-Powered Robots Install Solar Panels Faster Than Any Humans

https://cleantechnica.com/2025/10/06/ai-powered-robots-install-solar-panels-faster-than-any-humans/
2•toomuchtodo•1h ago•2 comments

A 12,000-year-old obelisk with a human face was found in Karahan Tepe

https://www.trthaber.com/foto-galeri/karahantepede-12-bin-yil-oncesine-ait-insan-yuzlu-dikili-tas...
5•fatihpense•1h ago•1 comments

From Matmul to Meaning

https://www.evis.dev/posts/why_matmul
1•ringstar•1h ago•0 comments

Show HN: Systems and algorithms for (machine-)learning Monopoly Deal

https://github.com/cavaunpeu/monopoly-deal-ai
1•willwolf•1h ago•1 comments

Stephen Hawking and the Rise of the AI Craze

https://x.com/search
1•wslh•1h ago•1 comments

A Solution to the Paperclip Problem

https://link.springer.com/article/10.1007/s42979-025-04369-4
2•academic_84572•1h ago•0 comments

Wildfires are now four times more frequent due to climate change

https://apnews.com/article/wildfires-lahaina-damage-death-climate-change-f6dd7bec2e0661ba45a052d6...
4•gmays•1h ago•0 comments

CWM: An Open-Weights LLM for Research on Code Generation with World Models

https://ai.meta.com/research/publications/cwm-an-open-weights-llm-for-research-on-code-generation...
1•metadat•1h ago•0 comments

Kyoto University's self-governed Yoshida Dorm

https://www.youtube.com/watch?v=RcZjvTbC8r0
1•h0rv•1h ago•0 comments

PG: Free Press got bought by them to control US media, not for revenue growth

https://twitter.com/paulg/status/1975199201463259471
13•donsupreme•1h ago•1 comments

Paper Mono

https://github.com/paper-design/paper-mono
3•whereistejas•1h ago•0 comments