Hi HN,
I built a system to run 35B-parameter language models on older Pascal GPUs (P100 +
GTX 1080 Ti) using multi-GPU memory spillover.
Problem: Most LLM inference tools (Ollama, LM Studio) are limited to a single GPU's VRAM,
which caps you at roughly 13B models on a 16GB card. If you have multiple older GPUs, the
second one sits idle.
Solution: Multi-GPU + CPU memory spillover combined with 4-bit NF4 (QLoRA-style)
quantization. The system automatically distributes layers across GPU0 → GPU1 → CPU RAM,
enabling 35B models on hardware that normally maxes out at around 13B.
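Under the hood it's the standard Transformers + Accelerate + bitsandbytes recipe. Roughly, the loading path looks like the sketch below (the memory caps, model name, and exact flags are illustrative, not copied from the repo):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization (QLoRA-style) via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,   # Pascal has no bfloat16 support
    llm_int8_enable_fp32_cpu_offload=True,  # allow layers that spill to CPU RAM
)

# Cap per-device memory so Accelerate spills layers GPU0 -> GPU1 -> CPU RAM.
# Caps are illustrative for a P100 16GB + GTX 1080 Ti 11GB box.
max_memory = {0: "15GiB", 1: "10GiB", "cpu": "48GiB"}

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",          # Accelerate builds the layer-to-device map
    max_memory=max_memory,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")

print(model.hf_device_map)      # shows which layers landed on gpu0 / gpu1 / cpu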
Benchmarks (P100 16GB + GTX 1080 Ti 11GB):
- Qwen-14B: 13.7 tokens/sec (9.4GB VRAM)
- OPT-30B: 5.4 tokens/sec (15.2GB VRAM)
- CodeLlama-34B: 0.8 tokens/sec (16.7GB VRAM)
Quick start:
docker pull rickeshtn/large-model-international_release:latest

docker run -it --rm --runtime=nvidia --gpus all --ipc=host \
  --ulimit memlock=-1 --ulimit stack=268435456 \
  -v $(pwd):/workspace -e HF_HOME=/workspace/model_cache \
  rickeshtn/large-model-international_release:latest \
  python /app/interactive_chat.py --model-name Qwen/Qwen2.5-14B-Instruct
Technical details:
- QLoRA-style 4-bit NF4 quantization (~75% memory reduction vs. FP16)
- HuggingFace Transformers + Accelerate + bitsandbytes
- Automatic device mapping with CPU offload
- Interactive chat with conversation persistence (chat-loop sketch below)
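The chat loop follows the usual pattern: keep a running message list, re-apply the chat template every turn, and dump the history to JSON so a session survives restarts. A rough sketch (the function name and output path are placeholders, and it reuses model/tokenizer from the loading sketch above, not the repo's exact code):

import json

history = []  # list of {"role": ..., "content": ...} dicts

def chat_turn(user_text, max_new_tokens=256):
    # assumes `model` and `tokenizer` from the loading sketch above
    history.append({"role": "user", "content": user_text})
    inputs = tokenizer.apply_chat_template(
        history, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    reply = tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
    history.append({"role": "assistant", "content": reply})
    return reply

# persist the conversation between sessions (hypothetical path)
with open("conversation.json", "w") as f:
    json.dump(history, f, indent=2)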
GitHub: https://github.com/rickeshtn/locallm-pascal
Docker Hub: https://hub.docker.com/r/rickeshtn/large-model-international_release
34 users are already running it. Happy to answer technical questions!