The 5 tasks:
1. Classify JD lines (requirement vs boilerplate)
2. Split requirements into required vs preferred
3. Disambiguate skill mentions (Python-the-language vs Python-the-ecosystem)
4. Textual entailment (does resume experience satisfy a requirement?)
5. Semantic embeddings for similarity search
All five share MiniLM-L6 (22M params). Before: 5 x ~91MB fine-tuned copies — essentially the same encoder with different weights, burning RAM for no reason on a 4 vCPU / 8GB box. The obvious idea: share the encoder. One copy in memory, five lightweight heads (~580KB each) routing its output to task-specific predictions.
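A minimal sketch of that shape (names, label counts, and head architecture are illustrative, not the production code; a 384→384 projection plus classifier happens to land right at the ~580KB-per-head figure in fp32):

```python
import torch.nn as nn
from transformers import AutoModel

TASKS = {"jd_line": 2, "req_vs_pref": 2, "skill_sense": 2, "entailment": 2}

class SharedEncoder(nn.Module):
    def __init__(self, checkpoint="sentence-transformers/all-MiniLM-L6-v2"):
        super().__init__()
        # One ~22M-param MiniLM in memory, shared by every task.
        # (Checkpoint shown is the one I'd pick next time; see the end of the post.)
        self.encoder = AutoModel.from_pretrained(checkpoint)
        h = self.encoder.config.hidden_size  # 384 for MiniLM-L6
        # Per-task heads: Linear(384, 384) alone is ~591KB in fp32.
        self.heads = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(h, h), nn.Tanh(), nn.Linear(h, n))
            for name, n in TASKS.items()
        })

    def forward(self, task, **enc_inputs):
        cls = self.encoder(**enc_inputs).last_hidden_state[:, 0]  # CLS vector
        return cls if task == "embed" else self.heads[task](cls)
```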
Attempt 1: Frozen encoder, linear heads. Cache CLS embeddings once, train heads on the cached vectors. Fast, but every head lost 10-15% accuracy: the pretrained representations weren't tuned for our tasks.
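Attempt 1 in sketch form (batching details and the head-training loop elided; helper name is mine):

```python
import torch

@torch.no_grad()
def cache_cls(encoder, tokenizer, texts, batch_size=64):
    """Run the frozen encoder once over the corpus; keep only CLS vectors."""
    encoder.eval()
    chunks = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], padding=True,
                          truncation=True, return_tensors="pt")
        chunks.append(encoder(**batch).last_hidden_state[:, 0])
    return torch.cat(chunks)  # heads then train on these in seconds
```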
Attempt 2: Multi-task fine-tuning, 4 objectives. Unfreeze, train all four heads simultaneously with alternating batches, differential LR (encoder 2e-5, heads 1e-3). Classification recovered to within 2% of standalone. But embedding quality collapsed — "Python programming" and "cooking recipes" hit 0.91 cosine similarity. Classification objectives pushed all representations into a small region where heads could classify, destroying the distance structure embeddings need.
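The loop looked roughly like this (reusing `SharedEncoder` from above; `dataloaders` and `total_steps` are assumed, and which task got which loss weight is an assumption, but the two learning rates are the real ones):

```python
import itertools
import torch
import torch.nn.functional as F

model = SharedEncoder()

# Differential LR: the encoder moves slowly, the heads move fast.
optimizer = torch.optim.AdamW([
    {"params": model.encoder.parameters(), "lr": 2e-5},
    {"params": model.heads.parameters(), "lr": 1e-3},
])

# See the task-weighting gotcha below: the 65K-example task needed a 3x
# weight to survive next to the 1.6M-example one. Values are illustrative.
loss_weight = {"jd_line": 1.0, "req_vs_pref": 3.0,
               "skill_sense": 1.0, "entailment": 1.0}

streams = {t: iter(itertools.cycle(dl)) for t, dl in dataloaders.items()}
tasks = list(streams)
for step in range(total_steps):
    task = tasks[step % len(tasks)]           # alternate one batch per task
    enc_inputs, labels = next(streams[task])
    logits = model(task, **enc_inputs)
    loss = loss_weight[task] * F.cross_entropy(logits, labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```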
Attempt 3: Add contrastive objective as 5th task. Cosine similarity loss on positive/negative pairs alongside the four classification objectives. Explicitly penalizes the collapse. Encoder now has two competing incentives: make CLS tokens classifiable AND keep similar texts close / dissimilar texts far apart.
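One plausible implementation of that fifth objective is torch's built-in CosineEmbeddingLoss, where the target is +1 for positive pairs and -1 for negatives (pair mining and the margin value are assumptions, not from the post):

```python
import torch.nn as nn

cos_loss = nn.CosineEmbeddingLoss(margin=0.25)  # margin is a guess

def contrastive_step(model, batch_a, batch_b, target):
    # target: +1 where (a, b) should be similar, -1 where they shouldn't.
    emb_a = model("embed", **batch_a)  # raw CLS vectors, no head
    emb_b = model("embed", **batch_b)
    return cos_loss(emb_a, emb_b, target)  # gets 2x weight; see gotchas
```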
Gotchas:
Task weighting. The smallest dataset (65K examples) was drowned out by the largest (1.6M). A 3x weight fixed it.
Embedding objective needs mass. 270K contrastive pairs against 2.1M classification examples wasn't enough. Scaling to 1M pairs at 2x loss weight was.
Some heads need independent training. The required-vs-preferred head wouldn't converge in the multi-task setup. The diagnostic (sketched after this list) showed the encoder representations already separated the classes (0.80 within-class vs 0.02 between-class cosine similarity): the encoder was fine, the head was the problem. Froze the encoder, retrained just that head in 30 seconds. 99% accuracy.
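The collapse diagnostic from that last gotcha, in minimal form (assumes you have CLS vectors and integer class labels for one task's validation set):

```python
import torch
import torch.nn.functional as F

def cosine_separation(vecs, labels):
    """Mean within-class vs between-class cosine similarity.

    Healthy: high within, low between (the 0.80 / 0.02 split above).
    Collapsed: both high, i.e. everything sits in one small region.
    """
    vecs = F.normalize(vecs, dim=1)
    sims = vecs @ vecs.T                            # pairwise cosine sims
    same = labels[:, None] == labels[None, :]
    self_pairs = torch.eye(len(vecs), dtype=torch.bool)
    return (sims[same & ~self_pairs].mean().item(),  # within-class
            sims[~same].mean().item())               # between-class
```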
Result: one 22.9MB INT8 encoder + five ~580KB heads = 25MB total. Also a DeBERTa encoder (68.5MB) with two heads for token-level tasks (NER + section segmentation). Total: 94MB for 7 models, $11/month VPS, zero API costs.
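The post-training shrink: I won't claim which quantization route produced the 22.9MB encoder here, but PyTorch's dynamic INT8 quantization of the linear layers is the standard way to get roughly that 4x reduction from a 91MB fp32 model (ONNX Runtime quantization is the other common choice):

```python
import torch

int8_encoder = torch.ao.quantization.quantize_dynamic(
    model.encoder,        # fp32 MiniLM, ~91MB on disk
    {torch.nn.Linear},    # quantize only the linear layers
    dtype=torch.qint8,    # int8 weights, ~4x smaller
)
```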
The kicker: the matching score went UP (71 → 75) because the entailment head became more accurate. It catches experience that satisfies a requirement without sharing its keywords, which a keyword matcher would miss. Pipeline latency: 19s → 8.7s. Consolidated for RAM, got better accuracy as a bonus.
One thing I'd do differently: start from sentence-transformers/all-MiniLM-L6-v2, not the cross-encoder checkpoint. The sentence-transformer's embedding space is already oriented toward similarity, while the cross-encoder's is oriented toward ranking, so it's a better starting point for the contrastive objective.
Happy to answer questions about multi-task training, the embedding-collapse diagnostic, quantization, or production deployment.
Soft launch — product is live at the URL, free to try one report. Feedback welcome, especially from anyone who's been through ATS-backed applications. And if you want to argue with the architecture choices, even better.