Most deep learning approaches to the TSP rely on pre-training with large-scale datasets. I wanted to see if a solver could learn "on the fly" for a specific instance, without any priors learned from other instances.
I built a solver using PPO that learns from scratch, per instance. It achieved a 1.66% optimality gap on TSPLIB d1291 in about 5.6 hours on a single A100.
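For anyone checking that number: TSPLIB's EUC_2D instances (d1291 included) round each edge length to the nearest integer, and the gap is the percent excess over the published optimum. A minimal sketch with my own helper names, not code from the repo:

    import math

    def tsplib_euc2d(a, b):
        # TSPLIB EUC_2D: Euclidean distance rounded with nint(x) = int(x + 0.5)
        return int(math.hypot(a[0] - b[0], a[1] - b[1]) + 0.5)

    def tour_length(coords, tour):
        # Total length of the closed tour coords[tour[0]] -> coords[tour[1]] -> ...
        n = len(tour)
        return sum(tsplib_euc2d(coords[tour[i]], coords[tour[(i + 1) % n]])
                   for i in range(n))

    def gap_pct(tour_len, opt_len):
        # The "gap" quoted above: percent excess over the known optimum
        return 100.0 * (tour_len - opt_len) / opt_len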
The core idea: my hypothesis was that while optimal solutions are mostly composed of "minimum edges" (edges between nearest neighbors), the real difficulty comes from the small number of "exception edges" that fall outside that local scope.
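To make the hypothesis measurable: given any tour, you can count how many of its edges connect k-nearest neighbors and how many are "exceptions". A quick sketch; the k=10 cutoff is my choice for illustration, not the repo's:

    import numpy as np

    def exception_edge_ratio(coords, tour, k=10):
        """Fraction of tour edges that are NOT k-nearest-neighbor ('minimum') edges."""
        n = len(coords)
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        knn = np.argsort(d, axis=1)[:, :k]           # each city's k nearest neighbors
        exceptions = 0
        for i in range(n):
            a, b = tour[i], tour[(i + 1) % n]
            if b not in knn[a] and a not in knn[b]:  # edge outside both local scopes
                exceptions += 1
        return exceptions / n

If the hypothesis holds, this ratio is small on good tours, and the hard part is finding exactly those few exceptions.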
Instead of pre-training, I designed an inductive bias around the topological/geometric structure of these exception edges. The agent gets guidance on which edges are likely to be promising, derived from micro- and macro-scale structure, and PPO fills in the gaps through trial and error.
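The repo surely encodes this its own way, but the general shape of such a prior is an edge-candidate mask: local k-NN edges plus a few cross-cluster "macro" candidates, with the policy's action space restricted to it. A hypothetical sketch; every name and parameter here is mine:

    import numpy as np
    from sklearn.cluster import KMeans

    def candidate_mask(coords, k_local=8, n_clusters=16, k_cross=8):
        """Boolean (n, n) edge mask: a micro (k-NN) prior plus macro candidates
        intended to cover likely 'exception edges' between regions."""
        n = len(coords)
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        mask = np.zeros((n, n), dtype=bool)

        # Micro structure: each city's k_local nearest neighbors ('minimum edges').
        rows = np.repeat(np.arange(n), k_local)
        cols = np.argsort(d, axis=1)[:, :k_local].ravel()
        mask[rows, cols] = mask[cols, rows] = True

        # Macro structure: coarse clusters; keep each cluster's k_cross shortest
        # outgoing edges so long-range 'exception edges' stay reachable.
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(coords)
        for c in range(n_clusters):
            inside = np.where(labels == c)[0]
            outside = np.where(labels != c)[0]
            sub = d[np.ix_(inside, outside)]
            flat = np.argsort(sub, axis=None)[:k_cross]
            i_loc, j_loc = np.unravel_index(flat, sub.shape)
            mask[inside[i_loc], outside[j_loc]] = mask[outside[j_loc], inside[i_loc]] = True
        return mask

At each construction step the policy's logits over next cities would then be masked to mask[current_city], so PPO's trial and error stays inside the prior while still being able to pick the occasional long edge.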
It was interesting to see RL reach this level without any dataset. I've open-sourced the code and a Colab notebook for anyone who wants to verify the results or tinker with the "exception edge" hypothesis.
Code & Colab: https://github.com/jivaprime/TSP_exception-edge
Happy to answer any questions about the geometric priors or the PPO implementation!
mkl•2h ago
PPO = Proximal Policy Optimisation, a reinforcement learning algorithm (https://en.wikipedia.org/wiki/Proximal_Policy_Optimization)
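For the curious, the heart of PPO is a clipped surrogate objective that keeps each policy update close to the policy that collected the data. A minimal numpy sketch of just that term:

    import numpy as np

    def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
        # Probability ratio pi_new(a|s) / pi_old(a|s) for the sampled actions
        ratio = np.exp(logp_new - logp_old)
        # Clipping removes the incentive to push the ratio outside [1-eps, 1+eps]
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
        # Objective to maximize (negate for a loss); the min makes it pessimistic
        return np.mean(np.minimum(ratio * advantages, clipped * advantages))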