frontpage.

Hi all

I’ve put together a repo demonstrating how to train PPO directly on a single TSPLIB instance (lin318) from scratch—without pre-training or GPUs.

Repo:https://github.com/jivaprime/TSP

1. Experiment Setup

Problem: TSPLIB lin318 (Opt: 42,029) & rd400

Hardware: Google Colab (CPU only)

Model: Single-instance PPO policy + Value network. Starts from random initialization.

Local Search: Light 2-opt during training, Numba-accelerated 3-opt for evaluation.

Core Concept: Instead of a "stable average-error minimizer," this policy is designed as a high-variance explorer. The goal isn't to keep the average gap low, but to occasionally "spike" very low-error tours that local search can polish.

2. Results: lin318

Best Shot: 42,064 (Gap ≈ +0.08%)

Time: Reached within ~20 minutes on Colab CPU.

According to the logs (included in the repo), the sub-0.1% shot appeared around elapsed=0:19:49. While the average error oscillates around 3–4%, the policy successfully locates a deep basin that 3-opt can exploit.

3. Extended Experiment: Smart ILS & rd400

I extended the pipeline with "Smart ILS" (Iterated Local Search) post-processing to see if we could hit the exact optimum.

A. lin318 + ILS

Took the PPO-generated tour (0.08% gap) as a seed.

Ran Smart ILS for ~20 mins.

Result: Reached the exact optimal (42,029).

B. rd400 + ILS

PPO Phase: ~2 hours on CPU. Produced tours with ~1.9% gap.

ILS Phase: Used PPO tours as seeds. Ran for ~40 mins.

Result: Reached 0.079% gap (Cost 15,293 vs Opt 15,281).

Summary

The workflow separates concerns effectively:

PPO: Drives the search into a high-quality basin (1–2% gap).

ILS: Digs deep within that basin to find the optimum.

If you are interested in instance-wise RL, CPU-based optimization, or comparing against ML-TSP baselines (POMO, AM, NeuroLKH), feel free to check out the code.

Constructive feedback is welcome!

Ask HN: Anyone Using a Mac Studio for Local AI/LLM?

Ask HN: Opus 4.6 ignoring instructions, how to use 4.5 in Claude Code instead?

Ask HN: Ideas for small ways to make the world a better place

Ask HN: Non AI-obsessed tech forums

Ask HN: Who wants to be hired? (February 2026)

Ask HN: 10 months since the Llama-4 release: what happened to Meta AI?

Ask HN: Who is hiring? (February 2026)

LLMs are powerful, but enterprises are deterministic by nature

Tell HN: Another round of Zendesk email spam

AI Regex Scientist: A self-improving regex solver

Ask HN: Is Connecting via SSH Risky?

Ask HN: Has your whole engineering team gone big into AI coding? How's it going?

Ask HN: Non-profit, volunteers run org needs CRM. Is Odoo Community a good sol.?

Ask HN: Is there anyone here who still uses slide rules?

Kernighan on Programming

Ask HN: Mem0 stores memories, but doesn't learn user patterns

Ask HN: How does ChatGPT decide which websites to recommend?

Ask HN: Is it just me or are most businesses insane?

Ask HN: Why LLM providers sell access instead of consulting services?

Ask HN: What is the most complicated Algorithm you came up with yourself?

Ask HN: Anyone Seeing YT ads related to chats on ChatGPT?

We built a serverless GPU inference platform with predictable latency

Ask HN: Does a good "read it later" app exist?

Ask HN: Does global decoupling from the USA signal comeback of the desktop app?

Ask HN: Have you been fired because of AI?

Ask HN: Anyone have a "sovereign" solution for phone calls?

Ask HN: Cheap laptop for Linux without GUI (for writing)

GitHub Actions Have "Major Outage"

Ask HN: Has anybody moved their local community off of Facebook groups?

Ask HN: OpenClaw users, what is your token spend?