I’ve put together a repo demonstrating how to train PPO directly on a single TSPLIB instance (lin318) from scratch—without pre-training or GPUs.
Repo: https://github.com/jivaprime/TSP
1. Experiment Setup
Problems: TSPLIB lin318 (Opt: 42,029) and rd400 (Opt: 15,281)
Hardware: Google Colab (CPU only)
Model: Single-instance PPO policy + Value network. Starts from random initialization.
Local Search: Light 2-opt during training, Numba-accelerated 3-opt for evaluation (a 2-opt sketch follows this list).
Core Concept: Instead of a "stable average-error minimizer," this policy is designed as a high-variance explorer. The goal isn't to keep the average gap low, but to occasionally "spike" very low-error tours that local search can polish.
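To make the "light 2-opt during training" step concrete, here is a minimal sketch of a Numba-jitted, first-improvement 2-opt sweep over a precomputed distance matrix. Names like `two_opt_pass`, `tour`, and `dist` are illustrative, not the repo's actual API; during training you would call something like this once or a few times per candidate tour rather than running it to convergence.

```python
import numpy as np
from numba import njit

@njit(cache=True)
def two_opt_pass(tour, dist):
    """One first-improvement 2-opt sweep.

    tour: 1-D integer array of city indices (modified in place).
    dist: (n, n) array of pairwise distances.
    Returns True if at least one improving move was applied.
    """
    n = tour.shape[0]
    improved = False
    for i in range(n - 1):
        a = tour[i]
        b = tour[i + 1]
        # When i == 0, skip j == n - 1 so the two chosen edges are never adjacent.
        j_max = n - 1 if i == 0 else n
        for j in range(i + 2, j_max):
            c = tour[j]
            d = tour[(j + 1) % n]
            # Gain from replacing edges (a, b) and (c, d) with (a, c) and (b, d).
            delta = dist[a, c] + dist[b, d] - dist[a, b] - dist[c, d]
            if delta < -1e-10:
                # Apply the move by reversing the segment tour[i+1 .. j].
                lo, hi = i + 1, j
                while lo < hi:
                    tour[lo], tour[hi] = tour[hi], tour[lo]
                    lo += 1
                    hi -= 1
                b = tour[i + 1]
                improved = True
    return improved
```

Looping until it returns False gives full 2-opt convergence; 3-opt is analogous but considers three-edge exchanges, which is why it benefits even more from Numba.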
2. Results: lin318
Best Shot: 42,064 (Gap ≈ +0.08%)
Time: Reached within ~20 minutes on Colab CPU.
According to the logs (included in the repo), the sub-0.1% shot appeared around elapsed=0:19:49. While the average error oscillates around 3–4%, the policy successfully locates a deep basin that 3-opt can exploit.
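For clarity, "gap" throughout means relative excess over the known optimum. Checking the lin318 number above:

```python
def gap_pct(cost, opt):
    """Relative excess over the known optimum, in percent."""
    return 100.0 * (cost - opt) / opt

print(gap_pct(42_064, 42_029))  # ~0.083%, reported above as ~+0.08%
```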
3. Extended Experiment: Smart ILS & rd400
I extended the pipeline with "Smart ILS" (Iterated Local Search) post-processing to see if we could hit the exact optimum.
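For readers unfamiliar with the pattern, a generic ILS loop alternates a random perturbation (the classic double-bridge kick) with local search, keeping a candidate only if it beats the incumbent. The sketch below is that textbook version, not the repo's actual "Smart ILS": `two_opt_pass` is the sweep sketched earlier, and `double_bridge`, the acceptance rule, and all parameter names are placeholders of mine.

```python
import numpy as np

def tour_length(tour, dist):
    """Total length of the closed tour under distance matrix dist."""
    return dist[tour, np.roll(tour, -1)].sum()

def double_bridge(tour, rng):
    """Classic 4-opt 'double bridge' kick: cut the tour into four pieces and reorder them."""
    n = len(tour)
    i, j, k = sorted(rng.choice(np.arange(1, n), size=3, replace=False))
    return np.concatenate((tour[:i], tour[j:k], tour[i:j], tour[k:]))

def iterated_local_search(seed_tour, dist, n_iters=1000, seed=0):
    """Generic ILS: perturb the incumbent, re-optimize with 2-opt, accept if shorter."""
    rng = np.random.default_rng(seed)
    best = seed_tour.copy()
    while two_opt_pass(best, dist):   # polish the seed to a 2-opt local optimum
        pass
    best_len = tour_length(best, dist)
    for _ in range(n_iters):
        cand = double_bridge(best, rng)
        while two_opt_pass(cand, dist):
            pass
        cand_len = tour_length(cand, dist)
        if cand_len < best_len:       # accept only strict improvements
            best, best_len = cand, cand_len
    return best, best_len
```

A stronger local search (e.g., the 3-opt mentioned above) and less naive perturbation/acceptance rules are the natural places where a "smart" variant would differ.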
A. lin318 + ILS
Took the PPO-generated tour (0.08% gap) as a seed.
Ran Smart ILS for ~20 mins.
Result: Reached the exact optimum (42,029).
B. rd400 + ILS
PPO Phase: ~2 hours on CPU. Produced tours with ~1.9% gap.
ILS Phase: Used PPO tours as seeds. Ran for ~40 mins.
Result: Reached 0.079% gap (Cost 15,293 vs Opt 15,281).
4. Summary
The workflow separates concerns effectively:
PPO: Drives the search into a high-quality basin (1–2% gap).
ILS: Digs deep within that basin to find the optimum.
If you are interested in instance-wise RL, CPU-based optimization, or comparing against ML-TSP baselines (POMO, AM, NeuroLKH), feel free to check out the code.
Constructive feedback is welcome!