CPU-only PPO solving TSPLIB lin318 in 20 mins (0.08% gap)

5•jivaprime•22h ago
Hi all,

I’ve put together a repo demonstrating how to train PPO directly on a single TSPLIB instance (lin318) from scratch, without pre-training or GPUs.

Repo: https://github.com/jivaprime/TSP

1. Experiment Setup

Problems: TSPLIB lin318 (Opt: 42,029) and rd400 (Opt: 15,281)

Hardware: Google Colab (CPU only)

Model: Single-instance PPO policy + Value network. Starts from random initialization.

Local Search: Light 2-opt during training, Numba-accelerated 3-opt for evaluation.

Core Concept: Instead of a "stable average-error minimizer," this policy is designed as a high-variance explorer. The goal isn't to keep the average gap low, but to occasionally "spike" very low-error tours that local search can polish.
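To make the "spike, then polish" idea concrete, here is a rough sketch rather than the repo's actual code. It assumes random tours as a stand-in for samples from the PPO policy, a plain first-improvement 2-opt as the polisher, and unrounded Euclidean distances (TSPLIB's EUC_2D instances use distances rounded to the nearest integer). The names tour_length and two_opt are made up for illustration.

```python
import math
import random

def tour_length(tour, coords):
    """Length of the closed tour under plain Euclidean distance."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

def two_opt(tour, coords):
    """Apply improving 2-opt moves (segment reversals) until none is left."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # the two edges share a city, nothing to swap
                a, b = coords[tour[i]], coords[tour[i + 1]]
                c, d = coords[tour[j]], coords[tour[(j + 1) % n]]
                # Replace edges (a,b) and (c,d) with (a,c) and (b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-9:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

# Toy demo: random tours stand in for policy samples (an assumption).
rng = random.Random(0)
coords = [(rng.random(), rng.random()) for _ in range(40)]
samples = [rng.sample(range(40), 40) for _ in range(8)]

# Polish every sample and keep the best one; the average sample can be
# poor as long as one of them lands somewhere local search can exploit.
polished = [two_opt(s, coords) for s in samples]
best = min(polished, key=lambda t: tour_length(t, coords))
print(f"mean raw length:    {sum(tour_length(s, coords) for s in samples) / 8:.3f}")
print(f"best polished tour: {tour_length(best, coords):.3f}")
```

The point of the demo is the selection step at the end: only the best polished tour matters, which is why a high-variance sampler can be acceptable here.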

2. Results: lin318

Best Shot: 42,064 (Gap ≈ +0.08%)

Time: Reached within ~20 minutes on Colab CPU.

According to the logs (included in the repo), the sub-0.1% shot appeared around elapsed=0:19:49. While the average error oscillates around 3–4%, the policy successfully locates a deep basin that 3-opt can exploit.
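For anyone checking the numbers, the gap here is taken to be the relative excess over the known optimum, which is the usual convention and matches the figures quoted above. A tiny sketch (gap_percent is just an illustrative helper name):

```python
def gap_percent(cost, optimum):
    """Relative excess of a tour cost over the optimum, in percent."""
    return 100.0 * (cost - optimum) / optimum

# lin318: best shot 42,064 against the optimum 42,029
print(f"{gap_percent(42064, 42029):.3f}%")  # 0.083%
```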

3. Extended Experiment: Smart ILS & rd400

I extended the pipeline with "Smart ILS" (Iterated Local Search) post-processing to see if we could hit the exact optimum.

A. lin318 + ILS

Took the PPO-generated tour (0.08% gap) as a seed.

Ran Smart ILS for ~20 mins.

Result: Reached the exact optimum (42,029).

B. rd400 + ILS

PPO Phase: ~2 hours on CPU. Produced tours with ~1.9% gap.

ILS Phase: Used PPO tours as seeds. Ran for ~40 mins.

Result: Reached 0.079% gap (Cost 15,293 vs Opt 15,281).
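The general shape of an ILS pass over a seed tour can be sketched as follows. This is a generic textbook-style version, not the "Smart ILS" in the repo: it assumes a double-bridge kick as the perturbation, a simple 2-opt as the local search, plain Euclidean distances, and a keep-only-if-better acceptance rule. All function names are illustrative.

```python
import math
import random

def tour_length(tour, coords):
    """Length of the closed tour under plain Euclidean distance."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

def two_opt(tour, coords):
    """Apply improving 2-opt moves until no move shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # adjacent edges, no valid move
                a, b = coords[tour[i]], coords[tour[i + 1]]
                c, d = coords[tour[j]], coords[tour[(j + 1) % n]]
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-9:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

def double_bridge(tour, rng):
    """Cut the tour into segments A|B|C|D and reconnect as A|C|B|D."""
    i, j, k = sorted(rng.sample(range(1, len(tour)), 3))
    return tour[:i] + tour[j:k] + tour[i:j] + tour[k:]

def iterated_local_search(seed_tour, coords, iterations=100, seed=0):
    """Kick the incumbent, re-optimize, and keep the result only if it is shorter."""
    rng = random.Random(seed)
    best = two_opt(seed_tour, coords)
    best_len = tour_length(best, coords)
    for _ in range(iterations):
        candidate = two_opt(double_bridge(best, rng), coords)
        candidate_len = tour_length(candidate, coords)
        if candidate_len < best_len - 1e-9:
            best, best_len = candidate, candidate_len
    return best, best_len

# Toy usage: a random tour stands in for a PPO-generated seed (an assumption).
rng = random.Random(42)
coords = [(rng.random(), rng.random()) for _ in range(40)]
seed_tour = rng.sample(range(40), 40)
best, best_len = iterated_local_search(seed_tour, coords)
print(f"seed length: {tour_length(seed_tour, coords):.3f}, after ILS: {best_len:.3f}")
```

The double-bridge kick is a common choice because 2-opt cannot undo it with a single move, so each restart explores a nearby but different local optimum instead of falling back into the same one.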

Summary

The workflow separates concerns effectively:

PPO: Drives the search into a high-quality basin (best tours at 0.08% gap on lin318 and ~1.9% on rd400 in the runs above).

ILS: Digs deep within that basin to find the optimum.

If you are interested in instance-wise RL, CPU-based optimization, or comparing against ML-TSP baselines (POMO, AM, NeuroLKH), feel free to check out the code.

Constructive feedback is welcome!