Show HN: Novum – Automated ML Research Pipeline with Anti-Fabrication Guards

1•euanai•1h ago

Comments

euanai•1h ago

Hi HN! I'm the author.

Novum is a Claude Code extension that runs an autonomous ML research loop with mechanical guardrails designed to reduce result fabrication.

The key idea is that instead of relying on prompts like "don't hallucinate", the system enforces constraints mechanically (e.g., preventing edits to protected result files and enforcing phase gates in the research pipeline).

In a recent test run, a single /research command ran autonomously for about 30 hours: 10 hypotheses tested, 4 iteration cycles, and one champion solution selected.

Happy to answer questions or hear feedback on the guard design and research workflow.

isaackeitor•1h ago

Two things I'm curious about:

- How strict are the phase gates? Like, is it a hard checklist or can the system be more lenient depending on the task?
- When picking the champion solution out of 10 hypotheses, what's actually being measured?

euanai•56m ago

Great questions!

Phase gates are hard: a PreToolUse hook (phase-gate-guard.js) checks prerequisites before allowing state.json updates. If something's missing, the write gets denied. For example, Phase 1→2 won't pass without literature-review.md (>2000 words), ≥10 papers in metadata, and a references.bib. Phase 6→7 needs a completed tournament with a champion. No exceptions: the agent just can't advance. There are some softer warnings too, but the main gates are hard blocks.
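
To make the gate logic concrete, here is a minimal sketch of the prerequisite check only. It is not the actual phase-gate-guard.js: the function name checkPhaseGate, the GATES table, and the artifact field names (literatureReviewWords, paperCount, hasReferencesBib, tournamentCompleted, champion) are all assumptions for illustration. How the hook reads files or reports a denial back to Claude Code is left out.

```javascript
// Hypothetical sketch, not the real phase-gate-guard.js.
// Each gate maps a "from->to" transition to a list of [label, predicate] checks.
// Thresholds are the ones quoted in the comment above; field names are invented.
const GATES = {
  "1->2": [
    ["literature-review.md over 2000 words", (a) => a.literatureReviewWords > 2000],
    ["at least 10 papers in metadata", (a) => a.paperCount >= 10],
    ["references.bib present", (a) => a.hasReferencesBib === true],
  ],
  "6->7": [
    ["tournament completed", (a) => a.tournamentCompleted === true],
    ["champion selected", (a) => typeof a.champion === "string" && a.champion.length > 0],
  ],
};

// Returns { allowed, missing }. A hard gate denies the state.json write
// whenever `missing` is non-empty. Transitions with no entry in GATES pass
// in this sketch; that behavior is an assumption, not something the thread states.
function checkPhaseGate(fromPhase, toPhase, artifacts) {
  const checks = GATES[`${fromPhase}->${toPhase}`] || [];
  const missing = checks
    .filter(([, ok]) => !ok(artifacts))
    .map(([label]) => label);
  return { allowed: missing.length === 0, missing };
}

// Example: a literature review that is too short blocks the 1->2 transition.
const decision = checkPhaseGate(1, 2, {
  literatureReviewWords: 1500,
  paperCount: 12,
  hasReferencesBib: true,
});
console.log(decision.allowed, decision.missing);
```

Returning the list of missing prerequisites, rather than a bare boolean, would let a hook tell the agent exactly what to produce before retrying.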

For champion selection, it's Successive Halving. All hypotheses compete in Round 1 (15% of GPU budget), top half survive to Round 2 (30%), champion gets Round 3 (55%). Each round eliminates the bottom half by score. The score is a weighted mix of metric improvement, mechanism signal quality, compute efficiency, and novelty. The weights shift depending on venue target (oral cares more about novelty, poster cares more about raw metric gains).
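
As a rough illustration of that tournament, the sketch below runs three rounds with the 15/30/55 budget split and a weighted score. Several things here are assumptions rather than Novum's implementation: the weight values in WEIGHTS, the evaluate callback, rounding the surviving half up, and picking a single winner in the final round (the thread does not say how many hypotheses enter Round 3).

```javascript
// Hypothetical sketch of Successive Halving with the quoted budget split.
const ROUND_BUDGETS = [0.15, 0.30, 0.55];

// Illustrative weights only; the real per-venue values are not given in the thread.
const WEIGHTS = {
  oral:   { metric: 0.3, mechanism: 0.2, efficiency: 0.1, novelty: 0.4 },
  poster: { metric: 0.5, mechanism: 0.2, efficiency: 0.1, novelty: 0.2 },
};

// Combine the four signals named in the comment into one score.
function weightedScore(signals, venue) {
  const w = WEIGHTS[venue];
  return (
    w.metric * signals.metric +
    w.mechanism * signals.mechanism +
    w.efficiency * signals.efficiency +
    w.novelty * signals.novelty
  );
}

// `evaluate(hypothesis, budgetFraction)` is an assumed callback that runs the
// hypothesis under the round's budget and returns the four signals.
function successiveHalving(hypotheses, evaluate, venue) {
  let survivors = [...hypotheses];
  const history = [];
  ROUND_BUDGETS.forEach((budget, i) => {
    const ranked = survivors
      .map((h) => ({ h, score: weightedScore(evaluate(h, budget), venue) }))
      .sort((a, b) => b.score - a.score); // best first
    const isFinal = i === ROUND_BUDGETS.length - 1;
    // Assumption: keep the top half (rounded up), and exactly one in the final round.
    const kept = isFinal ? 1 : Math.ceil(ranked.length / 2);
    survivors = ranked.slice(0, kept).map((r) => r.h);
    history.push({ round: i + 1, budget, entrants: ranked.length, kept });
  });
  return { champion: survivors[0], history };
}
```

Under these assumptions, 10 hypotheses would narrow to 5, then 3, then 1. Swapping the venue argument changes only the weights, so the same signals can produce a different champion for an oral target than for a poster target.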
