Show HN: Research-Backed Multi-Agent System for Autonomous Development

https://github.com/asklokesh/claudeskill-loki-mode

3•slogansand•4w ago

Hey HN, author here. Loki Mode orchestrates specialized AI agents to take a PRD to a deployed product with zero human intervention. But what I'm most proud of is the research foundation - we implemented virtually every scientifically proven pattern from the 2025-2026 AI agent literature.

From Anthropic:

- Constitutional AI: self-critique against principles
- Building Effective Agents: evaluator-optimizer pattern
- Claude Code Best Practices: explore-plan-code workflow
- Visible Extended Thinking (think, think hard, ultrathink levels)
- Effective Harnesses: one-feature-at-a-time pattern

From DeepMind:

- SIMA 2: self-improvement loops
- Gemini Robotics: hierarchical reasoning (planner + executor)
- Scalable AI Safety: debate-based verification

From OpenAI:

- Agents SDK: tracing, guardrails, tripwires
- Deep Research: adaptive planning with backtracking
- AGENTS.md: standardized instructions

From Academic Research:

- CONSENSAGENT (ACL 2025): Blind review + Devil's Advocate when unanimous. 30% false positive reduction.
- GoalAct: Global planning → skill decomposition → local execution. 12%+ success rate improvement.
- A-Mem: Zettelkasten-style memory linking for episodic→semantic consolidation.
- Multi-Agent Reflexion: Structured debate (Implementer → Skeptic → Advocate → Synthesizer).
- Iter-VF: Verify answer only, not reasoning chain. Prevents context overflow.
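To make the "blind review + Devil's Advocate when unanimous" item concrete, here is a minimal Python sketch. It is not taken from the Loki Mode repo or from the CONSENSAGENT paper; the names `Verdict`, `blind_review`, and `accepted` are hypothetical, and reviewers are modeled as plain callables standing in for agent calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    reviewer: str
    approve: bool
    notes: str = ""


# A reviewer is any callable that maps a change description to a Verdict.
# In a real system this would wrap an agent call (assumption for this sketch).
Reviewer = Callable[[str], Verdict]


def blind_review(change: str, reviewers: List[Reviewer],
                 devils_advocate: Reviewer) -> List[Verdict]:
    """Collect independent verdicts; add an adversarial pass if all agree."""
    if not reviewers:
        raise ValueError("at least one reviewer is required")
    # Blind: each reviewer receives only the change, never another verdict.
    verdicts = [review(change) for review in reviewers]
    # Unanimity is treated as a possible sign of shared bias, so it
    # triggers one extra review from a deliberately contrarian reviewer.
    if len({v.approve for v in verdicts}) == 1:
        verdicts.append(devils_advocate(change))
    return verdicts


def accepted(verdicts: List[Verdict]) -> bool:
    """Accept only if every verdict, including any adversarial one, approves."""
    return all(v.approve for v in verdicts)
```

The devil's advocate only runs on unanimous results, so split votes cost no extra call.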

From Industry:

- NVIDIA ToolOrchestra: Three-reward signal (outcome/efficiency/preference), dynamic agent selection
- AWS Bedrock: Routing mode for simple tasks, supervisor mode for complex
- Boris Cherny's self-verification loop (2-3x quality improvement)
- Simon Willison's sub-agents for context isolation

From HN discussions:

- "Zero companies without human in the loop" → confidence-based escalation
- Context curation beats automatic RAG
- Fresh contexts yield better results
- LLM-as-judge has shared blind spots → deterministic validation
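As a rough illustration of how confidence-based escalation and deterministic validation can combine, here is a small Python sketch. Everything in it is an assumption made for the example: the `StepResult` fields, the `decide` function, and the 0.8 threshold are hypothetical and do not describe Loki Mode's actual logic.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    ESCALATE = "escalate"  # hand the step to a human
    REJECT = "reject"      # send the step back to the agent


@dataclass
class StepResult:
    confidence: float    # agent's self-reported confidence, 0.0 to 1.0
    checks_passed: bool  # outcome of deterministic checks (tests, linters)


def decide(result: StepResult, threshold: float = 0.8) -> Action:
    """Gate a step on deterministic checks first, then on confidence."""
    # Deterministic validation outranks any model judgment: a failing
    # test suite rejects the step no matter how confident the agent is.
    if not result.checks_passed:
        return Action.REJECT
    # Passing checks with low confidence goes to a human instead of
    # proceeding autonomously. The threshold is an illustrative value.
    if result.confidence < threshold:
        return Action.ESCALATE
    return Action.PROCEED
```

Checking the deterministic result first means an LLM judge's blind spots cannot override a failing test.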

The full acknowledgements with links to every paper/resource: https://github.com/asklokesh/claudeskill-loki-mode/blob/main...

Run: claude --dangerously-skip-permissions then "Loki Mode with PRD at path/to/prd"

Happy to discuss any of the research or architecture decisions.

Comments

slogansand•4w ago

[2.35.0] - 2026-01-08

Added - Anthropic Agent Harness Patterns & Claude Agent SDK

Sources:

- Effective Harnesses for Long-Running Agents - Anthropic Engineering
- Claude Agent SDK Overview - Anthropic Platform

New Patterns:

One Feature at a Time (Rule #7 in Core Autonomy)

- Work on exactly one feature per iteration
- Complete, commit, verify before moving to next
- Prevents over-commitment and ensures clean progress tracking

E2E Browser Testing with Playwright MCP

- Features NOT complete until verified via browser automation
- New Essential Pattern: Playwright MCP -> Automate browser -> Verify UI features visually
- Detailed verification flow added to SKILL.md
- Note: Playwright cannot detect browser-native alert modals

Advanced Task Tool Parameters

- run_in_background: Returns output_file path, output truncated to 30K chars
- resume: Continue interrupted agents with full context
- Use cases: Context limits, rate limits, multi-session work

Fixed

- Release workflow: Use gh CLI instead of softprops action for atomic release creation
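For readers who want the "One Feature at a Time" rule in code form, a minimal Python sketch follows. It is an illustration only, not the skill's implementation: `run_features` and its `implement`, `commit`, and `verify` callbacks are hypothetical stand-ins for whatever the agent actually does (for example, a browser check via Playwright MCP for the verify step).

```python
from typing import Callable, Iterable, List


def run_features(
    features: Iterable[str],
    implement: Callable[[str], None],
    commit: Callable[[str], None],
    verify: Callable[[str], bool],
) -> List[str]:
    """Process features strictly one at a time, in order.

    Follows the order from the changelog: complete, commit, verify.
    Returns the features that were verified.
    """
    verified: List[str] = []
    for feature in features:
        implement(feature)
        commit(feature)
        # A feature does not count as done until verification passes.
        if not verify(feature):
            # Stop here so the next feature never starts on top of
            # an unverified one.
            break
        verified.append(feature)
    return verified
```

Stopping at the first failed verification keeps progress tracking simple: everything in the returned list is committed and verified, and nothing after it has been started.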
