Show HN: Loki Mode hit 99.67% SWE-Bench – MAF built a SaaS overnight

https://github.com/asklokesh/claudeskill-loki-mode
2•slogansand•1d ago
Last month I shared Loki Mode here. Since then, the benchmark results have come back.

SWE-Bench: 99.67% (299/300 problems)
HumanEval: 98.78% Pass@1 (162/164)
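The percentages follow from the raw counts quoted above. A quick check, assuming each figure is simply solved/total rounded to two decimals (the `pass_rate` helper is an illustrative name, not something from the repo):

```python
def pass_rate(solved: int, total: int) -> float:
    """Return solved/total as a percentage rounded to two decimals."""
    return round(100 * solved / total, 2)


# Counts as quoted in the post.
swe_bench = pass_rate(299, 300)
human_eval = pass_rate(162, 164)

print(f"SWE-Bench: {swe_bench}%  HumanEval: {human_eval}%")
```

This only confirms that the percentages match the stated counts; it says nothing about how the runs themselves were scored.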

For context, most single-agent systems hit 30-50%. Best proprietary ones hover around 70-80%.

The difference is architecture. 37 specialized agent types across 6 swarms (engineering, ops, business, data, product, growth). Parallel 3-reviewer code review. Feedback loops that actually learn.

To stress test it, I pointed it at a blank folder and said "build a ServiceNow replacement." It ran for 19 hours and built FireLater - complete ticket management, workflows, CMDB, knowledge base, self-service portal. I wrote zero lines of code.

New in this version:
- Kanban board to visualize agent actions in real-time
- Perpetual improvement via self-healing feedback loops
- Smarter swarm coordination

Still open source. MIT license. Still not selling anything.

Loki Mode: https://github.com/asklokesh/claudeskill-loki-mode
FireLater (built by Loki Mode): https://github.com/asklokesh/FireLater

Happy to answer questions about the architecture or benchmarks.

Comments

slogansand•1d ago
Author here. Quick context on the benchmarks:

We used RARV (Retrieve, Analyze, Reason, Validate) pattern with multi-agent collaboration. Each problem gets worked by specialized agents, reviewed by 3 parallel reviewers (code, business logic, security), and only merged after consensus.
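The review step described above could be sketched roughly as follows. This is not Loki Mode's actual code: the three reviewer functions are hypothetical placeholders, and "consensus" is assumed here to mean that all three reviewers approve.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Reviewer = Callable[[str], bool]


# Hypothetical stand-ins for the three reviewers; real ones would be agents.
def code_review(patch: str) -> bool:
    # Placeholder rule: reject patches that still carry a TODO marker.
    return "TODO" not in patch


def business_logic_review(patch: str) -> bool:
    # Placeholder rule: an empty patch cannot satisfy any requirement.
    return bool(patch.strip())


def security_review(patch: str) -> bool:
    # Placeholder rule: reject an obviously dangerous call.
    return "eval(" not in patch


REVIEWERS: Dict[str, Reviewer] = {
    "code": code_review,
    "business_logic": business_logic_review,
    "security": security_review,
}


def review_in_parallel(patch: str) -> Dict[str, bool]:
    """Run every reviewer concurrently and collect one verdict per reviewer."""
    with ThreadPoolExecutor(max_workers=len(REVIEWERS)) as pool:
        futures = {name: pool.submit(fn, patch) for name, fn in REVIEWERS.items()}
        return {name: future.result() for name, future in futures.items()}


def can_merge(patch: str) -> bool:
    """Assumed consensus rule: merge only when every reviewer approves."""
    return all(review_in_parallel(patch).values())
```

For example, `can_merge("return a + b")` passes all three placeholder checks, while a patch containing `eval(` is blocked by the security reviewer alone. A majority-vote rule would be a different design; the post does not say which one is used.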

The 99.67% isn't cherry-picked. Full run against standard SWE-Bench dataset. Happy to share methodology if anyone wants to reproduce.

slogansand•1d ago
On the swarm architecture for those curious:

Engineering (8 types): frontend, backend, database, mobile, API, QA, perf, infra
Operations (8 types): devops, SRE, security, monitoring, incident, release, cost, compliance
Business (8 types): marketing, sales, finance, legal, support, HR, investor, partnerships
Data (3 types): ML, data eng, analytics
Product (3 types): PM, design, tech writer
Growth (4 types): growth hacker, community, success, lifecycle
Review (3 types): code, business, security
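As a sanity check on the headline count, the roster above can be written down as a plain mapping and tallied. The dictionary layout is only an illustration, not how the project stores its agents. Going by this list, the six named swarms add up to 34 agent types, and the total reaches 37 once the three reviewers are included.

```python
# Agent roster as listed in the comment above (names copied verbatim).
ROSTER = {
    "engineering": ["frontend", "backend", "database", "mobile",
                    "API", "QA", "perf", "infra"],
    "operations": ["devops", "SRE", "security", "monitoring",
                   "incident", "release", "cost", "compliance"],
    "business": ["marketing", "sales", "finance", "legal",
                 "support", "HR", "investor", "partnerships"],
    "data": ["ML", "data eng", "analytics"],
    "product": ["PM", "design", "tech writer"],
    "growth": ["growth hacker", "community", "success", "lifecycle"],
    "review": ["code", "business", "security"],
}

# Per-group counts, then totals with and without the review group.
counts = {group: len(agents) for group, agents in ROSTER.items()}
total_with_review = sum(counts.values())
total_without_review = total_with_review - counts["review"]

print(counts)
print(total_with_review, total_without_review)
```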

Agents don't step on each other. Frontend agent never thinks about database schemas. QA agent never writes deployment scripts. Domain isolation is key.

slogansand•1d ago
For the skeptics (fair): FireLater repo has full git history. You can see the commits. No human intervention in the implementation phase.

I reviewed outputs and approved deployments. But architecture decisions, code, tests, docs - all Loki Mode.

It's not perfect. Some rough edges. But it works and enterprises can self-host it today.

slogansand•1d ago
vs single-agent coding assistants: They tap out around 50% on SWE-Bench. No specialization. No parallel review. No self-healing.

vs other multi-agent frameworks: Most focus on chat or simple task delegation. Loki Mode runs full SDLC - from PRD to deployed product with monitoring and business ops.

vs hiring a team: Obviously humans are better for ambiguous problems. But for well-defined PRDs, this removes the "I'll get to it this weekend" bottleneck.

slogansand•1d ago
Last time someone raised concerns about web crawling for competitive research. Valid point.

New version has configurable research modes. You can disable external crawling entirely and run fully offline if needed. Feedback heard.
