Testing Super Mario Using a Behavior Model Autonomously

https://testflows.com/blog/testing-super-mario-using-a-behavior-model-autonomously-part1/
22•Naulius•2h ago

Comments

Naulius•2h ago
We built an autonomous testing example that plays Super Mario Bros. to explore how behavior models combine with autonomous testing. Instead of relying on manually written test cases, it systematically explores the game's massive state space while a behavior model validates correctness in real time: write your validation once, use it with any testing driver. It's a fun way to learn how it all works and find bugs along the way. All code is open source: https://github.com/testflows/Examples/tree/v2.0/SuperMario.
janalsncm•1h ago
There’s a ton of crossover between your method and RL. I guess instead of directly training on episodes and updating model weights, you just store episodes in RAM and sample from the most promising ones. It could be a neat way of getting out of the infamous RL cold start by getting some examples of rewards. Thanks for sharing.
Naulius•1h ago
Thanks! You're right that there's a resemblance to RL. The original approach was proposed by Antithesis, and in Part 1 we map it more directly to a mutation-based Genetic Algorithm: stored paths are the population, the x-position scoring is the fitness function, and bit-flip input generation is the mutation operator. There's no recombination and no learned policy, just evolutionary selection pressure on input sequences.
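
To make that mapping concrete, here is a minimal Python sketch of a mutation-only loop. It is not the code from the linked repository: `toy_x_position` is a made-up stand-in for running an emulator, and the 8-bit frame encoding, `flip_prob`, and population size are all assumptions chosen for illustration.

```python
import random


def toy_x_position(inputs):
    """Hypothetical fitness: a stand-in for running the emulator.

    Assumes bit 0 of each frame's input byte means 'right' and that
    holding it advances x by one. A real driver would replay the
    inputs in the game and read the actual x-position.
    """
    return sum(1 for frame in inputs if frame & 0b0000_0001)


def mutate(inputs, rng, flip_prob=0.05):
    """Bit-flip mutation: flip each of the 8 input bits with flip_prob."""
    mutated = []
    for frame in inputs:
        for bit in range(8):
            if rng.random() < flip_prob:
                frame ^= 1 << bit
        mutated.append(frame)
    return mutated


def explore(generations=50, population_size=8, frames=60, seed=0):
    """Mutation-only evolutionary search: no recombination, no policy."""
    rng = random.Random(seed)
    population = [[0] * frames]  # stored paths, starting with 'no input'
    for _ in range(generations):
        parent = rng.choice(population)  # sample a promising stored path
        children = [mutate(parent, rng) for _ in range(population_size)]
        # Selection pressure: keep only the paths that got furthest.
        population = sorted(
            population + children, key=toy_x_position, reverse=True
        )[:population_size]
    return population


best = explore()[0]
print("best toy x-position:", toy_x_position(best))
```

Because the best paths always survive selection, the top score never decreases from one generation to the next, which is the "selection pressure on input sequences" part.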

Interesting point about the RL cold start. One could definitely use the paths discovered through the evolutionary exploration to seed an RL agent's initial experience, which could help skip the early random flailing phase.

The key difference from RL is the goal. We're not trying to learn an optimal policy for playing the game; instead, we're trying to explore as much of the state space as possible to find bugs. In Part 2 we plug in a behavior model that validates correctness at every frame during exploration (velocity constraints, causal movement checks, collision invariants). The combination is where it gets interesting: autonomous exploration discovers the states, and the behavior model catches when the game violates its own rules. For testing, the main reason we even care about completing each level is that a completed path serves as the base for more extensive exploration at every point along it. If the exploration can't reach the end, by definition we miss a large part of the state space.
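
As a rough illustration of what per-frame validation could look like, here is a small Python sketch of two such checks. It is not the behavior model from Part 2: the `Frame` fields, the `MAX_SPEED` limit, and the simplified causality rule (no movement before any directional input has been pressed) are all assumptions made up for this example.

```python
from dataclasses import dataclass

MAX_SPEED = 3  # hypothetical limit, in pixels per frame


@dataclass
class Frame:
    x: int        # observed x-position after this frame
    right: bool   # was 'right' held during this frame?
    left: bool    # was 'left' held during this frame?


def check_frames(frames):
    """Return (frame_index, rule) pairs for every violated rule.

    The checker only looks at recorded frames, so the same function
    can be reused with any driver that produces them.
    """
    violations = []
    any_input_so_far = False
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        any_input_so_far = any_input_so_far or prev.right or prev.left
        dx = cur.x - prev.x
        # Velocity constraint: position may not jump too far in one frame.
        if abs(dx) > MAX_SPEED:
            violations.append((i, "velocity"))
        # Simplified causal check: no movement before any input was given.
        if dx != 0 and not any_input_so_far:
            violations.append((i, "causal"))
    return violations


ok_run = [Frame(0, True, False), Frame(2, True, False), Frame(4, False, False)]
bad_run = [Frame(0, False, False), Frame(10, False, False)]
print(check_frames(ok_run))
print(check_frames(bad_run))
```

In this sketch the exploration loop and the checker are deliberately decoupled: the explorer decides which inputs to try, and the checker decides whether what the game did in response was legal.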

wa008•1h ago
AI is much more powerful than humans in closed domains, like games and defense. AlphaGo was the first to prove that.
Naulius•1h ago
Agree. However, the described technique isn't really AI: there's no neural network or training. It's GA-driven exploration for testing: mutate inputs, keep what gets you further down the state space, discard what doesn't. AlphaGo optimizes for winning; testing optimizes for coverage. That said, what does apply well to testing from the AI field is the exploration during the training phase, as well as the ability to beat the game, which gives you paths to branch off from and explore the test space further.

Keep Android Open

https://f-droid.org/2026/02/20/twif.html
672•LorenDB•3h ago•264 comments

Ggml.ai joins Hugging Face to ensure the long-term progress of Local AI

https://github.com/ggml-org/llama.cpp/discussions/19759
576•lairv•7h ago•136 comments

Wikipedia deprecates Archive.today, starts removing archive links

https://arstechnica.com/tech-policy/2026/02/wikipedia-bans-archive-today-after-site-executed-ddos...
130•nobody9999•2h ago•67 comments

Facebook is cooked

https://pilk.website/3/facebook-is-absolutely-cooked
380•npilk•3h ago•261 comments

I found a Vulnerability. They found a Lawyer

https://dixken.de/blog/i-found-a-vulnerability-they-found-a-lawyer
108•toomuchtodo•2h ago•67 comments

OpenScan

https://openscan.eu/pages/scan-gallery
9•joebig•43m ago•0 comments

Lil' Fun Langs

https://taylor.town/scrapscript-000
70•surprisetalk•3h ago•6 comments

Blue light filters don't work – controlling total luminance is a better bet

https://www.neuroai.science/p/blue-light-filters-dont-work
51•pminimax•3h ago•80 comments

Making frontier cybersecurity capabilities available to defenders

https://www.anthropic.com/news/claude-code-security
69•surprisetalk•3h ago•27 comments

Trump's global tariffs struck down by US Supreme Court

https://www.bbc.com/news/live/c0l9r67drg7t
1020•blackguardx•6h ago•824 comments

Show HN: Mines.fyi – all the mines in the US in a leaflet visualization

https://mines.fyi/
3•irasigman•9m ago•1 comments

Uncovering insiders and alpha on Polymarket with AI

https://twitter.com/peterjliu/status/2024901585806225723
8•somerandomness•3h ago•0 comments

The path to ubiquitous AI (17k tokens/sec)

https://taalas.com/the-path-to-ubiquitous-ai/
615•sidnarsipur•10h ago•356 comments

How to Review an AUR Package

https://bertptrs.nl/2026/01/30/how-to-review-an-aur-package.html
29•exploraz•3d ago•1 comments

Legion Health (YC) Is Hiring Cracked SWEs for Autonomous Mental Health

https://jobs.ashbyhq.com/legionhealth/ffdd2b52-eb21-489e-b124-3c0804231424
1•ympatel•4h ago

Show HN: A native macOS client for Hacker News, built with SwiftUI

https://github.com/IronsideXXVI/Hacker-News
136•IronsideXXVI•7h ago•102 comments

I found a useful Git one liner buried in leaked CIA developer docs

https://spencer.wtf/2026/02/20/cleaning-up-merged-git-branches-a-one-liner-from-the-cias-leaked-d...
537•spencerldixon•7h ago•199 comments

Escaping Misconfigured VSCode Extensions (2023)

https://blog.trailofbits.com/2023/02/21/vscode-extension-escape-vulnerability/
4•abelanger•52m ago•0 comments

Child's Play: Tech's new generation and the end of thinking

https://harpers.org/archive/2026/03/childs-play-sam-kriss-ai-startup-roy-lee/
279•ramimac•6h ago•177 comments

Untapped Way to Learn a Codebase: Build a Visualizer

https://jimmyhmiller.com/learn-codebase-visualizer
172•andreabergia•12h ago•28 comments

The Popper Principle

https://theamericanscholar.org/the-popper-principle/
49•lermontov•1d ago•26 comments

PayPal discloses data breach that exposed user info for 6 months

https://www.bleepingcomputer.com/news/security/paypal-discloses-data-breach-exposing-users-person...
227•el_duderino•8h ago•70 comments

Testing Super Mario Using a Behavior Model Autonomously

https://testflows.com/blog/testing-super-mario-using-a-behavior-model-autonomously-part1/
22•Naulius•2h ago•5 comments

Raspberry Pi Pico 2 at 873.5MHz with 3.05V Core Abuse

https://learn.pimoroni.com/article/overclocking-the-pico-2
121•Lwrless•12h ago•41 comments

Do you want to build a community where users search or hang? (2021)

https://www.mooreds.com/wordpress/archives/3486
11•mooreds•3d ago•4 comments

Infrastructure decisions I endorse or regret after 4 years at a startup (2024)

https://cep.dev/posts/every-infrastructure-decision-i-endorse-or-regret-after-4-years-running-inf...
463•Meetvelde•3d ago•208 comments

AI is not a coworker, it's an exoskeleton

https://www.kasava.dev/blog/ai-as-exoskeleton
471•benbeingbin•1d ago•493 comments

Consistency diffusion language models: Up to 14x faster, no quality loss

https://www.together.ai/blog/consistency-diffusion-language-models
195•zagwdt•17h ago•87 comments

The Rediscovery of 103 Hokusai Lost Sketches (2021)

https://japan-forward.com/eternal-hokusai-the-rediscovery-of-103-hokusai-lost-sketches/
58•debo_•4d ago•7 comments

Lessons learned from `oapi-codegen`'s time in the GitHub Secure Open Source Fund

https://www.jvt.me/posts/2026/02/17/oapi-codegen-github-secure/
15•zdw•2d ago•0 comments