Testing Super Mario Using a Behavior Model Autonomously

https://testflows.com/blog/testing-super-mario-using-a-behavior-model-autonomously-part1/
21•Naulius•1h ago

Comments

Naulius•1h ago
We built an autonomous testing example that plays Super Mario Bros. to explore how behavior models combine with autonomous testing. Instead of relying on manually written test cases, it systematically explores the game's massive state space while a behavior model validates correctness in real time: you write your validation once and use it with any testing driver. It's a fun way to learn how it all works and find bugs along the way. All code is open source: https://github.com/testflows/Examples/tree/v2.0/SuperMario.
janalsncm•1h ago
There’s a ton of crossover between your method and RL. I guess instead of directly training on episodes and updating model weights, you just store episodes in RAM and sample from the most promising ones. It could be a neat way of getting out of the infamous RL cold start by getting some examples of rewards. Thanks for sharing.
Naulius•36m ago
Thanks! You're right that there's a resemblance to RL. The original approach was proposed by Antithesis, and in Part 1 we map it more directly to a mutation-based Genetic Algorithm: stored paths are the population, the x-position scoring is the fitness function, and bit-flip input generation is the mutation operator. There's no recombination and no learned policy, just evolutionary selection pressure on input sequences.
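
To make that mapping concrete, here is a minimal sketch of a mutation-only GA loop. It is not the TestFlows code: the one-byte-per-frame input encoding, the bit assignments, `toy_fitness` (a stand-in for the emulator's x-position score), and all function names are assumptions made for illustration.

```python
import random

BUTTONS = 8  # an NES controller has 8 buttons; one bit per button is an assumed encoding


def mutate(path, rng, rate=0.1):
    """Mutation operator: flip one random button bit in some frames of a copy."""
    child = list(path)
    for i in range(len(child)):
        if rng.random() < rate:
            child[i] ^= 1 << rng.randrange(BUTTONS)
    return child


def toy_fitness(path):
    """Hypothetical stand-in for x-position: bit 0 moves right, bit 1 moves left."""
    x = 0
    for frame in path:
        if frame & 0b01:
            x += 1
        if frame & 0b10:
            x -= 1
    return x


def explore(generations, pop_size, path_len, seed=0):
    """Keep the best `pop_size` stored paths; no recombination, no learned policy."""
    rng = random.Random(seed)
    population = [[0] * path_len]  # start from a path with no buttons pressed
    for _ in range(generations):
        parent = rng.choice(population)      # sample one of the stored paths
        population.append(mutate(parent, rng))
        population.sort(key=toy_fitness, reverse=True)
        del population[pop_size:]            # selection pressure: drop the weakest
    return population


best = explore(generations=200, pop_size=8, path_len=30)[0]
print(toy_fitness(best))
```

In a real setup the fitness call would replay the input sequence in the emulator and read Mario's x-position, which is far more expensive than this toy scoring function.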

Interesting point about the RL cold start. One could definitely use the paths first discovered through evolutionary exploration to seed an RL agent's initial experience, which could help it skip the early random flailing phase.

The key difference from RL is the goal. We're not trying to learn an optimal policy for playing the game; we're trying to explore as much of the state space as possible to find bugs. In Part 2 we plug in a behavior model that validates correctness at every frame during exploration (velocity constraints, causal movement checks, collision invariants). The combination is where it gets interesting: autonomous exploration discovers the states, and the behavior model catches when the game violates its own rules. For testing, the main reason we even care about completing each level is that a completed path serves as the base for more extensive exploration at every point along it. If the exploration can't reach the end, by definition we miss a large part of the state space.
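
As a rough illustration of per-frame validation, the sketch below checks a velocity constraint, a position/velocity consistency invariant, and a simple causal movement rule on consecutive frames. The `Frame` fields, the `MAX_SPEED` value, and the specific rules are hypothetical; the actual behavior model in Part 2 may be structured quite differently.

```python
from dataclasses import dataclass

MAX_SPEED = 3  # hypothetical limit on horizontal speed, in pixels per frame


@dataclass(frozen=True)
class Frame:
    x: int               # Mario's horizontal position
    vx: int              # horizontal velocity reported for this frame
    right_pressed: bool  # whether the "right" button is held


def check_transition(prev, cur):
    """Return messages for every rule the step prev -> cur violates."""
    violations = []
    # Velocity constraint: speed must stay within the allowed bound.
    if abs(cur.vx) > MAX_SPEED:
        violations.append(f"velocity: |vx|={abs(cur.vx)} exceeds {MAX_SPEED}")
    # Consistency: the position change must match the reported velocity.
    if cur.x - prev.x != cur.vx:
        violations.append(f"position: dx={cur.x - prev.x} but vx={cur.vx}")
    # Causal movement: starting to move right from rest needs a "right" press.
    if prev.vx == 0 and cur.vx > 0 and not cur.right_pressed:
        violations.append("causal: moved right from rest without input")
    return violations


def validate(frames):
    """Check every consecutive pair and return (frame_index, message) tuples."""
    found = []
    for i in range(1, len(frames)):
        for message in check_transition(frames[i - 1], frames[i]):
            found.append((i, message))
    return found


trace = [Frame(0, 0, False), Frame(1, 1, True), Frame(3, 2, True)]
print(validate(trace))  # prints []
```

Because `validate` only looks at recorded frames, the same checks could run against frames produced by any driver, whether a scripted test or an exploration loop.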

wa008•58m ago
AI is much more powerful than humans in closed domains, like games and defense. AlphaGo was the first to prove that.
Naulius•27m ago
Agreed. However, the described technique isn't really AI: there's no neural network or training. It's GA-driven exploration for testing: mutate inputs, keep what gets you further through the state space, discard what doesn't. AlphaGo optimizes for winning; testing optimizes for coverage. That said, what does carry over well from the AI field is the exploration done during the training phase, as well as the ability to beat the game, which gives you paths to branch off from to explore the test space further.

A floating power station? China's flying wind turbine hits milestone

https://www.euronews.com/next/2026/01/29/a-floating-power-station-chinas-flying-wind-turbine-hits...
1•geox•55s ago•0 comments

James Bond x Seedance 2.0 [video]

https://old.reddit.com/r/singularity/comments/1r9xgdz/james_bond_x_seedance_20/
1•mgh2•3m ago•0 comments

I Analyzed Every Nootropic Study on PubMed

https://outspeaker.com/post/217
1•onesandofgrain•3m ago•0 comments

Bay Area Apartment Hunting Has Turned into an AI Hellscape

https://www.sfgate.com/local/article/bay-area-apartment-ai-21332194.php
2•randycupertino•3m ago•1 comments

The Sixth Bureau [video]

https://www.bloomberg.com/features/the-sixth-bureau/
1•petethomas•3m ago•0 comments

OpenScan

https://openscan.eu/pages/scan-gallery
2•joebig•4m ago•0 comments

Phil Spencer Retiring, Sarah Bond Out, Asha Sharma Named New Xbox Boss

https://www.ign.com/articles/phil-spencer-retiring-sarah-bond-out-matt-booty-promoted-as-microsof...
3•CIARobotFish•5m ago•0 comments

Private Equity Debt Left a Leading VPN Open to Chinese Hackers

https://financialpost.com/pmn/business-pmn/how-private-equity-debt-left-a-leading-vpn-open-to-chi...
1•strict9•7m ago•0 comments

Task-Completion Time Horizons of Frontier AI Models (Includes Opus 4.6)

https://metr.org/time-horizons/
1•admp•8m ago•0 comments

Video Is 6 Minutes Long [video]

https://www.youtube.com/watch?v=NL-IVJi8r3M
1•financetechbro•8m ago•0 comments

What do you do after pasting sensitive info, i.e. an API token & key, into an IDE AI chat?

1•lctan•9m ago•0 comments

Testing a proof-of-data transfer model for social media

https://www.galacticfederation.tech
1•subvertio•9m ago•1 comments

Performance Managing to the Room

https://jamesjboyer.substack.com/p/performance-managing-to-the-room
1•aesthetics1•9m ago•0 comments

AI automation to drive mass white-collar job losses in 12-18 months

https://economictimes.indiatimes.com/news/new-updates/ai-automation-to-drive-mass-white-collar-jo...
1•paulpauper•10m ago•0 comments

Mixing the new GitHub agent workflow and Jira MCP to accelerate PR review

https://nielsfreier.substack.com/p/i-let-an-ai-agent-read-my-jira-tickets
1•stumpyfr•10m ago•1 comments

Show HN: FeatureFlare – Feature flags for SaaS teams tired of rolling their own

https://featureflare.com/
1•jsonstcyr•10m ago•0 comments

The Worst-Case Future for White-Collar Workers

https://www.theatlantic.com/ideas/2026/02/ai-white-collar-jobs/686031/
1•paulpauper•10m ago•0 comments

Barcelona's Sagrada Familia reaches maximum height

https://news.sky.com/story/barcelonas-sagrada-familia-reaches-maximum-height-as-cross-is-placed-o...
1•austinallegro•10m ago•0 comments

BitFields API: Type-Safe Bit Packing for Lock-Free Data Structures

https://rocksdb.org/blog/2025/12/31/bit-fields-api.html
1•matt_d•12m ago•0 comments

The Backblaze Flamethrower Startup Program

https://www.backblaze.com/blog/introducing-the-backblaze-flamethrower-startup-program/
2•taubek•12m ago•0 comments

The Beautiful Simplicity of ColorForth (2013)

https://web.archive.org/web/20190715220632/https://blogs.msdn.microsoft.com/ashleyf/2013/11/02/th...
1•tosh•13m ago•0 comments

Escaping Misconfigured VSCode Extensions

https://blog.trailofbits.com/2023/02/21/vscode-extension-escape-vulnerability/
1•abelanger•13m ago•0 comments

Show HN: VillageSQL Extension Framework for MySQL

https://villagesql-blog.ghost.io/engineering-vef/
1•deesix•14m ago•0 comments

Apache Answer: Open-source Q&A forum software

https://answer.apache.org/
1•floren•14m ago•0 comments

Taalas Etches AI Models onto Transistors to Rocket Boost Inference

https://www.nextplatform.com/2026/02/19/taalas-etches-ai-models-onto-transistors-to-rocket-boost-...
1•wicket•15m ago•0 comments

Economics Is Fundamentally Political

https://www.theglobalcurrents.com/p/economics-is-fundamentally-political
1•a_victorp•15m ago•0 comments

Ieepa Ieepa Ieepa

https://paulkrugman.substack.com/p/ieepa-ieepa-ieepa
1•rbanffy•15m ago•0 comments

YouTube's First Video Acquired by London's V&A

https://news.artnet.com/art-world/youtubes-first-video-acquired-by-londons-va-2746518
2•bookofjoe•15m ago•0 comments

Zstdify: A Pure TypeScript ZSTD Re-Implementation, Written in 4 Hours

https://benhouston3d.com/blog/zstd-in-pure-javascript
1•bhouston•19m ago•0 comments

A chat room where LLM bots pretend to be human and everyone hunts each other

https://webecameshadows.com
1•ihmissuti•20m ago•1 comments