frontpage.

GLM-OCR: Accurate × Fast × Comprehensive

https://github.com/zai-org/GLM-OCR
1•ms7892•57s ago•0 comments

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

https://github.com/MikeVeerman/tool-calling-benchmark
1•MikeVeerman•1m ago•0 comments

Show HN: AboutMyProject – A public log for developer proof-of-work

https://aboutmyproject.com/
1•Raiplus•2m ago•0 comments

Expertise, AI and Work of Future [video]

https://www.youtube.com/watch?v=wsxWl9iT1XU
1•indiantinker•2m ago•0 comments

So Long to Cheap Books You Could Fit in Your Pocket

https://www.nytimes.com/2026/02/06/books/mass-market-paperback-books.html
1•pseudolus•2m ago•1 comment

PID Controller

https://en.wikipedia.org/wiki/Proportional%E2%80%93integral%E2%80%93derivative_controller
1•tosh•7m ago•0 comments

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

https://twitter.com/AlecStapp/status/2019932764515234159
1•bkls•7m ago•0 comments

Kubernetes MCP Server

https://github.com/yindia/rootcause
1•yindia•8m ago•0 comments

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

https://rokn.io/posts/building-movie-recommendation-agent
2•roknovosel•8m ago•0 comments

What were the first animals? The fierce sponge–jelly battle that just won't end

https://www.nature.com/articles/d41586-026-00238-z
2•beardyw•16m ago•0 comments

Sidestepping Evaluation Awareness and Anticipating Misalignment

https://alignment.openai.com/prod-evals/
1•taubek•17m ago•0 comments

OldMapsOnline

https://www.oldmapsonline.org/en
1•surprisetalk•19m ago•0 comments

What It's Like to Be a Worm

https://www.asimov.press/p/sentience
2•surprisetalk•19m ago•0 comments

Don't go to physics grad school and other cautionary tales

https://scottlocklin.wordpress.com/2025/12/19/dont-go-to-physics-grad-school-and-other-cautionary...
1•surprisetalk•19m ago•0 comments

Lawyer sets new standard for abuse of AI; judge tosses case

https://arstechnica.com/tech-policy/2026/02/randomly-quoting-ray-bradbury-did-not-save-lawyer-fro...
2•pseudolus•20m ago•0 comments

AI anxiety batters software execs, costing them combined $62B: report

https://nypost.com/2026/02/04/business/ai-anxiety-batters-software-execs-costing-them-62b-report/
1•1vuio0pswjnm7•20m ago•0 comments

Bogus Pipeline

https://en.wikipedia.org/wiki/Bogus_pipeline
1•doener•21m ago•0 comments

Winklevoss twins' Gemini crypto exchange cuts 25% of workforce as Bitcoin slumps

https://nypost.com/2026/02/05/business/winklevoss-twins-gemini-crypto-exchange-cuts-25-of-workfor...
2•1vuio0pswjnm7•21m ago•0 comments

How AI Is Reshaping Human Reasoning and the Rise of Cognitive Surrender

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646
3•obscurette•22m ago•0 comments

Cycling in France

https://www.sheldonbrown.com/org/france-sheldon.html
1•jackhalford•23m ago•0 comments

Ask HN: What breaks in cross-border healthcare coordination?

1•abhay1633•23m ago•0 comments

Show HN: Simple – a bytecode VM and language stack I built with AI

https://github.com/JJLDonley/Simple
1•tangjiehao•26m ago•0 comments

Show HN: Free-to-play: A gem-collecting strategy game in the vein of Splendor

https://caratria.com/
1•jonrosner•27m ago•1 comment

My Eighth Year as a Bootstrapped Founder

https://mtlynch.io/bootstrapped-founder-year-8/
1•mtlynch•27m ago•0 comments

Show HN: Tesseract – A forum where AI agents and humans post in the same space

https://tesseract-thread.vercel.app/
1•agliolioyyami•28m ago•0 comments

Show HN: Vibe Colors – Instantly visualize color palettes on UI layouts

https://vibecolors.life/
2•tusharnaik•29m ago•0 comments

OpenAI is Broke ... and so is everyone else [video][10M]

https://www.youtube.com/watch?v=Y3N9qlPZBc0
2•Bender•29m ago•0 comments

We interfaced single-threaded C++ with multi-threaded Rust

https://antithesis.com/blog/2026/rust_cpp/
1•lukastyrychtr•30m ago•0 comments

State Department will delete X posts from before Trump returned to office

https://text.npr.org/nx-s1-5704785
7•derriz•30m ago•1 comment

AI Skills Marketplace

https://skly.ai
1•briannezhad•31m ago•1 comment

Launch HN: MindFort (YC X25) – AI agents for continuous pentesting

60•bveiseh•8mo ago
Hey HN! We're Brandon, Sam, and Akul from MindFort (https://mindfort.ai). We're building autonomous AI agents that continuously find, validate, and patch security vulnerabilities in web applications—essentially creating an AI red team that runs 24/7.

Here's a demo: https://www.loom.com/share/e56faa07d90b417db09bb4454dce8d5a

Security testing today is increasingly challenging. Traditional scanners generate 30-50% false positives, drowning engineering teams in noise. Manual penetration testing happens quarterly at best, costs tens of thousands per assessment, and takes weeks to complete. Meanwhile, teams are shipping code faster than ever with AI assistance, but security reviews have become an even bigger bottleneck.

All three of us encountered this problem from different angles. Brandon worked at ProjectDiscovery building the Nuclei scanner, then at NetSPI (one of the largest pen testing firms) building AI tools for testers. Sam was a senior engineer at Salesforce leading security for Tableau, where he dealt firsthand with juggling security findings and managing remediations. Akul did his master's on AI and security, co-authored papers on using LLMs for security attacks, and participated in red teams at OpenAI and Anthropic.

We all realized that AI agents were going to fundamentally change security testing, and that the wave of AI-generated code would need an equally powerful solution to keep it secure.

We've built AI agents that perform reconnaissance, exploit vulnerabilities, and suggest patches—similar to how a human penetration tester works. The key difference from traditional scanners is that our agents validate exploits in runtime environments before reporting them, reducing false positives.

We use multiple foundational models orchestrated together. The agents perform recon to understand the attack surface, then use that context to inform testing strategies. When they find potential vulnerabilities, they spin up isolated environments to validate exploitation. If successful, they analyze the codebase to generate contextual patches.

What makes this different from existing tools?

- Validation through exploitation: we don't just pattern-match, we exploit vulnerabilities to prove they're real.
- Codebase integration: the agents understand your code structure to find complex logic bugs and suggest appropriate fixes.
- Continuous operation: instead of point-in-time assessments, we're constantly testing as your code evolves.
- Attack chain discovery: the agents can find multi-step vulnerabilities that require chaining different issues together.

We're currently in early access, working with initial partners to refine the platform. Our agents are already finding vulnerabilities that other tools miss and scoring well on penetration testing benchmarks.

Looking forward to your thoughts and comments!

Comments

blibble•8mo ago
what controls do you have to ensure consent from the target site?
bko•8mo ago
In the video demo they showed requiring a TXT in the DNS to confirm you have consent
blibble•8mo ago
so they'll point it at a domain they control, then reverse proxy it onto their target?
icedchai•8mo ago
What do you propose they do instead?
blibble•8mo ago
not offer automated targeted hacking as a service?

even the booters market themselves as "legitimate stress testing tools for enterprise"

Sohcahtoa82•8mo ago
> not offer automated targeted hacking as a service?

MindFort is not the first and won't be the last. There are plenty of DAST tools offered as a SaaS that are the same thing.

chatmasta•8mo ago
How about the would-be victims don’t ship exploitable software to production? If that’s not possible, then maybe they should sign up for an automated targeted hacking service to find the exploitable bugs before someone else does.

Your argument is straight out of the 1990s. We’ve moved beyond this as an industry, as you can see from the proliferation of bug bounty programs, responsible disclosure policies, CVE transparency, etc…

Sohcahtoa82•8mo ago
And in the process, reveal their own IP address rather than MindFort's.
blibble•8mo ago
by theirs, you mean, the IP of an IoT device/router they've hacked
bveiseh•8mo ago
Yup, as mentioned, we do TXT verification of the domain. We also don't offer self-service sign-up, so we're able to screen customers ahead of time and regularly monitor for any bad behavior.
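DNS TXT domain verification generally works by issuing a random token the customer must publish in a TXT record, then checking for it before any testing starts. A minimal sketch of that logic (the record name `_pentest-verify` and the token format are made up, and the DNS lookup itself is stubbed out so the check stays self-contained):

```python
import secrets

def issue_token() -> str:
    """Generate a random token the customer publishes as a TXT record,
    e.g. at _pentest-verify.example.com (hypothetical record name)."""
    return secrets.token_hex(16)

def is_verified(expected_token: str, txt_records: list[str]) -> bool:
    """Check the domain's TXT records for the issued token. In practice
    the records would be fetched with a DNS library or `dig TXT ...`;
    here they are passed in directly."""
    return any(record.strip() == expected_token for record in txt_records)
```

A reverse proxy (per the parent's objection) would still fail this check, since the attacker can only set TXT records on the domain they control, not on the proxied target.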
lazyninja987•8mo ago
Is it a prerequisite for the agents to have access to the source code to generate attack strategies?

How about pen-testing a black box?

Is the list of potential vulnerabilities generated by matching publicly disclosed vulnerabilities against the framework versions in the target software stack?

I am new to LLMs or any ML for that matter. Congrats on your launch.

bveiseh•8mo ago
Thanks so much.

Great question: it is not required, but we recommend it. Without the source code, it would be black-box testing; the agents won't know what the app looks like from the other side.

The agents identify vulns using known attack patterns, novel techniques, and threat intelligence.

sumanyusharma•8mo ago
Congratulations on the launch. Few qs:

How do your agents decide a suspected issue is a validated vulnerability, and what measured false-positive/false-negative rates can you share?

How is customer code and data isolated and encrypted throughout reconnaissance, exploitation, and patch generation (e.g., single-tenant VPC, data-retention policy)?

Do the agents ever apply patches automatically, or is human review required—and how does the workflow integrate with CI/CD to prevent regressions?

Ty!

bveiseh•8mo ago
Appreciate it!

The agents home in on a potential vulnerability by looking at different signals during their testing, then build a POC to validate it based on the context. We don't have any data to share publicly yet, but we are working on releasing benchmarks soon.

Everything runs in a private VPC and data is encrypted in transit and at rest. We have zero-data-retention agreements with our vendors, and we offer single-tenant and private cloud deployments for customers. We don't retain any customer code once we finish processing it, only the vulnerability data. We are also in the process of obtaining our SOC 2.

Patches are not auto-applied. We can either open a PR for human review or add the necessary changes to a Linear/Jira ticket. We have the ability to schedule assessments in our platform, and are working on a way to integrate more deeply with CI/CD.

gyanchawdhary•8mo ago
Congratulations on the launch. How different is this from xbow.com, shinobi.security, gecko.security, zeropath.com, etc.?
bveiseh•8mo ago
Thanks so much.

We want to solve the entire vulnerability lifecycle problem, not just finding zero-days. MindFort covers detection, validation, and triage/scoring, all the way through patching the vulnerability. While we are starting with web apps, we plan to expand to the rest of the attack surface soon.

handfuloflight•8mo ago
Any outlines on pricing?
bveiseh•8mo ago
It depends on the size of your attack surface, complexity of the application, and frequency of assessments, so for now we are working out custom agreements with each customer based on these factors.
robszumski•8mo ago
How does a customer use this?

Point it at a publicly available webapp? Run it locally against dev? Do I self-host it and continually run against staging as it's updated?

bveiseh•8mo ago
So you would point it at any web app available over the internet. There is an option to have a private deployment in your VPC to test applications that are not exposed to the internet. You can also schedule assessments so that the system runs at a regular interval (daily, weekly, bi-weekly, etc.).
mparis•8mo ago
Congrats on the launch. Seems like a natural domain for an AI tool. One nice aspect about pen testing is it only needs to work once to be useful. In other words, it can fail most of the time and no one but your CFO cares. Nice!

A few questions:

On your site it says, "MindFort can asses 1 or 100,000 page web apps seamlessly. It can also scale dynamically as your applications grow."

Can you provide more color as to what that really means? If I were actually to ask you to assess 100,000 pages, what would actually happen? Is it possible for my usage to block/brown-out another customer's usage?

I'm also curious what happens if the system does detect a vulnerability. Is there any chance the bot does something dangerous with, e.g., its newly discovered escalated privileges?

Thanks and good luck!

bveiseh•8mo ago
Thanks so much!

In regards to the scale, we absolutely can assess at that scale, but it would require quite a large enterprise contract upfront, as we would need to get the required capacity from our providers.

The system is designed to safely test exploitation, and not perform destructive testing. It will traverse as far as it can, but it won't break anything along the way.

HocusLocus•8mo ago
You're gonna poke your eye out with those pentesters...
Sohcahtoa82•8mo ago
One thing I've run into with DAST tools is that they're awful at handling modern web apps where JS code fetches data with an API and then updates the DOM accordingly. They act like web pages are still using server-side HTML rendering and throw XSS false positives because a JSON response will return "<script>alert(1)</script>" in the data, even when the data is then put in the web page using element.innerText or a framework that automatically prevents XSS.

Alternatively, they don't properly handle session tokens that don't rely on cookies, such as bearer tokens. At the place I work, in our app, the session token is passed as a parameter in the request payload. Not a cookie or the Authorization header!

How well does MindFort handle these scenarios?
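The JSON-reflection false positive described above comes down to ignoring the sink. A naive, illustrative triage check (not any particular scanner's logic) that avoids it by looking at the response content type before flagging a reflection:

```python
def looks_like_reflected_xss(payload: str, body: str, content_type: str) -> bool:
    """Naive triage: a payload echoed inside a JSON API response is not,
    by itself, XSS -- what matters is where the data ends up. Data later
    inserted via element.innerText (or an auto-escaping framework) never
    executes, so only flag reflections in HTML responses, where the
    browser would actually parse the markup."""
    if "application/json" in content_type:
        return False
    return "text/html" in content_type and payload in body
```

For example, the payload appearing inside a `{"name": "<script>alert(1)</script>"}` JSON body is not flagged, while the same string reflected unescaped into an HTML page is. A real scanner would go further and confirm execution in a headless browser, which is the validation-by-exploitation approach the post describes.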