https://github.com/usestrix/strix/blob/main/strix/prompts/vu...
... and this is great, I'm not dunking, but pretty basic?
We just had the DARPA AIxCC results come in, and those systems are (1) open source, (2) presumably simpler and less polished than Xbow (some of the authors will be quick to tell you they're doing PoC work, not product development), and yet (3) still more complicated than this.
Again, to be super clear: I think there's a huge amount of potential in building something like this up. Nessus was much simpler than ISS when it first shipped, but you'd rather be Nessus than an ISS scanner developer! I'm just asking: why set this bar for your project?
Best of luck with this!
[0]: Why I say this --- a 10kLOC piece of software that was mostly human-written would require a large amount of testing, even manual testing, to ensure that it works reliably at all. All that testing and experimentation would naturally force a certain depth of exploration of the approach, the LLM prompts, etc., across a variety of use cases. A mostly AI-written codebase of this size would have required much less testing to get to "doesn't crash and runs reliably", so that depth is no longer a given.
codys•1h ago
Still could be worth doing a bit of manual work like this, but it's worth being cautious about drawing conclusions from it.
tptacek•1h ago
There's nothing fundamentally bad about having Oompa Loompas behind the scenes, as long as you're honest about the outcomes you can provide.
I agree, though: also a very sensible way to prioritize development work.
guhcampos•1h ago
At least they're not lying, right? It's just people using computers.