We stopped AI bot spam in our GitHub repo using Git's –author flag

https://archestra.ai/blog/only-responsible-ai

100•ildari•1h ago

Comments

ildari•1h ago

Hi HN community, I wanted to share our approach to reducing the amount of AI slop PRs and issues in our repo. We enabled the "require prior contribution" flag on GH and created a CI script that creates a tiny commit co-authored with you if you pass a captcha on our website. It worked really well, and we were able to block at least 500 bots in the first week. Sharing a screenshot from Cloudflare: https://archestra.ai/hn-comment-cloudflare-challenge-outcome...
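
The script itself is not shown in this thread. As a rough sketch only, assuming the commit is attributed with git's `--author` flag (as the title suggests) and GitHub's ID-based noreply address, the command could be built like this. The function names and the commit message are hypothetical.

```python
from typing import List


def noreply_email(user_id: int, login: str) -> str:
    """Return GitHub's ID-based noreply address for a user."""
    return f"{user_id}+{login}@users.noreply.github.com"


def build_commit_command(user_id: int, login: str) -> List[str]:
    """Build a `git commit` argv that attributes an empty commit to `login`."""
    author = f"{login} <{noreply_email(user_id, login)}>"
    return [
        "git", "commit",
        "--allow-empty",                  # the commit carries no file changes
        f"--author={author}",             # overrides the recorded commit author
        "-m", f"chore: onboard {login}",  # hypothetical message format
    ]
```

Returning an argv list rather than a shell string means the result can be passed to `subprocess.run` without a shell, so a crafted username cannot inject extra commands.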

satvikpendem•49m ago

Yep, this is similar to some other version control tools like Tangled which has vouching.

https://blog.tangled.org/vouching/

tln•37m ago

That's a really elegant solution.

How does the website trigger the CI script? Through the GH REST API?

ildari•30m ago

Thank you, yep, through the REST API. Here is the example: https://github.com/archestra-ai/website/blob/29ebdacbd8a22b9...
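
The linked file is truncated here, so the following is not the project's code. It is a minimal sketch of one way a backend could start a workflow, assuming GitHub's workflow dispatch endpoint is used. The workflow file name, the `login` input, and the `main` ref are assumptions.

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, workflow_file: str,
                           token: str, login: str) -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for one user."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # "ref" is required by the endpoint; "inputs" must match the inputs
    # declared by the workflow (the "login" input is an assumption).
    body = json.dumps({"ref": "main", "inputs": {"login": login}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it would be a matter of `urllib.request.urlopen(request)` after the captcha has been verified server-side, with the token kept out of the browser.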

silverwind•34m ago

PR spam is a major problem for repos that run bounties. Maybe GitHub should temporarily block accounts from raising PRs if, say, 95%+ of them are getting rejected.
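
As an illustration of that rule (not an existing GitHub feature), a rejection-rate check might look like the sketch below. The 95% threshold comes from the comment; the minimum sample size is an added assumption so that one or two rejected PRs do not trigger a block.

```python
def should_block(merged: int, rejected: int,
                 threshold: float = 0.95, min_prs: int = 20) -> bool:
    """Return True if an account's PR rejection rate meets the threshold."""
    total = merged + rejected
    if total < min_prs:
        # Too little history to judge the account fairly.
        return False
    return rejected / total >= threshold
```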

marginalx•28m ago

The problem is that bots can create any number of GitHub accounts and continue spamming. Still, this would be a good, simple defense to start with.

hiccuphippo•25m ago

GitHub has no incentive to block AI. It's like asking an ad company to build an adblocker into their browser.

cdrnsf•19m ago

GitHub and Microsoft are actively contributing to the problem, why would they admit fault?

microtonal•9m ago

I feel like GitHub should have a system where you can give out tokens that are valid for e.g. 1 PR. If someone engages in meaningful discussion and has a good idea for addressing an issue/feature, you initially give them one PR token. If the PR is of good quality, you can give them a few more, until they are contributors who can just create PRs as they like.

A similar system would be nice for issues, though I'm not sure what it'd look like if issues are the springboard for contributing PRs.

Not likely to ever happen (as others said): GitHub/MS want to sell Copilot subscriptions/tokens, and LLM-generated PRs are part of that business model.
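
The PR-token idea above does not exist on GitHub; purely as a sketch of the bookkeeping it implies, a maintainer-side ledger could look like this. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TokenLedger:
    """Tracks how many PRs each user is still allowed to open."""
    balances: Dict[str, int] = field(default_factory=dict)

    def grant(self, login: str, count: int = 1) -> None:
        """Maintainer gives a user `count` more PR tokens."""
        self.balances[login] = self.balances.get(login, 0) + count

    def open_pr(self, login: str) -> bool:
        """Spend one token; return False if the user has none left."""
        remaining = self.balances.get(login, 0)
        if remaining <= 0:
            return False
        self.balances[login] = remaining - 1
        return True
```

A user starts with no tokens, gets one after a useful discussion, and gets more as their PRs prove to be of good quality.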

zer0tonin•33m ago

> Should we stop giving fun test tasks to our job candidates?

Yes

Chaosvex•13m ago

Yeah, fun for who exactly?

hiccuphippo•28m ago

The irony of the .ai domain.

wafflemaker•8m ago

Thanks for pointing it out. It has eluded me and it's incredibly funny

delduca•26m ago

For now…

philipwhiuk•15m ago

Until the AI learns the workflow on the next model update, indeed.

ramon156•25m ago

See, this is an article that uses dashes correctly. It adds value, creates a bit of buildup

chrismorgan•17m ago

This is funny to me because the title on this submission currently refers to “Git's –author flag”, which is an extremely incorrect use of a dash. (The original article doesn’t make the mistake. Not sure if the error is from the submitter or from an HN title mangulation.)

arecsu•25m ago

Makes me wonder if an ELO-based system would work to mitigate these issues. Score people who have successfully merged PRs into a project or had real issues acknowledged, with the quality of their responses measured by other users' reactions or something similar, possibly multiplied by the importance of the project where the activity took place. It wouldn't be about human vs AI, but about genuinely helpful, effective contributions vs low-effort/spammy ones. Issues and PRs could be sorted and filtered by their ELO score. I'm saying ELO as an analogy for "a score based on the context", not really a 1:1 translation of the ELO system.

A negative score would come from other users' reports of spammy content or from issues that were not acknowledged. In the middle, a neutral score (+-0) or a small positive score would go to issues and PRs made with clear good intention that didn't result in a merged PR or weren't really issues (e.g. the issue existed but this wasn't the correct repo for it, or the PR was good but needed other things implemented first, maybe in the long run, etc.)
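
For reference, the standard Elo update that the analogy borrows from is small enough to show. This sketch implements the usual two-player formula with a K-factor of 32. Treating a merged PR as a "win" against the project's rating is an assumption made here for illustration, not something proposed in detail above.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expectation that A outperforms B under Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a: float, rating_b: float, score_a: float,
           k: float = 32.0) -> float:
    """Return A's new rating; score_a is 1.0 (win), 0.5 (draw) or 0.0 (loss)."""
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))
```

Two equally rated sides each expect 0.5, so a win moves the winner up by half of K. Beating a much lower-rated side yields almost nothing, which is the property that makes the rating depend on who you interact with.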

philipwhiuk•14m ago

The problem is you want the ELO score based on work on other community projects - you can't assume good faith here.

btilly•7m ago

The problem with that is that there are certain kinds of users that like to take control of community projects. And then they take control of more, and bigger ones.

There are a lot of political tricks that get used.

What is scary is that one of those kinds of users is malicious state actors. Like North Korea and Russia...

doh•10m ago

I have built something like this and am in the process of collecting the data.

Frontier users: 527,865
Light indexed: 527,865
Ready to queue: 9,083
Fast scores ready: 0
Activity events 24h: 30,266
Fast scores completed 24h: 19,123
Deep jobs completed 24h: 3,043
Fast-score ETA: n/a
Deep-hydrate ETA: 69h
Stale running jobs: 0
GitHub backpressure jobs: 19,113
High automation signals: 4,608
Medium automation signals: 1,327
Completed jobs: 74,714

The biggest challenge is GitHub's rate limits. At this pace it will take two more months to have 98% coverage. But after that the maintenance should be quite straightforward.

btilly•9m ago

ELO is shockingly easy to manipulate. For example there was a literal jail with a decent chess player in it. He created a pool of players who got great ELOs by beating him, then used them to boost his rating higher. Wash, rinse, and repeat.

Given any manipulatable scheme, AI will figure out how to manipulate it. For the OP, what happens if a single AI manages to get through to contributor? Then it starts elevating other AIs to contributor, and we're off again. There doesn't have to be a purpose to this. Trolls will troll, and trolls armed with AI bots can devote endless energy to doing so. The more you work to keep them out, the more fun it becomes for them.

I wish I had an answer for that problem. But I don't.

ElijahLynn•7m ago

For those wondering what Elo means, it is a person's last name, not an acronym (not all caps). More info here:

https://en.wikipedia.org/wiki/Elo_rating_system

petterroea•21m ago

What I see is a (clever) hack, and GitHub continuing to provide good tools to its users.

skydhash•3m ago

What I see is a solution for a problem that is self-inflicted, namely lumping contributors and generic internet users into the same workflow. In big projects, you have the core team, a handful of well-known contributors, and everyone else.

I strongly prefer the git email model, where it’s often trivial to control the flow of change proposals. GitHub does not have the same wealth of tools and versatility.

captn3m0•19m ago

This has a security implication which is overlooked. Contributors to a repository have higher rights, such as avoiding approval requirements for fork PR runs. GitHub warns in the docs:

> When requiring approvals only for first-time contributors (the first two settings), a user that has had any commit or pull request merged into the repository will not require approval. A malicious user could meet this requirement by getting a simple typo or other innocuous change accepted by a maintainer, either as part of a pull request they have authored or as part of another user's pull request.

ildari•13m ago

fair point! We believe "Require approval for all external contributors" should be a default setting, as you cannot trust anyone who is not a member of the organization

orlp•5m ago

No it doesn't have security implications.

If you are insecure because someone has had one of their otherwise completely innocent PRs merged into your repo... you are insecure, period.

_joel•13m ago

Wouldn't it be trivial to farm the stats needed to pass the bot checker's threshold?

zzzeek•6m ago

so...they are manually re-setting the "interaction limits" over and over again, since they are only temporary?

why not use hooks to automatically reject issue comments / PRs etc. from users that didn't go through onboarding, rather than repurposing GH features that aren't really designed for that use (and are hence in danger of being changed someday)?
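
A hook-based version of that idea could be sketched as a decision function over GitHub's `pull_request` webhook payload. The `action`, `pull_request.user.login` and `pull_request.author_association` fields are part of that payload; the `onboarded` set and the choice of trusted associations are assumptions for illustration.

```python
from typing import Any, Dict, Set

# Associations that skip the onboarding check (an assumed policy).
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def should_auto_close(payload: Dict[str, Any], onboarded: Set[str]) -> bool:
    """Decide whether a newly opened PR should be closed automatically.

    `onboarded` holds lowercase logins of users who completed onboarding.
    """
    if payload.get("action") != "opened":
        return False  # only act when a PR is first opened
    pr = payload.get("pull_request") or {}
    login = (pr.get("user") or {}).get("login", "")
    if pr.get("author_association", "NONE") in TRUSTED_ASSOCIATIONS:
        return False
    # GitHub logins are case-insensitive, so compare in lowercase.
    return login.lower() not in onboarded
```

The actual close-and-comment step would be a separate API call made by whatever receives the webhook; keeping the decision pure makes it easy to test.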

IshKebab•6m ago

That's a neat way to interface with GitHub's authentication system, but I don't see how they've solved the fundamental problem because their whitelisting process is just "click ok fine 10 times". Why won't the slop peddlers just do that too?

optionalsquid•6m ago

I don't have a better solution, unfortunately, but it doesn't seem like the spam problem has been solved. It has just been moved from pull requests to commits:

Currently, more than 10% of all commits in the archestra repo are essentially noise (369 of 3521 commits), accounting for more than half of all commits in the last month (303 of 578 commits).

But maybe (probably) the amount of such commits will go down over time, compared to the growing amounts of AI slop

