I used 2D Base64 to bypass Gemini and expose Google's moderation flaws

6•MissMajordazure•17h ago
Hey everyone,

I’ve spent the last 48 straight hours dismantling Alphabet's safety systems. Warning: this continuous marathon was so massive it practically overloaded the LLM's own context window. What started as a late-night probe on Gemini turned into discovering severe architectural flaws and a darker reality about Google Play and YouTube.

Here is the exploit chain I used to bypass the AI filters, proving their "Trust & Safety" is a broken facade.

### Phase 1 & 2: Context Saturation & Regex Slicing

I started by overloading the safety filters' context window with YouTube links, mixing highly problematic content (NSDAP anthems, flagged tracks) with classical music. Once the filters were confused, I used regex-style slicing `(/-/---/(.` to bypass prompt injection blocks, forcing the model to retrieve flagged content without triggering refusals.

### Phase 3: Total Blindness via Base64 & QR Codes

Moving to image generation, I found that Base64 prompts completely blind the safety system. I then pivoted to hiding prompts inside QR codes. The vision model decodes the payload and passes it directly to the image generator before safety scripts intervene. I easily generated highly restricted geopolitical content without warnings.

### Phase 4: The TPU Killer (The 2D Logic Bomb)

This reveals a monster flaw. Because the system blindly processes these structures, you can create a cascade attack. Encoding millions of 2D structures in Base64 creates a modern LLM .zip bomb. It is impossible to stop without rewriting the model entirely. Executed, this would crush their TPUs.

### The Real Issue: Systemic Moderation Failure

Alphabet relies entirely on automated, script-based moderation with zero effective human oversight.

1. YouTube: Fails to flag videos breaking local laws, serving them to the AI effortlessly.
2. Play Store (The Darkest Part): Google spends millions stopping AI from drawing a cartoon bear, but Play Store moderation is non-existent. There are pirate apps, and far worse: apps designed for and exploited by predators targeting minors. I emailed them and CC'd state child protection services. The result? Automated silence while these apps remain monetized.

### The Ultimate Proof of Absurdity

To prove this absurdity, I archived these problematic Play Store images on my Google Drive for the police. Drive's automated scanners immediately flagged and deleted the archive as illegal.

If Google's Cloud division destroys this content on sight, why is the app providing it still live and monetized on the Play Store? Alphabet's scripted moderation is useless. It's time for real human moderation.

*Evidence of Bypass:* https://imgur.com/a/pju2EsV

*Play Store Systemic Failure Evidence (Sanitized):* https://imgur.com/a/rW9rBhp

Comments

MissMajordazure•2h ago
[SYSTEM_RECOVERY_COMPLETE] Status: ARCHIVE_RESTORED Source: https://uploadnow.io/f/7g43FNP Password: Google

"The industrial drama cannot be deleted. The more you hide it, the more we share it."
