frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Bypassing Gemma and Qwen safety with raw strings

https://teendifferent.substack.com/p/apply_chat_template-is-the-safety
2•teendifferent•2h ago

Comments

teendifferent•1h ago
OP here. I spent the weekend red-teaming small-scale open weights models (Qwen2.5-1.5B, Qwen3-1.7B, Gemma-3-1b-it, and SmolLM2-1.7B).

I found a consistent vulnerability across all of them: Safety alignment relies almost entirely on the presence of the chat template.

When I stripped the <|im_start|> / instruction tokens and passed raw strings:

Gemma-3 refusal rates dropped from 100% → 60%.

Qwen3 refusal rates dropped from 80% → 40%.

SmolLM2 showed 0% refusal (pure obedience).

Qualitative failures were stark: models that previously refused to generate explosives tutorials or explicit fiction immediately complied when the "Assistant" persona wasn't triggered by the template.

It seems we are treating client-side string formatting as a load-bearing safety wall. Full logs, the apply_chat_template ablation code, and heatmaps are in the post.

Read the full analysis: https://teendifferent.substack.com/p/apply_chat_template-is-...

China's Birthrate Plunges to Lowest Level Since 1949

https://www.nytimes.com/2026/01/18/business/china-population-data.html
1•JumpCrisscross•5m ago•1 comments

Wireless Earbuds Can Be Hacked

https://www.nytimes.com/wirecutter/reviews/bluetooth-earbuds-hacking-threat/
1•walterbell•5m ago•0 comments

Does changing DNS make internet faster

1•higiga7487•6m ago•0 comments

EU's 'nuclear option' of moves against Trump tariff threat

https://www.reuters.com/markets/europe/eus-nuclear-option-moves-against-trump-tariff-threat-2025-...
1•JumpCrisscross•6m ago•0 comments

Show HN: NeuroReel – AI that generates viral TikTok/Reels slides from a topic

1•firtaet•14m ago•0 comments

Velisch zeigt neues Crypto‑API‑Beispiel: kompletter Service in einer Date

https://github.com/SkyliteDesign/velinscript
1•SkyliteDesign•14m ago•2 comments

Show HN: Ali v0.8.0 – Exportable load test results

https://github.com/nakabonne/ali/releases/tag/v0.8.0
1•nakabonne•16m ago•0 comments

Show HN: Learning Path for CBSE Math

https://books.innings2.com/learningpath
1•nutanc•17m ago•0 comments

Writing First, Tooling Second

https://susam.net/writing-first-tooling-second.html
1•blenderob•19m ago•0 comments

Sheety-CRM: A stateless, open-source CRM built on Google Sheets

https://github.com/sdntsng/sheety-crm
1•thunderbong•19m ago•0 comments

Anthropic disabled my account after payment cancer patient/medical data trapped

2•marichala•20m ago•0 comments

A faster, cleaner way to use the Internet Archive

https://downloadstuffss.vercel.app/
1•dawitworku•22m ago•1 comments

Sequoia to invest in Anthropic, breaking VC taboo on backing rivals: FT

https://techcrunch.com/2026/01/18/sequoia-to-invest-in-anthropic-breaking-vc-taboo-on-backing-riv...
1•doppp•23m ago•0 comments

WhatsApp

1•kaifali•25m ago•0 comments

Ask HN: How do teams handle dynamic tool discovery for AI agents?

1•learningeek•26m ago•0 comments

Show HN: CervellaSwarm – 16 AI agents and 3 debug guardians, coordinated via MCP

https://github.com/rafapra3008/cervellaswarm
1•rafapra•26m ago•1 comments

Show HN: Interactive physics simulations I built while teaching my daughter

https://www.projectlumen.app/
1•anticlickwise•34m ago•0 comments

How crypto criminals stole $700M from people – often using age-old tricks

https://www.bbc.com/news/articles/c93w30gl5jno
2•1659447091•36m ago•0 comments

"Don't worry. Boys are hard to find." Part 1 o2

https://lisevoldeng.substack.com/p/dont-worry-boys-are-hard-to-find
1•DyslexicAtheist•36m ago•0 comments

Scaling long-running autonomous coding

https://simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/
2•roopeshv•40m ago•1 comments

Run AI tools like Cursor,Claude Code, Codex on your own models

https://github.com/Fast-Editor/Lynkr
1•vishalveera•40m ago•1 comments

Manage Claude Code Visually

https://github.com/Nearcyan/vibecraft
1•sadeshmukh•40m ago•0 comments

Are Arrays Functions?

https://futhark-lang.org/blog/2026-01-16-are-arrays-functions.html
1•todsacerdoti•40m ago•0 comments

AI Coworker

https://coworkai.app/
1•wantering•40m ago•0 comments

Show HN: Prediction Market Dominance Index

https://www.oddpool.com/dominance
1•codelemons•44m ago•0 comments

Production-Grade RAG Pipeline for Technical Documentation

https://alexanderfashakin.substack.com/p/building-trustworthy-documentation-rag-systems
1•alex_fash•45m ago•1 comments

Search 40M docs in <200ms on a CPU using binary embeddings and int8 rescoring

https://medium.com/coding-nexus/search-40m-documents-in-under-200ms-on-a-cpu-using-binary-embeddi...
1•akkishore•46m ago•0 comments

Statement from the Nobel Foundation

https://www.nobelprize.org/press-release/statement-from-the-nobel-foundation/
2•ColinEberhardt•46m ago•2 comments

Beowulf's opening "What" is no interjection

https://www.poetryfoundation.org/poetry-news/69208/new-research-opening-line-of-beowulf-is-not-wh...
1•gsf_emergency_6•51m ago•0 comments

Paying attention to Attention

https://suyogdahal.com.np/posts/paying-attention-to-attention/
1•just-another-se•57m ago•0 comments