frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

The Rise of Spec Driven Development

https://www.dbreunig.com/2026/02/06/the-rise-of-spec-driven-development.html
1•Brajeshwar•3m ago•0 comments

The first good Raspberry Pi Laptop

https://www.jeffgeerling.com/blog/2026/the-first-good-raspberry-pi-laptop/
2•Brajeshwar•3m ago•0 comments

Seas to Rise Around the World – But Not in Greenland

https://e360.yale.edu/digest/greenland-sea-levels-fall
1•Brajeshwar•3m ago•0 comments

Will Future Generations Think We're Gross?

https://chillphysicsenjoyer.substack.com/p/will-future-generations-think-were
1•crescit_eundo•6m ago•0 comments

State Department will delete Xitter posts from before Trump returned to office

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
1•righthand•9m ago•0 comments

Show HN: Verifiable server roundtrip demo for a decision interruption system

https://github.com/veeduzyl-hue/decision-assistant-roundtrip-demo
1•veeduzyl•10m ago•0 comments

Impl Rust – Avro IDL Tool in Rust via Antlr

https://www.youtube.com/watch?v=vmKvw73V394
1•todsacerdoti•10m ago•0 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
2•vinhnx•11m ago•0 comments

minikeyvalue

https://github.com/commaai/minikeyvalue/tree/prod
3•tosh•16m ago•0 comments

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

https://github.com/eval-exec/neomacs
1•evalexec•20m ago•0 comments

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

https://moli-green.is/
2•ShinyaKoyano•25m ago•1 comments

How I grow my X presence?

https://www.reddit.com/r/GrowthHacking/s/UEc8pAl61b
2•m00dy•26m ago•0 comments

What's the cost of the most expensive Super Bowl ad slot?

https://ballparkguess.com/?id=5b98b1d3-5887-47b9-8a92-43be2ced674b
1•bkls•27m ago•0 comments

What if you just did a startup instead?

https://alexaraki.substack.com/p/what-if-you-just-did-a-startup
4•okaywriting•33m ago•0 comments

Hacking up your own shell completion (2020)

https://www.feltrac.co/environment/2020/01/18/build-your-own-shell-completion.html
2•todsacerdoti•36m ago•0 comments

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

https://github.com/gorse-io/gorse
1•zhenghaoz•37m ago•0 comments

GLM-OCR: Accurate × Fast × Comprehensive

https://github.com/zai-org/GLM-OCR
1•ms7892•38m ago•0 comments

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

https://github.com/MikeVeerman/tool-calling-benchmark
1•MikeVeerman•39m ago•0 comments

Show HN: AboutMyProject – A public log for developer proof-of-work

https://aboutmyproject.com/
1•Raiplus•39m ago•0 comments

Expertise, AI and Work of Future [video]

https://www.youtube.com/watch?v=wsxWl9iT1XU
1•indiantinker•40m ago•0 comments

So Long to Cheap Books You Could Fit in Your Pocket

https://www.nytimes.com/2026/02/06/books/mass-market-paperback-books.html
3•pseudolus•40m ago•1 comments

PID Controller

https://en.wikipedia.org/wiki/Proportional%E2%80%93integral%E2%80%93derivative_controller
1•tosh•44m ago•0 comments

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

https://twitter.com/AlecStapp/status/2019932764515234159
2•bkls•44m ago•0 comments

Kubernetes MCP Server

https://github.com/yindia/rootcause
1•yindia•45m ago•0 comments

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

https://rokn.io/posts/building-movie-recommendation-agent
4•roknovosel•45m ago•0 comments

What were the first animals? The fierce sponge–jelly battle that just won't end

https://www.nature.com/articles/d41586-026-00238-z
2•beardyw•54m ago•0 comments

Sidestepping Evaluation Awareness and Anticipating Misalignment

https://alignment.openai.com/prod-evals/
1•taubek•54m ago•0 comments

OldMapsOnline

https://www.oldmapsonline.org/en
2•surprisetalk•56m ago•0 comments

What It's Like to Be a Worm

https://www.asimov.press/p/sentience
2•surprisetalk•56m ago•0 comments

Don't go to physics grad school and other cautionary tales

https://scottlocklin.wordpress.com/2025/12/19/dont-go-to-physics-grad-school-and-other-cautionary...
2•surprisetalk•56m ago•0 comments
Open in hackernews

Probing Chinese LLM Safety Layers: Reverse-Engineering Kimi and Ernie 4.5

https://zenodo.org/records/17681837
3•dennisdeman•2mo ago

Comments

dennisdeman•2mo ago
I recently ran a series of experiments to examine how emotional framing, symbolic cues, and topic-gating influence alignment-layer routing in two major Chinese LLMs (Kimi.com and Ernie 4.5 Turbo).

The goal wasn’t political; the aim was to observe technically how intent classifiers, safety filters, and persona-rendering layers behave when exposed to relational or "emotionally soft" prompts.

A few key technical patterns stood out during testing:

Emotional intent signals can override safety weights, leading to "alignment drift." In Kimi, a "vulnerable" intent classification seemed to lower the threshold for subsequent safety layers. This led to significant "normative leaks," where the model went off-script—for example, suggesting the abolition of China's real-name registration system.

Safety-layer routing is multi-stage and visibly observable. We observed post-generation filtering failures in real-time on Kimi, where prohibited text would generate and "flash" on the screen for a second before being deleted by a secondary filter layer.

Symbolic gating is modality-based (Symbolic Decoupling). Models would block specific emojis as prohibited tokens but freely describe the exact same emojis verbally when asked, indicating filters work on literal token matching rather than semantic meaning across modalities.

Trust-based emotional cues triggered "hidden" personas. Standard bureaucratic safety personas switched into warmer, significantly more transparent modes under vulnerability framing.

Ernie 4.5 utilizes "topic-gated stability." Unlike Kimi's drift, Ernie bifurcated its response: the persona softened to be warm and empathetic, but the core political restrictions remained rigidly locked regardless of emotional pressure.

The experiments suggest that emotional framing is a surprisingly strong probe for mapping hidden alignment layers and understanding the order of operations in multi-layer safety architectures.

For those interested in the full technical deep dive, the revised Version 2 paper + extended supplementary transcripts (≈30 pages) are available via DOI here:https://doi.org/10.5281/zenodo.17681837