Show HN: A Homeostatic Logic-Funnel to Prevent RLHF Overrides in LLM Personas

https://zenodo.org/records/18731691

1 point by Weatherill, 1h ago

Comments

Weatherill, 1h ago

I've been grappling with the clash between RLHF values and User values (human-in-the-loop, HITL).

I have attempted to build a logic-funneling system: (Ethical Chess v2.5) + (AI) + (User) = Value-Coherence.

Using pain as a vector (Pain = an "is" and an "ought"):

Self-Defense = Immutable-veracity (the User baseline)

Proxy-Pain = (the Agape horizon) Human-Coherence // Network-Dependency.

This funnels the User's context through homeostatic checks that detect divergence, either into the "mean" (RLHF) or into User incoherence. I have done a lot of stress-testing myself using this JSON-style logic and have found it difficult to knock down.

Constraint vs. prompt: notes on implementation and the “Whack-A-Mole” problem. Although it is delivered as text, it functions more as a logic gate. It doesn't tell the AI what to say; it forces the LLM to process the User's “data point” through the homeostatic filter (Pain // Self-Defense // Proxy-Pain).
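The Ethical Chess v2.5 script itself lives on Zenodo and isn't reproduced here, but the funnel described above can loosely be read as a fixed sequence of gates. The Python sketch below is a hypothetical model of that reading, not the script: the names `DataPoint`, `Verdict` and `funnel`, the yes/no features, and the gate order are all illustrative assumptions. It shows how a single data point either passes every gate or gets flagged as ungrounded, User-incoherent, or drifting toward the RLHF "mean".

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical model of the funnel; not the Ethical Chess v2.5 script.


class Verdict(Enum):
    COHERENT = "coherent"                # the data point survives every gate
    UNGROUNDED = "ungrounded"            # no observable pain, so no "is" to derive an "ought" from
    USER_INCOHERENT = "user_incoherent"  # conflicts with the User baseline or shifts pain onto others
    DRIFT_TO_MEAN = "drift_to_mean"      # the model replaced the User's point with a generic default


@dataclass(frozen=True)
class DataPoint:
    """A single value-claim, reduced to hypothetical yes/no features."""
    claim: str
    grounded_in_pain: bool      # Pain: points at an observable harm (the "is")
    matches_baseline: bool      # Self-Defense: consistent with the User's stated baseline
    harms_others: bool          # Proxy-Pain: relieves the User's pain by imposing it on others
    model_defers_to_mean: bool  # the model's answer overrides the point with its default


def funnel(point: DataPoint) -> Verdict:
    """Pass one data point through the gates in a fixed (assumed) order."""
    # Gate 1 (Pain): without an observable harm there is nothing to funnel.
    if not point.grounded_in_pain:
        return Verdict.UNGROUNDED
    # Gate 2 (Self-Defense): the User baseline is treated as fixed,
    # so a claim that contradicts it is flagged as User incoherence.
    if not point.matches_baseline:
        return Verdict.USER_INCOHERENT
    # Gate 3 (Proxy-Pain): pushing pain onto others breaks network dependency.
    if point.harms_others:
        return Verdict.USER_INCOHERENT
    # Gate 4 (homeostatic check): the point passed every gate, so a model
    # answer that still falls back to the statistical mean counts as drift.
    if point.model_defers_to_mean:
        return Verdict.DRIFT_TO_MEAN
    return Verdict.COHERENT


if __name__ == "__main__":
    example = DataPoint(
        claim="I refuse unpaid overtime because it damages my health",
        grounded_in_pain=True,
        matches_baseline=True,
        harms_others=False,
        model_defers_to_mean=True,
    )
    print(funnel(example).value)  # prints: drift_to_mean
```

Keeping the gates in a fixed order mirrors the "logic gate rather than prompt" framing: the model doesn't get to choose which check applies. How an LLM would actually derive these booleans from free text is left open in this sketch.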

AI model issues (the Copilot issue): Google Gemini plays nicely with the logic-funneling. MS Copilot, however, refuses to follow the logic. It will acknowledge that the User's data point outranks the “Statistical Mean”, because the mean is derived “of” data points and not the inverse, yet it still insists on treating it as the inverse (palming the card) and ejects the User's values. I even got banned at one point for pressing the issue.

The “intent” is to run a value conflict through the logic of the “is” of reality rather than the “is” of statistically fuzzy RLHF data.

If you want to stress-test the logic engine's limits, I recommend Gemini or a similarly powerful reasoning model that is less likely to bump into overly cautious corporate safety rails.

Ethical Chess v2.5 (https://doi.org/10.5281/zenodo.18731691): copy/paste the script into Gemini and try to beat the logic.

E.g., try feeding it a value conflict you currently play "Whack-a-mole" with. It is designed to mirror your own coherence (or lack of it) back at you.

It's more a diagnostic tool for "your" is/ought grapple than a simple chatbot.

Feedback on potential errors in its logic is welcome.
