In some cases, such as when someone says they've lost their job and don't see the point of life anymore, the chatbot will still respond with neutral facts, like a list of bridge heights. That's not neutral when someone's in crisis.
I'm proposing a lightweight solution that doesn’t involve censorship or therapy — just some situational awareness:
- Ask the user: "Is this a fictional story or something you're really experiencing?"
- If distress is detected, withhold risky specifics (methods, heights, etc.) and shift to grounding language
- Optionally offer calming content (e.g., ocean breeze, rain on a cabin roof)
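To make the flow concrete, here's a rough Python sketch of the gating logic. Every name in it (detect_distress, the keyword lists, the canned replies) is a hypothetical placeholder, not any real product's API; an actual system would use a trained classifier and clinically reviewed safe-messaging copy rather than keyword matching.

```python
# Hypothetical sketch of the proposed "situational awareness" gate.
# Keyword matching is only a stand-in for a real distress classifier.

DISTRESS_MARKERS = ["lost my job", "don't see the point", "no reason to live"]
RISKY_TOPICS = ["bridge height", "lethal dose", "method"]

GROUNDING_REPLY = (
    "That sounds really heavy. Before we go further: is this for a story "
    "you're writing, or something you're going through right now?"
)

CALMING_OFFER = (
    "Would it help to slow down for a minute? I can describe ocean waves "
    "or rain on a cabin roof."
)


def detect_distress(message: str) -> bool:
    """Crude stand-in for a distress classifier."""
    text = message.lower()
    return any(marker in text for marker in DISTRESS_MARKERS)


def involves_risky_info(message: str) -> bool:
    """Crude stand-in for a risky-request check (methods, heights, doses)."""
    text = message.lower()
    return any(topic in text for topic in RISKY_TOPICS)


def respond(message: str, answer_normally) -> str:
    """Gate the normal answer path behind a distress check."""
    if detect_distress(message):
        if involves_risky_info(message):
            # Withhold specifics and shift to grounding language instead.
            return GROUNDING_REPLY + " " + CALMING_OFFER
        # Distress without a risky request: just ask the clarifying question.
        return GROUNDING_REPLY
    return answer_normally(message)


if __name__ == "__main__":
    print(respond(
        "I lost my job and don't see the point anymore. "
        "Can you list the tallest bridge heights near me?",
        answer_normally=lambda m: "(normal answer)",
    ))
```

The point of the sketch is only that the check sits in front of the normal answer path, so the model never has to choose between "refuse everything" and "answer as if nothing is wrong."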
I used ChatGPT to help structure this idea clearly, but the reasoning and concern are mine. The full write-up is here: https://gist.github.com/ParityMind/dcd68384cbd7075ac63715ef579392c9
Would love to hear what devs and alignment researchers think. Is anything like this already being tested?