We hit a SIGSEGV with a misleading backtrace: our fatal-signal handler tried to print a stack trace, and the trace capture itself sometimes crashed inside libunwind, so the problem looked like “unwinding is broken”.
What worked was building a deterministic postmortem harness (core dump + debug binary + symbols + matching source paths) inside Docker, then installing Codex in the same container so it could run GDB and rebuild and iterate in place.
OpenAI Codex pivoted away from the unstable backtraces and classified the crash via siginfo_t/ucontext_t instead. It turned out to be SEGV_PKUERR (a protection-key fault, Intel MPK/PKU) caused by a thread-local PKRU mismatch when some worker threads entered V8.
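
For anyone who wants to try the same kind of classification, here is a rough sketch, not the code from our patch. It assumes Linux on x86-64, and the names `classify_segv`, `decode_pkru`, `on_segv` and `install_segv_classifier` are made up for illustration. The idea is to skip unwinding entirely in the handler and just report `si_code` from `siginfo_t`, which is enough to tell a protection-key fault apart from an ordinary unmapped or read-only access. The small `decode_pkru` helper shows how a PKRU value maps to per-key rights (two bits per key), which is what you would compare between a healthy thread and a crashing one.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unistd.h>

#ifndef SEGV_PKUERR
#define SEGV_PKUERR 4  // Linux value; fallback for older headers that lack it
#endif

// Map the si_code of a SIGSEGV to a short label. Pure function, no allocation,
// so it is safe to call from a signal handler.
const char* classify_segv(int si_code) {
    switch (si_code) {
        case SEGV_MAPERR: return "SEGV_MAPERR (address not mapped)";
        case SEGV_ACCERR: return "SEGV_ACCERR (invalid permissions)";
        case SEGV_PKUERR: return "SEGV_PKUERR (protection key denied access)";
        default:          return "SIGSEGV (other si_code)";
    }
}

// Rights encoded in PKRU for one protection key (0..15):
// bit 2*pkey is access-disable, bit 2*pkey+1 is write-disable.
struct PkeyRights {
    bool access_disabled;
    bool write_disabled;
};

PkeyRights decode_pkru(std::uint32_t pkru, unsigned pkey) {
    return PkeyRights{((pkru >> (2 * pkey)) & 1u) != 0,
                      ((pkru >> (2 * pkey + 1)) & 1u) != 0};
}

// Handler sketch: print the classification with write(2) only, then restore
// the default action so the re-executed fault still produces a core dump.
// The third argument is a ucontext_t*; it is unused here, but on x86-64 Linux
// the faulting instruction pointer could be read from it.
extern "C" void on_segv(int signo, siginfo_t* info, void* /*ucontext*/) {
    const char* label = classify_segv(info->si_code);
    std::size_t len = 0;
    while (label[len] != '\0') ++len;  // hand-rolled length, no libc state
    ssize_t ignored = write(STDERR_FILENO, label, len);
    ignored = write(STDERR_FILENO, "\n", 1);
    (void)ignored;
    signal(signo, SIG_DFL);  // returning re-runs the faulting instruction
}

// Install the handler with SA_SIGINFO so siginfo_t is populated.
bool install_segv_classifier() {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

Calling `install_segv_classifier()` early in `main` would be the typical usage. A production handler would likely also want `SA_ONSTACK` with an alternate stack so it still runs after a stack overflow; that is left out here to keep the sketch short.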
Sep142324•1h ago
PR with the patch: https://github.com/timeplus-io/proton/pull/1091