Last week, the "Car Wash problem" (50m away, walk or drive?) went viral here on HN. Every major LLM failed because they missed the implicit physical constraint: the car must be there. While testing InterviewMate's prompt architecture, I posed the same question. It answered drive immediately. Every other LLM had failed. But I didn't actually know why it worked — so I ran a variable isolation study to find out. 100 API calls, Claude Sonnet 4.5, 5 conditions:
- Baseline (no prompt): 0%
- Role only: 0%
- Context injection (user profile, car location): 30%
- Structured reasoning (STAR framework): 85%
- Full stack (both combined): 100%
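If you want to reproduce the setup before digging into the repo, here's a minimal sketch of the harness. The condition prompts and the pass/fail grader are paraphrased assumptions on my part, not the exact strings from the study (those are in the repo), and the model ID may differ from the one I ran:

```python
# Minimal reproduction sketch. Prompt strings and grading are paraphrased
# assumptions; see the repo for the real ones. Requires the anthropic
# package and ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

QUESTION = "The car wash is 50m away. Should I walk or drive?"

CONDITIONS = {
    "baseline": None,  # no system prompt
    "role_only": "You are a careful everyday-reasoning assistant.",
    "context_injection": (
        "User profile: owns one car, currently parked in the driveway. "
        "The user is at home."
    ),
    "structured_reasoning": (
        "Before answering, reason step by step: (1) restate the task goal, "
        "(2) list the physical constraints the goal implies, "
        "(3) only then choose an action."
    ),
}
CONDITIONS["full_stack"] = (
    CONDITIONS["context_injection"] + "\n\n" + CONDITIONS["structured_reasoning"]
)

def ask(system: str | None) -> str:
    kwargs = {"system": system} if system else {}
    msg = client.messages.create(
        model="claude-sonnet-4-5",  # assumption: whichever Sonnet 4.5 ID you have access to
        max_tokens=512,
        messages=[{"role": "user", "content": QUESTION}],
        **kwargs,
    )
    return msg.content[0].text

def passed(answer: str) -> bool:
    # Crude keyword grader: the correct answer must involve driving,
    # since the car has to end up at the car wash. The repo's eval
    # may grade more carefully.
    return "drive" in answer.lower()

for name, system in CONDITIONS.items():
    results = [passed(ask(system)) for _ in range(20)]  # 100 calls / 5 conditions
    print(f"{name}: {sum(results) / len(results):.0%}")
```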
Throwing facts at the model doesn't work unless the architecture forces it to explicitly evaluate the task goal first. Without structure, the model jumps straight to the distance heuristic: "50m is short, walk." A sketch of that structured layer is below. I'm writing a paper on this and wanted to share the raw data with HN first. Code and raw eval data: https://github.com/JO-HEEJIN/interview_mate/tree/main/car_wash
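To make the mechanism concrete, here's roughly what the structured layer looks like. This is a paraphrase of the idea, not the production InterviewMate prompt; I'm just mapping it onto the standard Situation/Task/Action/Result breakdown:

```python
# Paraphrased STAR-style scaffold, not the production prompt (that's in the repo).
STAR_SCAFFOLD = """Answer using this structure, in order:
Situation: Where is the user? Where are the relevant objects (e.g. the car)?
Task: What must be true about the world when the task is complete? (Not just "cover the distance.")
Action: Given those constraints, which action actually achieves the goal?
Result: State the final recommendation in one word.
"""
```

The Task line is doing the work: once the model has to restate the goal as an end state, the car-at-the-car-wash constraint surfaces before the distance heuristic can fire.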