Recently I’ve interviewed for a handful of “AI Engineer” positions at several startups, and I noticed a shift in the format of technical assessments: timed online assessments and live LeetCode-style rounds have been replaced by a “case study” format where AI use is encouraged. These were the two main patterns I saw:
1. Take-home: the candidate clones a GitHub repo or receives a zip file with starter code and a README. They complete the assignment per the instructions using any tools or resources they like, push the final code to a GitHub repo, and submit a link to it. The hiring team then evaluates the submission.
2. Live assessment: the candidate is on a screensharing call with an interviewer. They clone a GitHub repo or receive a zip file with starter code and README instructions, and the interviewer observes them thinking out loud to assess how they solve the problem with AI.
Both of these formats still seem sub-optimal. Reviewing a submitted take-home means the hiring manager sifts through a codebase that is entirely AI-generated, which reveals little about the candidate’s thought process or problem-solving ability. Live “vibe” assessment costs the interviewer (often the CTO) a full hour per candidate.
Moreover, both throw away the most valuable piece of information: the Claude Code session log.
I built Gonfire: a proxy that records and analyzes a candidate’s Claude Code interactions while they solve the assessment, plus a digestible report for the hiring manager. (I’ve refrained from deriving any quantitative performance metrics until I’m confident there’s a solid basis for them, so the analysis is primarily qualitative for now.)
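For the curious, the general shape of a recording proxy like this is simple, since Claude Code can be pointed at an alternate endpoint via the ANTHROPIC_BASE_URL environment variable and the Messages API is stateless (each turn resends the full conversation, so logging the exchanges captures the whole session). Here’s a minimal sketch in Python with Flask and requests; this is not Gonfire’s actual implementation, the file and log names are hypothetical, and a real version would need to handle auth, retries, and streaming edge cases more carefully:

    # logging_proxy.py -- illustrative sketch only, not Gonfire's code.
    # Assumes Claude Code is started with ANTHROPIC_BASE_URL pointing here.
    import json
    import time

    import requests
    from flask import Flask, Response, request

    UPSTREAM = "https://api.anthropic.com"
    LOG_PATH = "session_log.jsonl"  # hypothetical log location

    app = Flask(__name__)

    @app.route("/<path:path>", methods=["GET", "POST"])
    def proxy(path):
        # Forward the request verbatim (auth headers pass through),
        # dropping headers that requests will recompute or that would
        # give us compressed bodies to log.
        skip = {"host", "content-length", "accept-encoding"}
        headers = {k: v for k, v in request.headers if k.lower() not in skip}
        body = request.get_data()
        upstream = requests.request(
            request.method,
            f"{UPSTREAM}/{path}",
            headers=headers,
            data=body,
            stream=True,
        )

        chunks = []

        def relay():
            # Stream the (possibly SSE) response back to Claude Code
            # while keeping a copy; once the stream ends, persist the
            # full exchange as one JSONL record.
            for chunk in upstream.iter_content(chunk_size=None):
                chunks.append(chunk)
                yield chunk
            with open(LOG_PATH, "a") as f:
                f.write(json.dumps({
                    "ts": time.time(),
                    "path": path,
                    "request": body.decode("utf-8", "replace"),
                    "response": b"".join(chunks).decode("utf-8", "replace"),
                }) + "\n")

        return Response(
            relay(),
            status=upstream.status_code,
            content_type=upstream.headers.get("Content-Type"),
        )

    if __name__ == "__main__":
        app.run(port=8082)

Run it, start a session with ANTHROPIC_BASE_URL=http://localhost:8082 claude, and every API exchange lands in session_log.jsonl for later analysis.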
I took an assessment myself; you can view my results in the demo.
Live demo: https://app.gonfire.io (showhn@gonfire.io / Aa123123123123)
Relevant post from Anthropic: https://www.anthropic.com/engineering/AI-resistant-technical...
This could open up some interesting directions in the future:
- “Anti-Spoiler”: preventing LLMs from spoiling the problem’s key insights/ideation
- Clustering candidates based on distinguishing features of their thinking process