
Ask HN: A proposal for interviewing "AI-Augmented" Engineers

1•vanbashan•1h ago

Hi HN,

I’m currently rethinking our hiring process. Like many of you, I feel that traditional algorithmic tests (LeetCode-style) are becoming less relevant now that LLMs can solve them instantly. Furthermore, prohibiting AI during interviews feels counterproductive; I want to hire engineers who know how to use these tools effectively to multiply their output.

I am designing a new evaluation framework based on real-world open-source work, and I would love the community’s feedback on whether this sounds fair, effective, or if I’m missing something critical.

The Core Philosophy: We shouldn't test if a candidate can write syntax better than an AI. We should test if they can guide, debug, and improve upon an AI's output to handle the "last mile" of complex engineering.

The Proposed Process:

1. Task Selection (Real-World Context): Instead of synthetic puzzles, we select open issues or discussions from public GitHub repositories that share our product's tech stack.

    Scope: 2–4 hours.

    Types: Implementing a feature based on a discussion, fixing a bug, or reviewing a PR (specifically one that was eventually rejected, to test "taste").

    Ambiguity: Adjusted for seniority. Junior roles get clear specs; senior roles get vague problem statements requiring architectural decisions.

2. Establishing the "AI Baseline": Before giving a task to candidates, we run it through current SOTA models with minimal human intervention.

    The Filter: If the AI solves it perfectly on the first try, we discard the task.

    The Sweet Spot: We are looking for tasks where the AI gets 80% right but fails on edge cases, context integration, or complex logic. The task should be hard enough that the AI can't finish it alone, but not so hard that the AI makes little useful progress.
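    To make this filter repeatable, it could be expressed as a simple rule over a per-task acceptance test suite. This is a minimal sketch, assuming we write hidden acceptance tests (including the edge cases) for each task and score the AI baseline by the fraction it passes; "solves it perfectly" is approximated as passing every test. The 0.5 lower bound and the `BaselineRun`/`classify_task` names are hypothetical, not part of any existing tool.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    DISCARD_TOO_EASY = "discard: baseline solved it"
    KEEP = "keep: sweet spot"
    DISCARD_TOO_HARD = "discard: baseline made little progress"


@dataclass
class BaselineRun:
    task_id: str
    tests_passed: int  # hidden acceptance tests the unassisted model's patch passed
    tests_total: int   # assumed: every task ships with at least one acceptance test


def classify_task(run: BaselineRun, low: float = 0.5) -> Verdict:
    """Keep tasks the baseline mostly solves; drop ones it aces or flounders on.

    `low` is a hypothetical lower bound on the baseline pass rate.
    """
    if run.tests_total <= 0:
        raise ValueError("a task needs at least one acceptance test")
    pass_rate = run.tests_passed / run.tests_total
    if pass_rate == 1.0:
        # Perfect on the first try: no "last mile" left for the candidate.
        return Verdict.DISCARD_TOO_EASY
    if pass_rate < low:
        # The AI barely got started, so the task mostly tests raw coding.
        return Verdict.DISCARD_TOO_HARD
    return Verdict.KEEP
```

    With the defaults, a baseline that passes 8 of 10 tests lands in the sweet spot, while 10 of 10 gets the task discarded.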

3. The Candidate Test: Candidates are required to use their preferred AI coding tools. We ask them to submit not just the code but also their chat/prompt history.

How We Evaluate (The "AI Delta"):

We aren't just looking at the final code. We analyze the "diff" between the candidate’s process and our "AI Baseline":

    1. Exploration Strategy: How does the candidate "load context"? Do they blindly paste errors, or do they guide the AI to understand the repository structure first? We look for a clear understanding of the existing codebase.

    2. Engineering Rigor (TDD): Does the candidate push the AI to generate a test plan or reproduction script before generating the fix? We value candidates who treat the AI as a junior partner that needs verification.

    3. The "Last 10%" (Edge Cases): Since we picked tasks where AI fails slightly, we look at how the candidate handles those failure modes. Can they spot the boundary conditions and logic errors that the LLM glossed over?

    4. Documentation Hygiene: We specifically check if the candidate instructs the AI to search existing documentation and—crucially—if they prompt the AI to update the docs to reflect the new changes.

    5. Engineering Taste (The Rejected PR): For the code review task, we ask them to analyze a PR that was rejected in the real world (without telling them). We want to see if their reasoning for rejection aligns with our team's engineering culture (maintainability, complexity, clarity, etc.).
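    These five criteria could be combined into one comparable score with a weighted-average rubric. A minimal sketch, assuming the reviewer gives a hypothetical 1-5 rating per criterion and that all criteria are weighted equally by default; the criterion keys and `ai_delta_score` are illustrative names only, and any weights would still need calibrating against real hiring outcomes.

```python
from typing import Mapping, Optional

# The five evaluation criteria above, as hypothetical keys.
CRITERIA = (
    "exploration_strategy",
    "engineering_rigor",
    "last_10_percent",
    "documentation_hygiene",
    "engineering_taste",
)


def ai_delta_score(
    ratings: Mapping[str, int],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Weighted mean of 1-5 ratings over the criteria rated for this task.

    Not every criterion applies to every task (engineering_taste is only
    rated on the rejected-PR review), so unrated criteria are skipped.
    """
    if not ratings:
        raise ValueError("rate at least one criterion")
    for name, value in ratings.items():
        if name not in CRITERIA:
            raise ValueError(f"unknown criterion: {name}")
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be rated 1-5, got {value}")
    # Equal weights unless the team decides some criteria matter more.
    w = {name: (weights or {}).get(name, 1.0) for name in ratings}
    total = sum(w.values())
    if total <= 0:
        raise ValueError("weights of the rated criteria must sum to a positive number")
    return sum(w[name] * ratings[name] for name in ratings) / total
```

    For example, a bug-fix submission (where taste isn't rated) scored 4 on exploration and 2 on rigor averages to 3.0; weighting exploration three times as heavily would move it to 3.5.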

My Questions for HN:

    Is analyzing the "Chat History" too invasive, or is it the best way to see their thought process in 2026?

    For those of you hiring now, how do you distinguish between a "prompt kiddie" and a senior engineer who is just very good at prompting?

    Does the 2-4 hour time commitment feel reasonable for a "take-home" if the tooling makes the actual coding faster?

Thanks for your insights!

(Full disclosure: In the spirit of this topic, this post was composed by AI based on my draft notes.)
