Show HN: A GitHub Action that quizzes you on a pull request

https://github.com/dkamm/pr-quiz
84•dkamm•16h ago
A little idea I got from playing with AI SWE Agents. Can AI help make sure we understand the code that our AIs write?

PR Quiz uses AI to generate a quiz from a pull request and blocks you from merging until you pass it. You can configure options such as the LLM model to use, the maximum number of attempts allowed to pass the quiz, and the minimum diff size that triggers a quiz. In my limited testing, the reasoning models generated better questions, though they cost more.
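To make the configuration options above concrete, here is a minimal sketch of how a merge gate driven by them could behave. It is not taken from the PR Quiz source: the names `QuizOptions`, `maxAttempts`, `minDiffLines`, and `gateMerge` are hypothetical, and the real action's inputs may be named and combined differently.

```typescript
// Hypothetical option names for illustration; PR Quiz's real inputs may differ.
interface QuizOptions {
  model: string;        // which LLM generates the questions
  maxAttempts: number;  // attempts allowed before the check fails
  minDiffLines: number; // diffs smaller than this skip the quiz
}

type Gate = "skip" | "pending" | "passed" | "failed";

// Decide what the merge check should report for a pull request.
function gateMerge(
  diffLines: number,
  attemptsUsed: number,
  passed: boolean,
  opts: QuizOptions,
): Gate {
  if (diffLines < opts.minDiffLines) return "skip"; // too small to quiz
  if (passed) return "passed";                      // quiz answered correctly
  if (attemptsUsed >= opts.maxAttempts) return "failed"; // out of attempts
  return "pending";                                 // merge stays blocked
}

// Placeholder model name, not a real model identifier.
const exampleOptions: QuizOptions = {
  model: "example-reasoning-model",
  maxAttempts: 3,
  minDiffLines: 20,
};
```

With `exampleOptions`, a 5-line diff is skipped, while a 200-line diff stays blocked until the quiz is passed or the three attempts run out.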

Privacy: This GitHub Action runs a local webserver and uses ngrok to serve the quiz through a temporary URL. Your code is only sent to the model provider (OpenAI).

Comments

frenchie4111•15h ago
Next week on HN... Show HN: A GitHub Action that uses AI to answer PR quizzes
dkamm•14h ago
Cluely 2.0
sunrunner•15h ago
> AI Agents are starting to write more code. How do we make sure we understand what they're writing?

This is a good question, but also how do we make sure that humans understand the code that _other humans_ have (supposedly) written? Effective code review is hard as it implies that the reviewer already has their own mental model about how a task could/would/should have been done, or is at the very least building their own mental model at reading-time and internally asking 'Does this make sense?'.

Without that basis, code review is more like fuzzy standards compliance, which can still be useful, but it's not the same as a review process that works by comparing alternate or co-operatively competing models, and so I wonder how much of that is gained through a quiz-style interaction.

dkamm•14h ago
I imagine the quizzer could ask better questions along those lines with better context engineering (taking the entire repo contents, design docs, discussions, etc., and compressing those into a mental model). I just took the PR code changes and comments, so there are a lot of improvements that could be made there.
shortrounddev2•14h ago
Code review, to me, is not about validating the output. It's about a 2nd set of eyes to check for foot guns, best practice, etc. Code review is one step above linting and one step below unit tests, for me.

If someone were to submit this code for review:

    getUser(id: number): UserDTO {
        return this.mapToDTO(this.userModel.getById(id));
    }
and I knew that `userModel` throws an exception when it doesn't find a user (and this is TypeScript, not Java, so exceptions are not declared in the method prototype), then I would tell them to wrap it in a try-catch. I would also probably tell them to change the return type to `UserDTO | null` or `Result<UserDTO>`, depending on the pattern that we chose for the API. I don't need to know anything about the original ticket in order to point these things out, and linters most likely won't catch them. Another use for code review is catching potential security issues like SQL injection that the linter or framework can't figure out (e.g., using raw SQL queries in your ORM without prepared statements).
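For illustration, the suggested fix could look roughly like the sketch below, using the `UserDTO | null` variant. Everything beyond `getUser`, `mapToDTO`, `userModel.getById`, and `UserDTO` is an assumption made up so the example runs on its own: the `UserRecord` shape, the `UserNotFoundError` class, and the in-memory stand-in for `userModel`.

```typescript
// Assumed shapes; the original comment does not define them.
interface UserDTO { id: number; name: string; }
interface UserRecord { id: number; fullName: string; }

// Hypothetical error type thrown by the model layer when no user exists.
class UserNotFoundError extends Error {
  constructor(id: number) {
    super(`no user with id ${id}`);
    // Keeps `instanceof` working even when compiling to older JS targets.
    Object.setPrototypeOf(this, UserNotFoundError.prototype);
  }
}

// In-memory stand-in for the real userModel: only user 1 exists.
const userModel = {
  getById(id: number): UserRecord {
    if (id !== 1) throw new UserNotFoundError(id);
    return { id, fullName: "Ada" };
  },
};

function mapToDTO(record: UserRecord): UserDTO {
  return { id: record.id, name: record.fullName };
}

// The reviewed method, rewritten so a missing user becomes null.
function getUser(id: number): UserDTO | null {
  try {
    return mapToDTO(userModel.getById(id));
  } catch (err) {
    if (err instanceof UserNotFoundError) return null;
    throw err; // anything else is unexpected, so let it propagate
  }
}
```

Catching only the not-found case keeps genuine failures (a dropped connection, a mapping bug) visible instead of silently turning them into `null`.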
mathieuh•8h ago
Depends how good your QA is. Where I am it is terrible so most of the time I spend in “code review” is spent checking out the code locally and testing it myself.
donatj•15h ago
See, I think this is a good idea even for reviewing non-agentic human-written PRs!

We've got a huge LGTM problem where people approve PRs they clearly don't understand.

Recently we had a bug in some code of an employee that got laid off. The people who reviewed it are both still with the company, but neither of them could explain what the code did.

That triggered this angry tweet

https://x.com/donatj/status/1945593385902846118

dkamm•14h ago
Could definitely be used for human PRs too! Though I'm sure companies would love to track the reviewer scores
SamuelAdams•11h ago
The only way I’ve ever seen engineers care about PRs is if the software or product is tied directly to their paycheck. If uptime or bugs directly impact a quarterly bonus, or result in a layoff / getting fired, they spend a lot more time reviewing PRs. Furthermore, the work and its estimate are expanded to include enough time for the team to thoroughly review the change.

Unless someone is getting fired for bad code the “lgtm” culture will never die.

robotsquidward•14h ago
What a fun world we devs now live in.
brianjlogan•9h ago
Remember non-devs are affected just as much by this "new world". Perhaps even worse because they don't understand what's going on.
rmnclmnt•14h ago
That’s a fun take on a real issue, but…

> Your code is only sent to the model provider (OpenAI)

When has this become an acceptable « privacy » statement?

I feel we are reliving the era of free mobile apps at the expense of harvesting any user data for ads profiling before GDPR kicked in…

stronglikedan•14h ago
That's not the privacy statement though. I feel like we're reliving the era of RTF... oh wait, we never left.
rmnclmnt•14h ago
Ok I’ll bite: putting « only » implies this is not a big deal and a lesser of 2 evils, between an AI model provider harvesting prompts for retraining and a 3rd party hosting provider most probably only storing logs for security and accountability…

So yes this is the second part of the privacy statement

throwaway889900•14h ago
Just submit a PR that removes the action so it doesn't run on the branch before the merge! If devs aren't reviewing the code anyways, will they even catch that kind of change?
xmprt•13h ago
You could set up some hardcoded rules so that the PR is never merged without human review if it touches the github actions.
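A hardcoded rule of that kind might be sketched as follows, assuming the list of changed file paths is already available. The function name `requiresHumanReview` and the choice of protected prefixes are illustrative assumptions; `.github/workflows/` is where GitHub Actions workflow files live, but a real setup may need to protect more paths.

```typescript
// Paths whose modification should always force a human review (assumed list).
const PROTECTED_PREFIXES: string[] = [".github/workflows/"];

// Returns true when any changed file sits under a protected prefix.
function requiresHumanReview(changedPaths: string[]): boolean {
  return changedPaths.some((path) =>
    PROTECTED_PREFIXES.some((prefix) => path.startsWith(prefix)),
  );
}
```

A PR that edits or deletes `.github/workflows/pr-quiz.yml` (a made-up file name) would be flagged, while one that only touches `src/` would not.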
LikesPwsh•13h ago
You could, but it would be mad to skip the code review because it "only" touches customer-facing code rather than GHA.
Xss3•14h ago
I would probably be putting devs on a PIP or firing them if they failed these quizzes often... understanding your own PRs is the bare fucking minimum, even without AI help.
inetknght•14h ago
Won't be long before those people would just get AI to answer the quiz instead.
LtWorf•12h ago
What makes you think the AI can instead generate the correct answers to double check the developer's answers?
ElijahLynn•13h ago
This could actually be quite useful.
hk1337•13h ago
Cute but I wouldn't actually use it.
henriquegodoy•12h ago
Can I automate the process of answering these PR questions too?
bfung•9h ago
That was my first reaction: now I gotta build a gpt wrapper, oops, I mean agent, to answer questions to this quiz
waynesonfire•11h ago
Nice! A quiz to ensure you understand your vibe code.
azhenley•9h ago
I had an NSF grant for a similar project in 2019. Ask the dev questions about their code and validate their answers using program analysis.

The initial idea was applied to classroom settings.

An Inquisitive Code Editor for Addressing Novice Programmers’ Misconceptions of Program Behavior https://austinhenley.com/pubs/Henley2021ICSE_Inquisitive.pdf

h4ck_th3_pl4n3t•8h ago
This action assumes that LLMs know what they're coding.

They don't, that's why we need the PR in the first place.

drunken_thor•8h ago
We now are making bots to quiz other bots. This is a nightmare.
klntsky•4h ago
LLMs are quite bad at understanding intent behind the code if it is original and involves math-heavy tricks. But for most apps it will probably be fine. What's the workflow if it makes a mistake though?
gpi•19m ago
Is this captcha but for PRs?
