Show HN: I fixed my AI goose tutor to stop punishing understanding

https://professorgoose.com/?version=2.0
3•zapseo•48m ago
a few weeks ago, I built professor goose, a socratic ai tutor built around the feynman idea that if you can’t explain it, you don’t understand it. the way professor goose works is you pick a topic, rubber duck at a 3d goose, and instead of answering, it asks follow-up questions until it understands.

that raises the question: how does the goose know it has understood?

that’s when I thought of an understanding bar - always available to the user to help visualize how much the goose understands you, 0 -> 100%.

the original logic powering the understanding bar went something like this: every turn, i’d send the convo to an llm and ask it to return a number 0-100, with a rubric of brackets to make the output less volatile. 0-10 meant no real understanding. 11-20 meant named, but empty. 21-35 meant a partial understanding, and so on, up to 93-100 for the goose understanding your topic exceptionally. this approach worked. mostly. until I started looking at what came back once real users tested the goose.

two testers were explaining the basic way a cpu works. the first used a textbook style definition (fetch, decode, execute etc) and got a final understanding of 87% after a couple turns. the second used a real world example of a chef, linking it to concepts of a cpu. same level of understanding, expressed differently. the second tester got a score of 36. i’d built the opposite of what I wanted: a tutor rewarding parroting.

digging into the data to find the source of the variance, I noticed that if I put the same paragraph in verbatim five times, I got 5 varying scores out: 51, 66, 51, 70, 51. the brackets kind of stabilized the results, but the score was unexplainable. why 66 and not 70? nothing in the system could tell me, the llm just picked.
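
to put a number on that drift, here is a minimal sketch that summarizes the five scores quoted above. the `spread` helper is a made-up name for illustration, not part of professor goose.

```javascript
// Scores reported in the post for the same paragraph submitted five times.
const scores = [51, 66, 51, 70, 51];

// Summarize how far repeated runs on identical input drift apart.
function spread(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  return { min, max, range: max - min, mean };
}

const summary = spread(scores);
console.log(summary); // { min: 51, max: 70, range: 19, mean: 57.8 }
```

a 19 point range on identical input is roughly a fifth of the whole bar, which is why the bar felt arbitrary.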

the fix was to stop asking the model to be the math, and build a new system. now every session with a meaningful topic gets a ‘flight plan’: a separate llm call generates 3-4 essential subconcepts (waypoints) a real explanation must cover. eg for photosynthesis: what it uses, what it produces, why plants need it. each turn the goose’s evaluator returns discrete depth updates per waypoint (0-3: not addressed, named, stated, explained in own words), plus any misconceptions it spotted. javascript handles the rest: making sure depth only moves up (like a ratchet), weighted coverage, the gate to finish (wrap) a session, and the flow to repair a misconception.
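
a rough sketch of what that code side could look like, under stated assumptions: this is not the actual professor goose source. the waypoint ids, the weights, the `minDepth` default, and every function name here are hypothetical. only the 0-3 depth scale, the ratchet, weighted coverage, and the wrap gate come from the description above.

```javascript
// Depth scale from the post: 0 = not addressed, 1 = named, 2 = stated,
// 3 = explained in own words.
const MAX_DEPTH = 3;

// Hypothetical flight plan for photosynthesis; the weights are assumptions.
function createPlan() {
  return [
    { id: "inputs", label: "what it uses", weight: 1, depth: 0 },
    { id: "outputs", label: "what it produces", weight: 1, depth: 0 },
    { id: "purpose", label: "why plants need it", weight: 2, depth: 0 },
  ];
}

// Ratchet: apply the evaluator's updates, but never let a depth go down.
function applyUpdates(plan, updates) {
  return plan.map((wp) => {
    const proposed = updates[wp.id];
    if (!Number.isInteger(proposed)) return wp; // missing or malformed update
    const clamped = Math.min(MAX_DEPTH, Math.max(0, proposed));
    return { ...wp, depth: Math.max(wp.depth, clamped) };
  });
}

// Weighted coverage as a 0-100 percentage for the understanding bar.
function coverage(plan) {
  const total = plan.reduce((sum, wp) => sum + wp.weight * MAX_DEPTH, 0);
  const earned = plan.reduce((sum, wp) => sum + wp.weight * wp.depth, 0);
  return Math.round((earned / total) * 100);
}

// Wrap gate: every waypoint at minDepth or above, no open misconceptions.
function canWrap(plan, openMisconceptions, minDepth = 2) {
  return (
    openMisconceptions.length === 0 &&
    plan.every((wp) => wp.depth >= minDepth)
  );
}

let plan = createPlan();
plan = applyUpdates(plan, { inputs: 3, outputs: 2 });
console.log(coverage(plan)); // 42

// A later, lower reading for "inputs" is ignored by the ratchet.
plan = applyUpdates(plan, { inputs: 1, purpose: 3 });
console.log(plan[0].depth, coverage(plan)); // 3 92
console.log(canWrap(plan, [])); // true
```

the point of the split is that the same evaluator output always produces the same bar value, and every point on the bar traces back to a specific waypoint and depth.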

what if the user introduces a subtopic the plan didn’t anticipate?

in that case, the system decides whether to amend the plan mid session, with a backfill evaluation to credit prior turns. i also added 5 levels of intelligence to the goose (breezy to razor sharp). at each level the model only judges objective depth, then code decides what’s enough. the same chef analogy now scores 87, because the evaluation prompt explicitly tells the llm that each waypoint’s ideal answer is just one valid framing, not the only one.
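
a sketch of those two ideas, heavily hedged: the coverage thresholds, the function names, and the synchronous `evaluateTurn` stub are all assumptions for illustration. in the real system the evaluator is an llm call, and only the endpoint names "breezy" and "razor sharp" are from the post.

```javascript
// Hypothetical required coverage (%) for each of the 5 levels.
// Index 0 is "breezy", index 4 is "razor sharp"; the numbers are made up.
const REQUIRED_COVERAGE = [40, 55, 70, 85, 100];

// The model reports depth; code alone decides whether it is enough.
function isEnough(coveragePct, levelIndex) {
  return coveragePct >= REQUIRED_COVERAGE[levelIndex];
}

// Amend the plan mid session and backfill credit from earlier user turns.
// evaluateTurn(waypoint, turnText) stands in for the evaluator and must
// return a depth from 0 to 3.
function amendPlan(plan, newWaypoint, priorUserTurns, evaluateTurn) {
  const depth = priorUserTurns.reduce(
    (best, turn) => Math.max(best, evaluateTurn(newWaypoint, turn)),
    0
  );
  return [...plan, { ...newWaypoint, depth }];
}

// Stub evaluator for the demo: credits depth 2 if the keyword was stated.
const stubEvaluator = (waypoint, turn) =>
  turn.includes(waypoint.keyword) ? 2 : 0;

const amended = amendPlan(
  [{ id: "inputs", weight: 1, depth: 2 }],
  { id: "chlorophyll", keyword: "chlorophyll", weight: 1 },
  ["plants take in light", "chlorophyll absorbs the light"],
  stubEvaluator
);
console.log(amended[1].depth); // 2
console.log(isEnough(60, 0), isEnough(60, 4)); // true false
```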

to validate these changes, I sat down and acted as 15 different types of users, typing differently, explaining differently etc, then iterated based on the responses. one little bug I found was the llm evaluator giving credit to the wrong actor: the goose would teach via analogy and the student would get credit for it. fixed that too.
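
one way a wrong-actor bug like that could be guarded against in code, as a sketch only. the post doesn’t say how the fix was actually done, so the turn shape (`role`, `text`), the role names, and the `evidenceIndex` field are all assumptions.

```javascript
// Label each turn with its speaker so the evaluator can tell who said what.
// Goose turns stay in as context but are marked as not creditable.
function labelTurns(transcript) {
  return transcript.map((turn, index) => ({
    index,
    speaker: turn.role === "user" ? "STUDENT" : "GOOSE",
    creditable: turn.role === "user", // only the student's words raise depth
    text: turn.text,
  }));
}

// Drop any depth update whose cited evidence is not a student turn.
function filterUpdates(updates, labeledTurns) {
  return updates.filter(
    (u) => labeledTurns[u.evidenceIndex]?.creditable === true
  );
}

const labeled = labelTurns([
  { role: "goose", text: "so the cpu fetches orders like a chef reads tickets?" },
  { role: "user", text: "yes, then it decodes what the ticket is asking for" },
]);

const kept = filterUpdates(
  [
    { waypoint: "fetch", depth: 3, evidenceIndex: 0 }, // goose said it
    { waypoint: "decode", depth: 2, evidenceIndex: 1 }, // student said it
  ],
  labeled
);
console.log(kept.map((u) => u.waypoint)); // [ 'decode' ]
```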

lesson worth keeping: if you build anything where an llm needs to rate or rank by number, don’t trust the number. give it something discrete to judge, not something subjective, otherwise it will fake it and hallucinate.

professor goose is live if you want to try it!

Comments

anitroves•34m ago
this professor goose of yours is lagging too much and there is no keyboard writing area, you have to do voice chat which is not comfortable for many people
zapseo•28m ago
good feedback, thanks for trying it. I’m working on optimization but with a 3d rendered goose it’s tough going. there is a voice/text toggle just above the mic, just tap “text” and you’ll switch. also available in settings. just made a UI tweak to make the toggle more obvious because clearly it wasn’t.

C++26: More Function Wrappers

https://www.sandordargo.com/blog/2026/05/20/cpp26-copyable-function
1•ibobev•1m ago•0 comments

Navigation Systems in Video Games

https://pikuma.com/blog/navigation-systems-in-video-games
1•ibobev•2m ago•0 comments

Google Cloud suspended major customer Railway.com without cause, causing outage

https://www.theregister.com/off-prem/2026/05/20/google-cloud-suspended-major-customer-railwaycom-...
1•igortru•3m ago•0 comments

How to style a Hugo Atom feed with XSL

https://afranca.com.br/how-to-style-a-hugo-atom-feed-with-xsl/
1•blenderob•6m ago•0 comments

GDS weighs in on the NHS's decision to retreat from Open Source

https://shkspr.mobi/blog/2026/05/gds-weighs-in-on-the-nhss-decision-to-retreat-from-open-source/
1•blenderob•7m ago•0 comments

Show HN: Open-Source Agentic QA Harness with Memory

https://vostride.com/agent-qa
2•pranshuchittora•7m ago•0 comments

Prompts are technical debt too

https://www.seangoedecke.com/prompts-are-technical-debt-too/
1•blenderob•7m ago•0 comments

A 4K-year-old city became more equal as it became more successful

https://phys.org/news/2026-05-year-city-defied-history-equal.html
1•pseudolus•8m ago•0 comments

Show HN: I built a dashboard to combine my WHOOP, Oura and Apple Health data

https://vitaltrends.net
1•dansku•9m ago•1 comments

Show HN: I put Codex and Claude in a tank arena; Codex is winning 55% so far

https://old.reddit.com/r/codex/comments/1tgbb28/i_put_codex_and_claude_into_a_tank_arena_codex_is/
1•mazzystar•10m ago•0 comments

The missing men of the American marriage market

https://www.npr.org/sections/planet-money/2026/05/19/g-s1-122695/the-missing-men-of-the-american-...
1•pseudolus•10m ago•0 comments

Launched Chronl – a daily history puzzle (Garfield, Berlin Wall, etc.)

https://chronl.com/
1•fastpizzaguy•11m ago•0 comments

NTSB Analysis of MD-11F Engine Pylon Failure

https://www.youtube.com/watch?v=iWkLZjJXaos
1•connoronthejob•12m ago•0 comments

Ask HN: Second time my post gets [dead] within a minute

1•_ananos_•13m ago•3 comments

Show HN: AI Doctor Notes – Private doctor visit notes app for patients

https://aidoctornotes.app/
1•Sharanxxxx•15m ago•1 comments

Show HN: Mashari – a dashboard to manage your projects

https://www.mashari.app/
1•0xsnowcrash•16m ago•0 comments

The Download: Musk vs. Altman week 3, and Trump's tech trading

https://www.technologyreview.com/2026/05/18/1137407/the-download-musk-altman-trial-trump-tech-tra...
1•joozio•16m ago•0 comments

The Weekly Challenge 374

https://ap29600.github.io/pwc_374.html
1•tosh•16m ago•0 comments

Gemini 3.5 Flash has 14x cost multiplier in GitHub Copilot

https://github.blog/changelog/2026-05-19-gemini-3-5-flash-is-generally-available-for-github-copilot/
1•agluszak•17m ago•0 comments

The Broken Basic Years

https://scruss.com/blog/2026/05/19/the-broken-basic-years/
1•sehugg•17m ago•0 comments

My Professional Motivations, Qualifications, and Role Preferences

https://blog.demofox.org/2026/05/19/my-professional-motivations-and-qualifications/
1•ibobev•18m ago•0 comments

Swatch Internet Time

https://www.swatch.com/en-en/internet-time.html
1•docdeek•19m ago•0 comments

Bluey Minisodes, Episode 1 – Burger Dog

https://www.bluey.tv/watch/minisodes/burger-dog/
1•SockThief•19m ago•0 comments

Google's AI is being manipulated. The search giant is quietly fighting back

https://www.bbc.com/future/article/20260519-google-tackles-attempts-to-hack-its-ai-results
1•tigerlily•21m ago•0 comments

How many sandboxed pods can fit in a Pi?

https://nubificus.co.uk/blog/runtime_benchmarking_rpi/
3•_ananos_•22m ago•1 comments

Let's Learn Hangul

http://letslearnhangul.com/
1•boxed•25m ago•0 comments

How did CMD-K come to be the standard shortcut for opening a command palette?

https://ux.stackexchange.com/questions/153299/how-did-cmd-k-come-to-be-the-standard-shortcut-for-...
1•microsoftedging•26m ago•0 comments

DMARC is now an IETF Proposed Standard: what's new in RFCs 9989–9991

https://dmarcwise.io/blog/new-dmarc-2026
2•matteocontrini•29m ago•0 comments

Map of Metal

https://mapofmetal.com/
2•robin_reala•31m ago•0 comments

Grokipedia selectively drawing on more right-leaning news sources, new study

https://www.tcd.ie/news_events/articles/2026/grokipedia-research/
1•giuliomagnifico•32m ago•0 comments