frontpage.
Made with ♥ by @iamnishanth

Show HN: We made GPT-4.1-mini beat 4.1 at Tic-Tac-Toe using dynamic context

https://github.com/opper-ai/opper-cookbook/tree/main/examples/tictactoe-tournament
5•farouqaldori•7h ago
We wanted to test whether a smaller model like GPT-4.1-mini could beat its bigger brother, GPT-4.1, at Tic-Tac-Toe using only context engineering.

We put them in a 100-game tournament. Right before each of the smaller model's moves, we added a few examples of winning moves from past games to its context.
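To make "dynamic context" concrete, here is a minimal sketch of how per-move examples could be assembled. It is not the repo's code: the board encoding, the `Example` type, the example bank, and the choice of "closest boards by Hamming distance" as the selection rule are all assumptions, since the post only says the examples were winning moves from past games.

```python
from dataclasses import dataclass

# Hypothetical representation: a board is a 9-character string read row by row,
# using "X", "O", or "." for an empty square.
@dataclass(frozen=True)
class Example:
    board: str   # position just before the move
    player: str  # side that moved, "X" or "O"
    move: int    # square played, 0-8

def distance(a: str, b: str) -> int:
    """Hamming distance: number of squares on which two boards differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def select_examples(bank: list[Example], board: str, player: str, k: int) -> list[Example]:
    """Return up to k stored moves by the same side, closest boards first (assumed heuristic)."""
    same_side = [ex for ex in bank if ex.player == player]
    return sorted(same_side, key=lambda ex: distance(ex.board, board))[:k]

def render(board: str) -> str:
    """Format the board as three rows of three characters."""
    return "\n".join(board[i:i + 3] for i in (0, 3, 6))

def build_prompt(bank: list[Example], board: str, player: str, k: int = 3) -> str:
    """Assemble a prompt with k retrieved winning moves placed just before the current position."""
    parts = [f"You play {player} in Tic-Tac-Toe. Squares are numbered 0-8, row by row."]
    for i, ex in enumerate(select_examples(bank, board, player, k), start=1):
        parts.append(f"Example {i} (a move by the eventual winner):\n{render(ex.board)}\nMove: {ex.move}")
    parts.append(f"Current board:\n{render(board)}\nReply with the square you play.")
    return "\n\n".join(parts)

# Hypothetical bank of moves taken from games that the moving side went on to win.
bank = [
    Example("XX.OO....", "X", 2),
    Example("X...O....", "X", 8),
    Example("OO.XX....", "O", 2),
]
print(build_prompt(bank, "XX.O.O...", "X", k=1))
```

Because the examples are re-selected for every position, the context the model sees changes move by move; a fixed few-shot prompt would reuse the same examples throughout the game.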

The results were clear. Without the examples, the smaller model struggled against GPT-4.1. With the examples, its effectiveness increased by nearly 200%, and it consistently won.
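Taken literally, an increase of nearly 200% means the smaller model's score roughly tripled rather than doubled. The post does not say which metric "effectiveness" refers to or give raw counts, so the numbers below are hypothetical and only illustrate the arithmetic.

```python
def relative_increase(before: float, after: float) -> float:
    """Percentage change from `before` to `after`; 200.0 means the value tripled."""
    if before == 0:
        raise ValueError("a relative increase is undefined for a zero baseline")
    return (after - before) * 100 / before

# Hypothetical win counts out of 100 games; the post does not report raw numbers.
print(relative_increase(10, 30))  # → 200.0
print(relative_increase(10, 29))  # → 190.0
```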

It's a simple demonstration, but it shows that a smaller, faster model with good, timely examples can outperform a more capable base model.

The full write-up and code are in the repo.

Comments

totisjosema•6h ago
Other author here. This started as an experiment to see how much examples improve model performance: basically, how big a difference do examples actually make? We also wanted to explore whether there's an ideal number of examples that gives the best results. It was quite fun, and the setup scales easily to battling any LLMs you want…

We have a short video walkthrough of the setup here: https://www.youtube.com/watch?v=z1MhXgmHbwk
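The "battle any LLMs you want" setup boils down to a tournament harness where each player is just a function from board to move. The sketch below is an assumption about how such a harness could look, not the repo's implementation: `Player`, `play`, `tournament`, and the two stand-in players are hypothetical, and in the real experiment each player would wrap a model call (for example GPT-4.1, or GPT-4.1-mini with a given number of examples in its context).

```python
import random
from typing import Callable, Optional

# A player maps (board, side) to a square 0-8. Boards are 9-character strings of
# "X", "O" and "." (empty). In the experiment each player would wrap a model call;
# the players below are hypothetical offline stand-ins.
Player = Callable[[str, str], int]

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board: str) -> Optional[str]:
    """Return "X" or "O" if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def play(x: Player, o: Player) -> str:
    """Play one game and return "X", "O" or "draw". An illegal move forfeits."""
    board, side = "." * 9, "X"
    players = {"X": x, "O": o}
    while True:
        move = players[side](board, side)
        if not 0 <= move < 9 or board[move] != ".":
            return "O" if side == "X" else "X"
        board = board[:move] + side + board[move + 1:]
        if winner(board):
            return side
        if "." not in board:
            return "draw"
        side = "O" if side == "X" else "X"

def tournament(a: Player, b: Player, games: int = 100) -> dict[str, int]:
    """Count wins for a and b and draws, alternating who moves first."""
    score = {"a": 0, "b": 0, "draw": 0}
    for g in range(games):
        a_is_x = g % 2 == 0
        result = play(a, b) if a_is_x else play(b, a)
        if result == "draw":
            score["draw"] += 1
        elif (result == "X") == a_is_x:
            score["a"] += 1
        else:
            score["b"] += 1
    return score

def random_player(seed: int) -> Player:
    """Stand-in player that picks a random empty square."""
    rng = random.Random(seed)
    return lambda board, side: rng.choice([i for i, c in enumerate(board) if c == "."])

def first_free(board: str, side: str) -> int:
    """Stand-in player that always takes the lowest-numbered empty square."""
    return board.index(".")

print(tournament(random_player(0), first_free, games=100))
```

Alternating the first mover matters in Tic-Tac-Toe, since X has a structural advantage. To look for an ideal number of examples, one could build one player per example count and run the same tournament for each.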

1990 Networking: LAN Manager 2.0

https://www.os2museum.com/wp/1990-networking-lan-manager-2-0/
1•ingve•3m ago•0 comments

Original Xbox Hacks: The A20 CPU Gate

https://connortumbleson.com/2021/07/19/the-xbox-and-a20-line/
1•mattweinberg•8m ago•0 comments

Michael "The Grinder" Mizrachi Wins 2025 World Series of Poker Main Event

https://www.pokernews.com/news/2025/07/michael-mizrachi-wins-2025-wsop-main-event-49219.htm
1•indigodaddy•9m ago•0 comments

Watch videos in your preferred language

https://support.google.com/youtube/answer/13339776?hl=en
1•thunderbong•9m ago•0 comments

Show HN: ChainTok – Immortalize your love on Bitcoin's eternal ledger

https://app.chaintok.com
1•zzhan•14m ago•0 comments

Improving OSM lake polygons using Lidar data [video]

https://www.youtube.com/watch?v=4XxX8smv29M
2•marklit•25m ago•0 comments

Photos: The Scale of China's Solar-Power Projects

https://www.theatlantic.com/photography/archive/2025/07/photos-china-solar-power-energy/683488/
3•mhb•31m ago•0 comments

Dreamflow: create Flutter apps with text prompts

https://dreamflow.app/
1•flwns•31m ago•0 comments

A Wide Reduction Trick

https://words.filippo.io/wide-reduction/
2•Bogdanp•37m ago•0 comments

International Math Olympiad 2025 Problems: How Well Will AI Do?

https://sugaku.net/content/imo-2025-problems/
3•mauriziocalo•50m ago•0 comments

I've been coding with AI for two years. Here is what I've learned

https://nathanpeck.com/ive-been-coding-with-ai-for-two-years-here-is-what-i-learned/
2•cebert•58m ago•0 comments

Links? Links – Infrequently Noted

https://infrequently.org/2025/07/links/
2•cratermoon•1h ago•0 comments

Cheating? Or the acumen of modern programming? FOSS, "AI", and human conscience

https://gist.github.com/guest271314/17c9daac37101538c9baa6df72aaaefb
1•thunderbong•1h ago•0 comments

LLM Benchmarking Shows Capabilities Doubling Every 7 Months

https://spectrum.ieee.org/llm-benchmarking-metr
2•mparramon•1h ago•0 comments

The Geological Sublime

https://harpers.org/archive/2025/07/the-geological-sublime-lewis-hyde-deep-time/
2•prismatic•1h ago•0 comments

Predicting Earthquakes

https://www.worksinprogress.news/p/a-50-million-foundation-model-to
1•sien•1h ago•0 comments

Garum Sardiniae in Tabula: Rediscovering the Ancient Taste of Roman Cuisine

https://exarc.net/issue-2023-3/at/garum-sardiniae-tabula-rediscovering-ancient-taste-roman-cuisine
1•airstrike•1h ago•0 comments

Mercedes-Benz adds support for Teams app, Intune integration, and Copilot

https://media.mercedes-benz.com/article/931e7af1-2d57-4e90-9e1e-252289e70648
1•throw0101d•1h ago•1 comment

Which Economic Tasks Are Performed with AI? Evidence from Claude Conversations

https://arxiv.org/abs/2503.04761
1•Bogdanp•1h ago•0 comments

The internet keeps getting worse. Let's talk about why [video]

https://www.youtube.com/watch?v=YcW9IB5e3_E
1•raythanwho•1h ago•0 comments

EurIPS: Present NeurIPS Papers in Europe

https://eurips.cc/
1•yza•1h ago•1 comment

NASA won't publish key climate change report online, citing no legal obligation

https://www.space.com/science/climate-change/nasa-wont-publish-key-climate-change-report-online-citing-no-legal-obligation-to-do-so
3•OutOfHere•1h ago•0 comments

Foreign YouTube stars secretly paid by UK Government for propaganda

https://www.thenational.scot/news/25318776.foreign-youtube-stars-secretly-paid-uk-government-propaganda/
3•duke_of_tharsis•1h ago•0 comments

Eight healthy babies born after IVF using DNA from three people

https://www.theguardian.com/science/2025/jul/16/eight-healthy-babies-born-after-ivf-using-dna-from-three-people
1•wicket•1h ago•1 comment

Show HN: Running Linux Inside Node.js

1•ridruejo•2h ago•1 comment

Show HN: Open-source business management tool for small business

https://github.com/oitcode/samarium
1•azaz12•2h ago•0 comments

Researchers announce babies born from a trial of three-person IVF

https://www.technologyreview.com/2025/07/16/1120285/babies-born-trial-of-three-person-ivf/
1•gnabgib•2h ago•0 comments

Ctfoigt

https://boz.com/articles/ctfoigt
1•swyx•2h ago•0 comments

Show HN: Cobble – A hard daily word game

https://wilf.live/cobble/
8•wolfred•2h ago•3 comments

Scandal-Ridden Fyre Festival Is Sold for $245,000 on eBay

https://www.nytimes.com/2025/07/16/us/fyre-fesival-sold-ebay.html
2•defrost•2h ago•0 comments