Introducing TiānshūBench (天书Bench)

https://jeepytea.github.io/general/introduction/2025/05/29/tianshubenchintro.html
4•chromaton•1d ago

Comments

chiwilliams•1d ago
Cool project! I have a couple of questions that would be nice to see answered in the writeup:

* How did you generate your example problems? Did you take an existing benchmark, or did you have LLMs generate the problems?
* Have you given any thought to adding a second "base programming language" to alter? I'm not sure there's enough variation as it stands. (Another thought would be to generate 4 or 5 different new languages, each quite different, and then run the benchmark on each of them. I'm not sure how much it matters that the language is randomly generated each time.)

But overall, a clever idea!

chromaton•16h ago
Generating the problems: I just thought up a few simple things that the computer might be able to do. In the future, I hope to expand to more complex problems, based upon common business situations: reading CSVs, parsing data, etc. I'll probably add new tests once I get multi-shot and reliability working correctly.

New base programming languages would be great, but what would be even better is some sort of meta-language where many features can be turned on or off, rather than just scrambling the keywords like I do now.

I did some vibe testing with a current frontier model, and it gets quite confused and keeps insisting that there's a control structure that definitely doesn't exist in the TiānshūBench language with seed=1.
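For readers wondering what "scrambling the keywords" from a seed might look like, here is a minimal sketch. It is not TiānshūBench's actual code: the keyword list, the `make_keyword_map` and `scramble` helpers, and the nonsense-word scheme are all assumptions made up for illustration.

```python
import random
import re
import string

# Hypothetical keyword set for a small base language (an assumption,
# not taken from Mamba or TiānshūBench).
BASE_KEYWORDS = ["if", "else", "while", "for", "function", "return", "print"]


def make_keyword_map(seed: int, length: int = 6) -> dict[str, str]:
    """Map each base keyword to a unique pseudo-random nonsense word."""
    # A local RNG means the same seed always yields the same "language".
    rng = random.Random(seed)
    mapping: dict[str, str] = {}
    used: set[str] = set()
    for keyword in BASE_KEYWORDS:
        while True:
            candidate = "".join(
                rng.choice(string.ascii_lowercase) for _ in range(length)
            )
            # Reject duplicates and anything that collides with a real keyword.
            if candidate not in used and candidate not in BASE_KEYWORDS:
                break
        used.add(candidate)
        mapping[keyword] = candidate
    return mapping


def scramble(source: str, mapping: dict[str, str]) -> str:
    """Replace whole-word keywords in source with their scrambled forms.

    This is a naive regex pass: it would also rewrite keywords that
    appear inside string literals or comments.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], source)


if __name__ == "__main__":
    keyword_map = make_keyword_map(seed=1)
    print(scramble("if x: return y", keyword_map))
```

A feature-toggling meta-language would need more than this, since it changes the grammar itself rather than just renaming tokens.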

JSR_FDED•1d ago
Would it be useful to generate Procedural, OOP and Functional variations of the problems?

chromaton•16h ago
Yes, it would be fantastic to have more languages to test against. I picked Mamba as the base language because it was easy to modify and integrate into Python.

Outrage in Kenya over detention of software developer

https://www.bbc.com/news/articles/cgmjlp1gnp8o
1•colinprince•1m ago•0 comments

Malte Skarupke Open-Sourcing a Custom Benchmark GUI

https://probablydance.com/2025/05/31/im-open-sourcing-my-custom-benchmark-gui/
1•ibobev•2m ago•0 comments

Raindrops in the Sun's Corona

https://news.njit.edu/raindrops-suns-corona-new-adaptive-optics-shows-stunning-details-our-stars-atmosphere
1•akshayB•2m ago•0 comments

Read-Copy-Update (RCU)

https://www.modernescpp.com/index.php/read-copy-update-rcu/
1•ibobev•3m ago•0 comments

Ask HN: Feedback on an On-the-Fly Zip Service

1•kalaomer•4m ago•0 comments

How to make an app like Gojek [video]

https://www.youtube.com/watch?v=3QrfiqxXBnw
1•heymrcoder•7m ago•0 comments

We Need a New Science of Progress (2019)

https://www.theatlantic.com/science/archive/2019/07/we-need-new-science-progress/594946/
1•bookofjoe•7m ago•1 comments

Don't Shut the Door on International Students

https://song-luo.com/2025/06/01/dont-shut-the-door-on-international-students-my-20-year-journey-from-foreign-student-to-american-citizen/
1•sluosapher•8m ago•0 comments

Caring

https://blog.thinkst.com/2025/06/on-caring.html
1•mh_•10m ago•0 comments

Airbnb's Pivot

https://gadallon.substack.com/p/one-app-to-rule-them-all-airbnb-declares
1•JumpCrisscross•10m ago•0 comments

Show HN: Portfolio.dev – Display your work in dev, design, or product with a URL

https://www.portfolio.dev/
1•alphapv•12m ago•1 comments

U.S. Dependence on China for Rare Earth Magnets Is Causing Shortages

https://www.nytimes.com/2025/06/02/business/china-rare-earths-united-states-supplies.html
3•mitchbob•12m ago•1 comments

Cloudflare builds OAuth with Claude and publishes all the prompts

https://github.com/cloudflare/workers-oauth-provider/commits/main/
1•gregorywegory•13m ago•1 comments

Board and Card Games That Will Make Your Kids Smarter

https://rishimodha.substack.com/p/the-best-board-and-card-games-that
2•n9com•14m ago•0 comments

Ask HN: Is "compatibilism" causing students to lose interest in philosophy?

1•amichail•16m ago•2 comments

Hudson's Bay Stores to Close in Canada

https://www.nytimes.com/2025/06/01/world/canada/canada-hudsons-bay-stores-closing.html
1•georgecmu•16m ago•0 comments

Plotly Studio

https://plotly.com/blog/introducing-plotly-studio/
5•chriddyp•20m ago•0 comments

Faster route propagation by rewriting our Traefik gateway in Rust

https://rivet.gg/blog/2025-06-02-faster-route-propagation-by-rewriting-our-traefik-gateway-in-rust
3•NathanFlurry•24m ago•1 comments

NIH grant cuts will axe clinical trials abroad – could leave 1000s without care

https://www.nature.com/articles/d41586-025-01721-9
2•rntn•25m ago•0 comments

Betterauth vs. Nextauth

https://www.devtoolsacademy.com/blog/betterauth-vs-nextauth/
1•codeman001•26m ago•0 comments

Sony's Forgotten Computers Helped Shape the Playstation

https://obsoletesony.substack.com/p/how-sonys-forgotten-computers-helped
2•semyonsh•27m ago•0 comments

Tesla executives questioned Musk after he denied killing $25,000 EV

https://www.reuters.com/business/autos-transportation/tesla-executives-questioned-musk-after-he-denied-killing-25000-ev-project-2025-06-02/
5•pinewurst•27m ago•4 comments

What Is Stack-Use-After-Return?

https://gizvault.com/archives/what-is-stack-use-after-return
4•ricecat•29m ago•0 comments

Ask HN: Could you share your personal blog?

2•nelsonfigueroa•35m ago•1 comments

Gemini Fullstack LangGraph Quickstart

https://github.com/google-gemini/gemini-fullstack-langgraph-quickstart
1•philschmidxxx•37m ago•0 comments

Traveling to Mars and Ceres Using Lunar Gateway as a Springboard

https://www.universetoday.com/articles/traveling-to-mars-and-ceres-using-lunar-gateway-as-a-springboard
1•rbanffy•38m ago•0 comments

Graphviz.NetWrapper

https://github.com/Rubjerg/Graphviz.NetWrapper
1•ctenb•38m ago•0 comments

Show HN: A game for Math and Computer Science nerds. Check it out

https://www.squashbyte.com/
2•clocker•38m ago•0 comments

Chiplets and the Future of System Design – By Austin Lyons

https://www.chipstrat.com/p/chiplets-and-the-future-of-system
1•rbanffy•39m ago•0 comments

Productivity Hacks Every Engineer and Manager Should Know

https://newsletter.eng-leadership.com/p/15-productivity-hacks-every-engineer
2•rbanffy•40m ago•0 comments