Show HN: We made a hiring challenge because Claude can 1-shot our interviews

https://www.atomsnotelectrons.com
4•jgru•4w ago
We're Tutor Intelligence, a robotics company building generally capable robot workers for American industry. We've been thinking about what technical evaluation should look like in a world where AI agents can 1-shot our hardest hour-long coding interviews, and this is one of our first experiments.

The challenge: command 5 robots in a 60x40 warehouse to fulfill 1000 orders. Your score is the number of timesteps to complete everything. Robots can move, pick items from pallets, dock to pallets (so they move together), and fulfill orders at the edge of the warehouse. Simple rules, but the optimization problem has surprising depth (it's NP-hard in about 10 different ways) with lots of room for strategy and creativity.
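To make the scoring concrete, here is a minimal sketch of a naive greedy baseline. It is not the challenge's actual API or input format. The function names, the `(pickup, dropoff)` order shape, and the one-timestep cost for picking and for fulfilling are all assumptions made for illustration, and the sketch ignores collisions, pallets, and docking entirely.

```python
# Hypothetical toy model: none of these names come from the real challenge.
WIDTH, HEIGHT = 60, 40  # warehouse size stated in the post


def manhattan(a, b):
    """Grid distance between two (x, y) cells, assuming 4-directional moves."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_makespan(robots, orders):
    """Estimate the total timesteps for a greedy assignment.

    robots: list of (x, y) start cells.
    orders: list of (pickup, dropoff) cell pairs (an assumed order shape).
    Each order goes to the robot that can reach its pickup cell soonest.
    Assumes one timestep to pick and one to fulfill; ignores collisions.
    """
    free_at = [0] * len(robots)   # timestep at which each robot is idle again
    pos = list(robots)            # current cell of each robot
    for pickup, dropoff in orders:
        best = min(
            range(len(robots)),
            key=lambda i: free_at[i] + manhattan(pos[i], pickup),
        )
        travel = manhattan(pos[best], pickup) + manhattan(pickup, dropoff)
        free_at[best] += travel + 2  # +1 pick, +1 fulfill (assumed costs)
        pos[best] = dropoff
    return max(free_at) if free_at else 0


if __name__ == "__main__":
    robots = [(0, 0), (WIDTH - 1, HEIGHT - 1)]
    orders = [((1, 0), (0, 0)), ((58, 39), (59, 39))]
    print(greedy_makespan(robots, orders))
```

A baseline like this only gives a rough reference point. The real problem adds robot interactions and pallet docking, which is where most of the optimization depth presumably lies.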

This is actually a simplified version of a real problem we work on. Warehouse coordination is one of those domains where the gap between a naive solution and a good one is enormous, and there are many valid approaches.

We built a web visualizer so you can see your solution play out, and a leaderboard if you want to submit. AI agent use is encouraged (probably necessary). So far only my cofounder and I have submitted, so we genuinely have no idea how good solutions can get.

Sharing this early because we'd love feedback on the problem design. And yes, we're hiring (that's why we made it): 70 people, Series A, based in Boston, founded out of MIT. But mostly just curious if others find this problem as interesting as we do.

Comments

dnw•4w ago
When I read the title I thought you made a challenge Claude couldn't solve but that's not what you are doing. You are taking a pragmatic approach to the world we live in. I like it.

- It would be good to say what you are planning to learn from this interview process.
- Looks like submission is only a text file. Why not ask for the chat transcript?
- Also, it would be useful to let people know what happens after submission/selection.

jgru•4w ago
Submission actually takes you to a full-featured visualizer! So you can go back and forth on it and see your robots run. You can then choose to submit to the leaderboard if you'd like, which will collect your email (otherwise nothing goes to our servers). No plans to do anything with those emails yet besides serve the leaderboard.

We're hoping to get a better understanding of whether success on this correlates with things that we think are important for AI-enabled software engineering success. I think this is largely a question of problem depth: how much a solution still needs to be driven by the person's creativity, versus the model suggesting the next obvious idea.

theamk•4w ago
Sounds like a take-home challenge, but with a twist that you actually expect people to use AI, so that AI is not "cheating"?

Are you worried about the usual take-home challenge problem of candidates getting outside help? Either friends solving the problem for them, or paid help doing the same?

jgru•4w ago
Our hope is that beyond being a take-home challenge people can get competitive and it can turn into more of a passion project. I think the problem has sufficient depth that this could be the case! Hence the leaderboard and visualization tooling.

As for cheating, we've never been too worried about it. The real cost is wasted time later in the pipeline, but at that point we can always tell whether the person lines up with the work in the take-home.
