Ask HN: Relatively SoTA LLM Agents from Scratch?

2•solsane•18h ago
As we know, OpenAI is not so open.

In 2023, I was playing with transformers and RNNs, and I understood how they worked from top to bottom (e.g. I made my own Keras and could whiteboard small nets). I can throw things together in Keras or TF pretty quickly.

I got a job and never touched that again. Data and compute notwithstanding, how hard would it be to build a pet-project foundation model using the latest techniques? I've heard about MoE and things like that, and I figure we're not just throwing a bunch of layers and dropout together in Keras anymore.
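
For anyone unfamiliar with the MoE (mixture of experts) idea mentioned above, here is a rough sketch of top-k expert routing in plain Python. It is only an illustration under loud assumptions: `ToyMoE`, its parameters, and the single-matrix "experts" are all made up for this example, real models use MLP experts plus load-balancing tricks, and nothing here is taken from any particular model's code.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random matrix; stands in for learned weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyMoE:
    """Hypothetical toy layer: a linear router plus linear 'experts'."""

    def __init__(self, d_model, n_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router produces one score per expert for each token.
        self.router = rand_matrix(n_experts, d_model, rng)
        # Assumption: each expert is a single square matrix, not an MLP.
        self.experts = [rand_matrix(d_model, d_model, rng) for _ in range(n_experts)]

    def route(self, x):
        """Pick the top_k experts for token x and normalize their scores."""
        logits = matvec(self.router, x)
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = ranked[: self.top_k]
        weights = softmax([logits[i] for i in chosen])
        return chosen, weights

    def forward(self, x):
        """Weighted sum of the chosen experts' outputs; other experts are skipped."""
        chosen, weights = self.route(x)
        out = [0.0] * len(x)
        for idx, w in zip(chosen, weights):
            y = matvec(self.experts[idx], x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


moe = ToyMoE(d_model=4, n_experts=8, top_k=2)
token = [0.5, -1.0, 0.25, 2.0]
chosen, weights = moe.route(token)
output = moe.forward(token)
```

The point of the design is that only `top_k` of the `n_experts` weight matrices are touched per token, so parameter count can grow without per-token compute growing at the same rate.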

Comments

huevosabio•12h ago
Olmo is, AFAIK, the only SOTA-ish model with fully open source code and data. The team's report is fantastic: https://www.datocms-assets.com/64837/1763662397-1763646865-o...

It should give you an idea of how hard it is to do a SOTA model from scratch!

If you relax the SOTA aspect, Karpathy's nanochat has you covered: https://github.com/karpathy/nanochat

walpurginacht•3h ago
I'd suggest reading Hugging Face's writeup on how they trained SmolLM3:

https://huggingface.co/spaces/HuggingFaceTB/smol-training-pl...

It gives rare, detailed insight into the entire process.

bjourne•3h ago
Read this article: https://dl.acm.org/doi/10.1145/3712285.3759827

Training algorithms are relatively simple (base training, fine-tuning, RL), but scale is critical, i.e., the engineering infrastructure. The authors recommend a 128 GPU cluster minimum and many petabytes of training data.
