The revenge of the data scientist

https://hamel.dev/blog/posts/revenge/
56•hamelsmu•4d ago

Comments

jamesblonde•1h ago
I say this quite a lot to data scientists who are now building agents:

1. think of the context data as training data for your requests (the LLM performs in-context learning based on your provided context data)

2. think of evals as test data to evaluate the performance of your agents. Collect them from agent traces and label them manually. If you want to "train" an LLM to act as a judge to label traces, then again, you will need lots of good-quality examples (training data), as the LLM-as-a-Judge does in-context learning as well.

From my book - https://www.amazon.com/Building-Machine-Learning-Systems-Fea...
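
A minimal sketch of the train/test framing above, assuming manually labeled traces are kept as simple records. Some labeled traces become in-context examples for the judge prompt (the "training data"), and the rest are held out as eval data so the judge is never scored on examples it has already seen. `LabeledTrace`, `split_traces`, and `build_judge_prompt` are hypothetical names for illustration, not from any library or from the linked post.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledTrace:
    """One agent trace with a manually assigned label (hypothetical schema)."""
    trace: str
    label: str  # "pass" or "fail", assigned by a human reviewer


def split_traces(traces, n_few_shot, seed=0):
    """Shuffle, then reserve n_few_shot traces for the prompt; hold out the rest."""
    if not 0 <= n_few_shot <= len(traces):
        raise ValueError("n_few_shot must be between 0 and len(traces)")
    shuffled = list(traces)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_few_shot], shuffled[n_few_shot:]


def build_judge_prompt(few_shot, candidate_trace):
    """Assemble a few-shot judge prompt; the labeled examples act as 'training data'."""
    parts = ["Label each agent trace as pass or fail."]
    for example in few_shot:
        parts.append(f"Trace: {example.trace}\nLabel: {example.label}")
    # The candidate goes last with an empty label for the model to fill in.
    parts.append(f"Trace: {candidate_trace}\nLabel:")
    return "\n\n".join(parts)
```

The held-out list is what the judge gets evaluated on; reusing a few-shot example as an eval case would be the in-context equivalent of testing on the training set.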
pbronez•1h ago
Yup, agree. “Evaluations” = Tests

Gets pretty meta when you’re evaluating a model which needs to evaluate the output of another agent… gotta pin things down to ground truth somewhere.
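
One way to pin a judge to ground truth is to score its labels against human labels on the same traces. The sketch below computes Cohen's kappa, a standard chance-corrected agreement statistic; choosing kappa here is an assumption for illustration, not something the thread or the linked post prescribes.

```python
from collections import Counter


def cohens_kappa(human_labels, judge_labels):
    """Chance-corrected agreement between human labels and judge labels."""
    if len(human_labels) != len(judge_labels) or not human_labels:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human_labels)
    # Observed agreement: fraction of traces where both raters gave the same label.
    observed = sum(h == j for h, j in zip(human_labels, judge_labels)) / n
    # Expected agreement if both raters labeled at random with their own label frequencies.
    human_counts = Counter(human_labels)
    judge_counts = Counter(judge_labels)
    expected = sum(human_counts[c] * judge_counts[c] for c in human_counts) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports full agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


human = ["pass", "pass", "fail", "fail"]
judge = ["pass", "fail", "fail", "fail"]
kappa = cohens_kappa(human, judge)  # 0.5: raw agreement 0.75, chance agreement 0.5
```

Raw agreement alone can look good when one label dominates, which is why a chance-corrected number is often reported alongside it.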

maxwg•1h ago
I can see cases like the recently mentioned pg_textsearch (https://news.ycombinator.com/item?id=47589856) being perfect for this kind of development style, where you have clear test cases, benchmarks, etc. that you can meet.

Though for greenfield development, writing the test cases (like the spec) is as hard as, if not harder than, writing the code.

I also observe that LLMs tend to find themselves trapped in local minima. Once the codebase architecture has solidified, they will very rarely consider larger refactors. In some ways, this is very similar to overfitting in ML.

djoldman•1h ago
> The bulk of the work is setting up experiments to test how well the AI generalizes to unseen data, debugging stochastic systems, and designing good metrics.

In my experience, this is missing a big part of the work: confirming what the data actually is, sometimes despite what people think it is.

uduni•1h ago
So true... I get more mileage from just watching an agent work than building sophisticated LLM-as-judge workflows
convexly•27m ago
I mean it is a similar loop. Define what good looks like, measure how far off you are, iterate. I would say though that the people who've been doing that for years just have a head start that prompt engineers don't.
Flashtoo•25m ago
These are good practices to keep in mind when setting up GenAI solutions, but I'm not convinced that this part of the job will allow "data scientist" as a profession to thrive. Here's my pessimistic take.

Data scientists were appreciated largely because of their ability to create models that unlock business value. Model creation was a dark magic that you needed strong mathematical skills to perform - or at least that's the image, even if in reality you just slap XGBoost on a problem and call it a day. Data scientists were enablers and value creators.

With GenAI, value creation is apparently done by the LLM provider and whoever in your company calls the API, which could really be any engineering team. Coaxing the right behavior out of the LLM is a bit of black magic in itself, but it's not something that requires deep mathematical knowledge. Knowing how gradients are calculated in a decoder-only transformer doesn't really help you make the LLM follow instructions. In fact, all your business stakeholders are constantly prompting chatbots themselves, so even if you provide some expertise here they will just see you as someone doing the same thing they do when they summarize an email.

So that leaves the part the OP discusses: evaluation and monitoring. These are not sexy tasks and from the point of view of business stakeholders they are not the primary value add. In fact, they are barriers that get in the way of taking the POC someone slapped together in Copilot (it works!) and putting that solution in production. It's not even strictly necessary if you just want to move fast and break things. Appreciation for this kind of work is most present in large risk-averse companies, but even there it can be tricky to convince management that this is a job that needs to be done by a highly paid statistician with a graduate degree.

What's the way forward? Convince management that people with the job title "data scientist" should be allowed to gatekeep building LLM solutions? Maybe I'm overestimating how good the average AI-aware software engineer is at this stuff, but I don't see the professional moat.

daemonk•11m ago
I have a data science/engineering background. From my perspective, using AI is like mining the solution space for optimality. The solution space is the combinatorics of the billions of parameters and their cardinalities. You try to narrow down the search space with your prompt and hopefully guide your mining with more semantic-based heuristics towards your optimal solution.

You might hit a local maximum or go down a blind path. I tend to completely start my code base from scratch every week. I would make things more generic, remove unnecessary complexity, or add new features. And hope that can move me past the local maximum.
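
The restart habit described above resembles random-restart hill climbing from optimization. The toy sketch below is only an analogy under stated assumptions: `two_peaks` is a made-up landscape, and the integer search space stands in for something far larger. A greedy search that starts near the small peak stays there, and a fresh start elsewhere is what reaches the higher one.

```python
def hill_climb(score, start, lo, hi):
    """Greedy ascent over integers in [lo, hi]; stops when no neighbour scores higher."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        if not neighbours:
            return x
        best = max(neighbours, key=score)
        if score(best) <= score(x):
            return x  # local maximum: every neighbour is no better
        x = best


def best_of_restarts(score, starts, lo, hi):
    """Run hill_climb from each start and keep the highest-scoring end point."""
    return max((hill_climb(score, s, lo, hi) for s in starts), key=score)


def two_peaks(x):
    """Toy landscape (hypothetical): local peak at x=2, global peak at x=8."""
    return max(3 - abs(x - 2), 10 - 2 * abs(x - 8))


stuck = hill_climb(two_peaks, 0, 0, 10)                  # ends at the local peak, x=2
restarted = best_of_restarts(two_peaks, [0, 1, 6], 0, 10)  # a start at 6 reaches x=8
```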
