Ask HN: How do you handle logging and evaluation when training ML models?

3•calepayson•2mo ago

Hi all, I'm currently in a few ML classes and, while they do a great job covering theory, they don't cover application. At least not past some basic implementations in a Jupyter Notebook.

One friction point I keep running into is how to handle logging and evaluation of the models. Right now I work in a Jupyter Notebook: I train the model, then produce a few graphs of different metrics on the test set.

This whole workflow seems to be the standard among the folks in my program, but I can't shake the feeling that it's vibes-based and suboptimal.

I've got a few projects coming up and I want to use them as a chance to improve my approach to training models. What method works for you? Are there any articles or libraries that you would recommend? What do you wish Jr. Engineers knew about this?

Thanks!

Comments

calepayson•2mo ago

For now, the plan is to move from Jupyter back to a text editor. Jupyter is very forgiving of mistakes. The model didn't work? Change some parameters and rerun the training cell. This is amazing for new folks, who are being bombarded by new information, and (it sounds like) for experienced folks who have already developed great habits around ML projects. But I think intermediate folks need a little friction to help hammer home why best practice is best practice.

I'm hoping the text editor + project directory approach helps force ML projects away from a single file and towards some sort of codified project structure. Sometimes it just feels like there's too much information in one file, and it becomes hard to assign it to a location mentally (a bit like reading a physical copy of a tough book vs. a Kindle copy). Any advice or thoughts on this would be appreciated!
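To make the project-directory idea concrete, here is a minimal sketch of one possible layout: every training run gets its own folder holding the config it was started with and a line-per-step metrics log. The names `start_run`, `log_metrics`, `load_metrics`, and the `runs/` folder are made up for illustration, not taken from any library, and the training loop is a stand-in.

```python
import json
import tempfile
import time
from pathlib import Path


def start_run(root: str, config: dict) -> Path:
    """Create a unique, timestamped run directory and save the config in it."""
    Path(root).mkdir(parents=True, exist_ok=True)
    # mkdtemp adds a random suffix, so two runs in the same second don't collide.
    prefix = time.strftime("%Y%m%d-%H%M%S-")
    run_dir = Path(tempfile.mkdtemp(prefix=prefix, dir=root))
    (run_dir / "config.json").write_text(json.dumps(config, indent=2))
    return run_dir


def log_metrics(run_dir: Path, step: int, **metrics: float) -> None:
    """Append one JSON line per step so curves can be re-plotted later."""
    record = {"step": step, **metrics}
    with open(run_dir / "metrics.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")


def load_metrics(run_dir: Path) -> list[dict]:
    """Read back every logged step for a run."""
    with open(run_dir / "metrics.jsonl") as f:
        return [json.loads(line) for line in f]


if __name__ == "__main__":
    # Hypothetical hyperparameters; a real project would load these from a file.
    run_dir = start_run("runs", {"lr": 0.01, "epochs": 3})
    for epoch in range(3):
        fake_loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        log_metrics(run_dir, epoch, train_loss=fake_loss)
    print(load_metrics(run_dir))
```

The point of the design is that changing a parameter produces a new folder instead of overwriting the last result, so runs can be compared after the fact.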

-1•2mo ago

I’m no ML expert so take what I say with a grain of salt.

Two resources that might be useful are AWS's SageMaker documentation and the Machine Learning Engineering book by Andriy Burkov. The book doesn't go into much detail on logging, though. One way to evaluate a model is to run a SageMaker processing job that saves the performance metrics to a JSON file somewhere in S3. More info on processing jobs: https://docs.aws.amazon.com/sagemaker/latest/dg/processing-j... AWS also has various logging services that you can look into. This will mostly apply to orgs using AWS, but it might give a sense of how things can be done more generally.
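The "save the metrics as a JSON file" idea works without AWS too. Below is a rough sketch, assuming a binary classifier whose labels and predictions are plain lists of 0s and 1s. It does not use the SageMaker API; `binary_metrics`, `save_metrics`, and the output path are hypothetical names, and the file is written to local disk (copying it to S3 would be a separate step).

```python
import json
from pathlib import Path


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict:
    """Compute accuracy, precision, and recall for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "n": len(pairs),
        "accuracy": correct / len(pairs),
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def save_metrics(metrics: dict, path: str) -> None:
    """Write the metrics dict to a JSON file, creating parent folders."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(metrics, indent=2))


if __name__ == "__main__":
    # Toy labels and predictions standing in for a real test set.
    metrics = binary_metrics([1, 0, 1, 1], [1, 0, 0, 1])
    save_metrics(metrics, "eval/metrics.json")
```

Keeping evaluation results in a file like this, rather than only in notebook plots, makes it possible to diff them between model versions.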
