frontpage.

Show HN: Which AI model is best for real data analysis?

https://mljar.com/analysis/
2•pplonski86•3h ago

Comments

pplonski86•3h ago
We built a benchmark to evaluate LLMs on real data analysis workflows. Instead of a single prompt, each task is a sequence of prompts (steps), similar to how a human data analyst works in practice. Each run is saved as a full Python notebook, including prompts, code and outputs. Runs are evaluated on task completion, code correctness, output quality, reasoning and reliability. Each workflow is executed multiple times and scored automatically.
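A minimal sketch of how per-step grades could roll up into a single score out of 10. It assumes an automatic grader assigns each step a 0-10 grade per criterion and that all criteria, runs and workflows are weighted equally; the names `StepResult`, `run_score`, `workflow_score` and `model_score` are hypothetical, and the post does not say which aggregation the benchmark actually uses.

```python
from dataclasses import dataclass
from statistics import mean

# The five criteria named in the post; equal weighting is an assumption.
CRITERIA = ("task_completion", "code_correctness", "output_quality", "reasoning", "reliability")


@dataclass
class StepResult:
    """One step of a run: the prompt, the generated code, its output, and grades."""
    prompt: str
    code: str
    output: str
    scores: dict[str, float]  # criterion -> 0-10 grade from a (hypothetical) automatic grader


def run_score(steps: list[StepResult]) -> float:
    """Score one run: average each criterion over the steps, then average the criteria."""
    per_criterion = [mean(step.scores[c] for step in steps) for c in CRITERIA]
    return mean(per_criterion)


def workflow_score(runs: list[list[StepResult]]) -> float:
    """Average over repeated executions of the same workflow."""
    return mean(run_score(run) for run in runs)


def model_score(workflows: dict[str, list[list[StepResult]]]) -> float:
    """Average over all workflows (e.g. EDA, ML, NLP, statistics)."""
    return mean(workflow_score(runs) for runs in workflows.values())
```

Averaging over repeated runs means a model that only occasionally breaks a step is penalized in proportion to how often it fails, which is one simple way to fold reliability into the headline number.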

Modern LLMs perform very well on individual steps. The benchmark currently includes 23 workflows covering different data analysis tasks (EDA, ML, NLP, statistics, ...). Across the 23 workflows, the top three models were gpt-oss:120b at 9.87/10, gpt-5.4 at 9.65/10 and glm-5.1 at 9.48/10, which is very high in my opinion. Overall, modern LLMs do very well on these data analysis tasks. All feedback is welcome! I uploaded all notebooks for each model: https://github.com/pplonski/ai-for-data-analysis

I built an AI to do my job end-to-end. The problem wasn't the AI

https://medium.com/@iroy2000/i-tried-to-automate-my-own-job-heres-what-i-found-15fb86d415c2
1•iroy2000•1m ago•1 comment

Using Actor Network Theory to rethink work in the age of generative AI

https://stripepartners.substack.com/p/this-months-frame-using-actor-network
1•laurex•2m ago•0 comments

Show HN: Messaging without phone numbers, email, or metadata

https://tunnelmessenger.com/access
1•trpfnc•2m ago•0 comments

Show HN: LoadLens – See why queues hide overload instead of solving it

https://loadlens.dev
1•janbalangue•3m ago•0 comments

Show HN: AriaType – open-source privacy-first and local-first voice-to-text app

https://github.com/joe223/AriaType
1•Joe_Harris•4m ago•0 comments

Bot Bait – Just hit $2K MRR after 8 months of grinding

https://old.reddit.com/r/SaaS/comments/1sl3mrh/just_hit_2k_mrr_after_8_months_of_grinding
1•csomar•4m ago•0 comments

Show HN: Cliparr – Export clips from your personal media server

https://github.com/TechSquidTV/Cliparr
1•TechSquidTV•5m ago•0 comments

Same LLM, different agent: a CI debugger built on Claude

https://www.mendral.com/blog/same-llm-different-agent
1•shad42•7m ago•0 comments

The Meta Product Manager

https://k2xl.substack.com/p/the-meta-product-manager
2•k2xl•7m ago•0 comments

Building a Browser for the Agent Era

https://www.tinyfish.ai/blog/building-a-browser-for-the-agent-era
3•tiny-automates•8m ago•2 comments

Missing Emails in Gmail? It's Your Tabs – and It Costs You More Than You Think

https://clearmailapp.com/blog/gmail-missing-emails-hidden-cost/
1•raghukumar•9m ago•0 comments

Ozempic Dreams

https://daverupert.com/2026/04/ozempic-dreams/
2•speckx•9m ago•0 comments

Hyperbridge exploited two weeks after April Fools' hack joke

https://www.web3isgoinggreat.com/single/hyperbridge-exploit
3•LorenDB•10m ago•0 comments

Is This Agent Safe? Free security checker with scores no platform can revoke

https://agentgraph.co/check
1•kenneives•10m ago•0 comments

Zig 0.16.0 Release Notes

https://ziglang.org/download/0.16.0/release-notes.html
3•ska80•10m ago•0 comments

Amazon Bio Discovery

https://aws.amazon.com/blogs/industries/introducing-amazon-bio-discovery/
1•firasd•10m ago•0 comments

Google, Microsoft, Meta All Tracking You Even When You Opt Out

https://www.404media.co/google-microsoft-meta-all-tracking-you-even-when-you-opt-out-according-to...
8•Cider9986•11m ago•0 comments

Airbnb Hosts Don't Want to Talk to Guests Anymore, Are Outsourcing Messages to AI

https://www.404media.co/airbnb-hosts-dont-want-to-talk-to-guests-anymore-are-outsourcing-messages...
1•Cider9986•11m ago•0 comments

New toothpaste stops gum disease without killing good bacteria

https://www.sciencedaily.com/releases/2026/04/260413043141.htm
1•atombender•11m ago•0 comments

Prolog Implementation of the IRS Fact Graph

https://github.com/alexpetros/factgraph.pl
1•triska•11m ago•0 comments

IMF warns global economy at risk of recession if Iran war persists

https://www.bbc.com/news/articles/c4g66p2q075o
1•Cider9986•12m ago•0 comments

For the First Time in the U.S., Renewables Generate More Power Than Natural Gas

https://e360.yale.edu/digest/us-renewables-natural-gas-coal
10•Brajeshwar•13m ago•0 comments

1-Bit Bonsai: The First Commercially Viable 1-Bit LLMs

https://prismml.com/news/bonsai-8b
1•wicket•14m ago•2 comments

The Fediverse deserves a dumb graphical client

https://adele.pages.casa/md/blog/the-fediverse-deserves-a-dumb-graphical-client.md
3•speckx•15m ago•0 comments

Show HN: A memory database that forgets, consolidates, and detects contradiction

https://github.com/yantrikos/yantrikdb-server
1•pranabsarkar•16m ago•1 comment

Ask HN: What funding models exist for a search engine?

1•PingCo•17m ago•0 comments

Re-Mapping Opcodes on the 6502

https://6502.org/forum/viewtopic.php?t=1949
2•JPLeRouzic•18m ago•0 comments

Prof. Eric Goldman on "sign-in wrap" decision from Judge Orrick

https://blog.ericgoldman.org/archives/2026/04/remember-when-the-ninth-circuit-rejected-classpass-...
1•dctoedt•19m ago•0 comments

Finding My Way Back

https://notes.jeddacp.com/finding-my-way-back/
1•speckx•20m ago•0 comments

DaVinci Resolve 21

https://www.blackmagicdesign.com/products/davinciresolve/whatsnew
1•pentagrama•21m ago•0 comments