Show HN: Norma – build good datasets (using an objective)

https://norma.grouplabs.ca
3•noelfranthomas•2mo ago
My team has worked for F500s, startups, and everything in between.

In every case, we found it almost impossible to assemble an ideal dataset for training models. In real-world systems, the information you actually need is scattered across 30–300+ tables, stored in different warehouses, parquets, CSVs, and legacy DBs that nobody fully understands anymore.

We realized the real job isn’t ETL (too wide) or feature engineering (too narrow); it’s constructing the ideal representation of the problem so downstream models can actually learn something meaningful.

So we built Norma, an optimization-first data platform. It does the things every ML team wishes their stack would do:

1. Unity Catalog integration that works out of the box - connect a warehouse, instantly browse tables with lineage, schemas, and metadata.

2. A unified SQL/Python pipeline engine - both languages execute in the same memory buffer (via DuckDB), so no more glue code or brittle data hops.

3. An AI assistant for transformations - ask for a feature, a join, an explanation, or a visualization, and it generates the pipeline steps.

4. Multi-bandit 5-fold cross-validation - fast, automatic evaluation of transformed datasets with XGBoost.

5. Visual lineage + shared datasets - every step is inspectable, reproducible, and sharable across teams.

That’s what we have today.
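
To make item 4 concrete, here is a minimal sketch of 5-fold cross-validation used to compare two candidate representations of the same target. It is not Norma's implementation: the one-feature least-squares model is a stand-in for XGBoost, the bandit-style allocation is left out entirely, and every name here (`k_fold_indices`, `cross_val_mse`, the toy columns) is hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_rows, k=5, seed=0):
    """Shuffle row indices once and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in model)."""
    x_bar, y_bar = mean(xs), mean(ys)
    variance = sum((x - x_bar) ** 2 for x in xs)
    if variance == 0:
        return 0.0, y_bar
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / variance
    return slope, y_bar - slope * x_bar


def cross_val_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error across k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(
            mean((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out)
        )
    return mean(fold_errors)


# Toy data (assumption): the target depends on `informative`, not on `noise`.
rng = random.Random(42)
informative = [rng.uniform(0, 10) for _ in range(100)]
noise = [rng.uniform(0, 10) for _ in range(100)]
target = [3.0 * x + 1.0 + rng.gauss(0, 0.5) for x in informative]

candidates = {"informative": informative, "noise": noise}
scores = {name: cross_val_mse(col, target) for name, col in candidates.items()}
best = min(scores, key=scores.get)  # lowest held-out error wins
print(best, scores)
```

The point of scoring on held-out folds is that a representation only wins if it generalizes, which is the signal you would want when ranking transformed datasets automatically.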

We’re still building:

- Automatic leakage detection (timestamp violations, post-outcome signals, unsafe joins)

- Relevant table discovery (find the tables that actually matter for predicting your target)

- Relevant row selection (especially for PFN-style models with row limits)

- Automated feature representation (scaling, encoding, aggregation, embeddings)

- AutoGluon + TabPFN integration (train strong models on normalized, optimized datasets)

- Differential privacy guardrails for LLM usage inside your data workflows
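
As an illustration of the first bullet, a timestamp-violation check can be sketched in a few lines: flag any joined row whose feature value was observed after the moment the prediction would have been made. This is only an assumed shape for such a check, not the planned Norma feature; the function name and the `feature_observed_at` / `prediction_at` fields are hypothetical.

```python
from datetime import datetime


def find_timestamp_violations(rows, feature_time_key="feature_observed_at",
                              prediction_time_key="prediction_at"):
    """Return rows whose feature was observed after the prediction time.

    Such rows carry information from the future and would leak it into training.
    """
    return [r for r in rows if r[feature_time_key] > r[prediction_time_key]]


# Toy joined rows (assumption): one clean row, one that leaks.
rows = [
    {"customer_id": 1,
     "feature_observed_at": datetime(2024, 1, 5),
     "prediction_at": datetime(2024, 1, 10)},
    {"customer_id": 2,
     "feature_observed_at": datetime(2024, 2, 1),
     "prediction_at": datetime(2024, 1, 20)},
]

violations = find_timestamp_violations(rows)
print([r["customer_id"] for r in violations])
```

Post-outcome signals and unsafe joins need more context than a single comparison (for example, knowing which columns are derived from the label), so they are not covered by this sketch.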

We’re trying to build the equivalent of a representation compiler: raw warehouse → optimal feature space → any model or BI tool.

If you’ve ever lost days hunting through a schema, debugging leakage, redoing feature pipelines, or trying to understand why a model plateaus even though your data is “fine,” I’d genuinely love your feedback. We’re still working closely with teams to refine our features and capabilities, and we’d love to share a private beta with your team. Please join the waitlist!

Happy to answer anything here.
