Show HN: Norma – build good datasets (using an objective)

https://norma.grouplabs.ca
1•noelfranthomas•9m ago
My team has worked for F500s, startups, and everything in between.

In every case, we found it almost impossible to assemble an ideal dataset for training models. In real-world systems, the information you actually need is scattered across 30–300+ tables, stored in different warehouses, parquets, CSVs, and legacy DBs that nobody fully understands anymore.

We realized the real job isn’t ETL (too wide) or feature engineering (too narrow); it’s constructing the ideal representation of the problem so downstream models can actually learn something meaningful.

So we built Norma, an optimization-first data platform. It does the things every ML team wishes their stack would do:

1. Unity Catalog integration that works out of the box - connect a warehouse, instantly browse tables with lineage, schemas, and metadata.

2. A unified SQL/Python pipeline engine - both languages execute in the same memory buffer (via DuckDB), so no more glue code or brittle data hops.

3. An AI assistant for transformations - ask for a feature, a join, an explanation, or a visualization, and it generates the pipeline steps.

4. Multi-bandit 5-fold cross-validation - fast, automatic evaluation of transformed datasets with XGBoost.

5. Visual lineage + shared datasets - every step is inspectable, reproducible, and sharable across teams.

That’s what we have today.
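
To make item 4 a bit more concrete, here is a rough sketch of plain 5-fold cross-validation used to compare two candidate representations of the same column. It is not Norma's implementation: it leaves out the bandit part entirely, swaps XGBoost for a one-variable least-squares line so it runs on the standard library alone, and every name in it (`k_fold_indices`, `cross_val_mse`, `fit_line`, `predict_line`) is made up for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_rows, k=5, seed=0):
    """Shuffle row indices and deal them into k nearly equal folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_mse(xs, ys, fit, predict, k=5, seed=0):
    """Average held-out mean squared error across k folds."""
    fold_scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_scores.append(mean(errors))
    return mean(fold_scores)


# Hypothetical stand-in model: a one-variable least-squares line (not XGBoost).
def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var if var else 0.0
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Toy data: the target is the square of the raw column.
raw = [float(v) for v in range(1, 41)]
target = [v ** 2 for v in raw]

# Two candidate representations of the same raw column.
candidates = {"raw": raw, "squared": [v ** 2 for v in raw]}
scores = {
    name: cross_val_mse(column, target, fit_line, predict_line)
    for name, column in candidates.items()
}
best = min(scores, key=scores.get)
print(best, scores)
```

The point is only the shape of the loop: each candidate dataset gets the same folds and the same model, so the held-out score reflects the representation rather than the split.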

We’re still building:

- Automatic leakage detection (timestamp violations, post-outcome signals, unsafe joins)

- Relevant table discovery (find the tables that actually matter for predicting your target)

- Relevant row selection (especially for PFN-style models with row limits)

- Automated feature representation (scaling, encoding, aggregation, embeddings)

- AutoGluon + TabPFN integration (train strong models on normalized, optimized datasets)

- Differential privacy guardrails for LLM usage inside your data workflows
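
For the leakage item, the simplest case (timestamp violations) can be sketched in a few lines. This is a toy under stated assumptions, not how Norma does or will do it: the column names `feature_recorded_at` and `predict_at` and the helper `find_timestamp_leaks` are hypothetical, and treating "recorded exactly at the cutoff" as a leak is a choice made here for the example.

```python
from datetime import datetime


def find_timestamp_leaks(rows, feature_time_key, cutoff_key):
    """Return rows whose feature was recorded at or after the prediction cutoff."""
    return [r for r in rows if r[feature_time_key] >= r[cutoff_key]]


# Hypothetical rows: one feature value per entity, plus the moment we predict.
rows = [
    {"id": 1, "feature_recorded_at": datetime(2024, 1, 5), "predict_at": datetime(2024, 1, 10)},
    {"id": 2, "feature_recorded_at": datetime(2024, 1, 12), "predict_at": datetime(2024, 1, 10)},
    {"id": 3, "feature_recorded_at": datetime(2024, 1, 10), "predict_at": datetime(2024, 1, 10)},
]

leaks = find_timestamp_leaks(rows, "feature_recorded_at", "predict_at")
leaky_ids = [r["id"] for r in leaks]
print(leaky_ids)
```

Post-outcome signals and unsafe joins are harder because they usually need lineage information, not just a per-row comparison.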

We’re trying to build the equivalent of a representation compiler: raw warehouse → optimal feature space → any model or BI tool.
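
As a very small picture of what "raw rows → feature space" means, here is a sketch that standardizes one numeric column and one-hot encodes one categorical column. The rows, column names, and helpers (`standardize`, `one_hot`) are invented for the example, and a real system would presumably choose these transforms against the objective rather than hard-code them.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def one_hot(values):
    """Return the sorted category list and one indicator vector per value."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical raw rows pulled from some warehouse table.
raw_rows = [
    {"amount": 10.0, "plan": "free"},
    {"amount": 20.0, "plan": "pro"},
    {"amount": 30.0, "plan": "free"},
]

amounts = standardize([r["amount"] for r in raw_rows])
categories, plans = one_hot([r["plan"] for r in raw_rows])

# One feature vector per row: [scaled amount, is_free, is_pro]
features = [[a] + p for a, p in zip(amounts, plans)]
print(categories, features)
```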

If you’ve ever lost days hunting through a schema, debugging leakage, redoing feature pipelines, or trying to understand why a model plateaus even though your data is “fine,” I’d genuinely love your feedback. We’re still working closely with teams to refine our features and capabilities, and we’d love to share a private beta with your team. Please join the waitlist!

Happy to answer anything here.

Trifold is a tool to quickly and cheaply host static websites using a CDN

https://www.jpt.sh/projects/trifold/
1•birdculture•43s ago•0 comments

AI-crafted interactive experiences, generated instantly from any prompt

https://generativeui.net//
1•BruceWok•6m ago•2 comments

LLM assisted book reader by Karpathy

https://github.com/karpathy/reader3
1•pbd•6m ago•0 comments

I launched a directory with well-made products because everything seems buggy

https://select.supply
3•laurentiurad•9m ago•1 comments

Quake Engine Indicators

https://fabiensanglard.net/quake_indicators/index.html
1•liquid_x•10m ago•0 comments

Show HN: Implementing a core subset of ARM assembly in pure C89

https://github.com/orionfollett/oarm
1•orionfollett•13m ago•0 comments

I am building a collaborative coding agent

1•brainless•14m ago•0 comments

LGTM Culture: A Short Story

https://alt.management/lgtm-culture/
1•HotGarbage•21m ago•0 comments

Cloudflare: Piracy Liability Ruling Has Global Implications; Publishers Disagree

https://torrentfreak.com/cloudflare-says-piracy-liability-ruling-sets-a-dangerous-precedent-the-p...
1•gslin•26m ago•0 comments

The Anatomy of a Dysfunctional Standards Body – Peter Gutmann [pdf]

https://archive.openssl-conference.org/2025/presentations/Peter_Gutmann_ietf.pdf
1•commandersaki•29m ago•0 comments

Solar Superstorm Gannon crushed Earth's plasmasphere to a record low

https://www.sciencedaily.com/releases/2025/11/251122234723.htm
2•ashishgupta2209•36m ago•0 comments

A tiny fantasy console inspired by early 90s handheld consoles

https://github.com/beep8/beep8-sdk
2•beep8_official•36m ago•1 comments

What is the most cramped memory card you own?

https://www.tomshardware.com/pc-components/microsd-cards/the-small-capacity-memory-card-champions...
2•indigoabstract•37m ago•0 comments

The "Good Enough" Lie in Engineering

https://www.andrewvittiglio.com/thoughts/the-good-enough-lie
1•andr3wV•46m ago•1 comments

Earth just got hit by a stealth solar storm no one saw coming

https://www.space.com/stargazing/auroras/earth-just-got-hit-by-a-stealth-solar-storm-no-one-saw-c...
2•Brajeshwar•47m ago•0 comments

Is the AI Bubble About to Burst?

https://singularityhub.com/2025/11/21/is-the-ai-bubble-about-to-burst-what-to-watch-for-as-the-ma...
1•Brajeshwar•48m ago•0 comments

A million ways to die from a data race in Go

https://gaultier.github.io/blog/a_million_ways_to_data_race_in_go.html
2•broken_broken_•48m ago•0 comments

The Latent Role of Open Models in the AI Economy

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5767103
1•signa11•50m ago•0 comments

No free lunch in vibe coding

https://bytesauna.com/post/prompting
2•mapehe•1h ago•1 comments

IDescriptor: A Cross-Platform iOS Device Management Tool

https://github.com/iDescriptor/iDescriptor
2•0x54MUR41•1h ago•0 comments

Show HN: Qdrant Vector Aggregator

https://github.com/vinerya/qdrant_vector_aggregator
3•chelbi•1h ago•1 comments

Hackers Bypass Signal, Telegram and WhatsApp Encryption to Read Messages

https://www.forbes.com/sites/daveywinder/2025/11/23/hackers-bypass-signal-telegram-and-whatsapp-e...
3•mionhe•1h ago•0 comments

Build a Compiler in Five Projects

https://kmicinski.com/functional-programming/2025/11/23/build-a-language/
4•azhenley•1h ago•0 comments

Show HN: Syd – An offline-first, AI-augmented workstation for blue teams

https://www.sydsec.co.uk
6•paul2495•1h ago•2 comments

A One-Minute ADHD Test

https://psychotechnology.substack.com/p/a-one-minute-adhd-test-2330
2•eatitraw•1h ago•0 comments

Technology Radar: An opinionated guide to today's technology landscape

https://www.thoughtworks.com/en-in/radar
2•pramodbiligiri•1h ago•0 comments

AI Document Processing with Docling Java, Arconia, and Spring Boot

https://www.thomasvitale.com/ai-document-processing-docling-java-arconia-spring-boot/
1•thomasvitale•1h ago•0 comments

User reports indicate possible problems at Cloudflare

https://downdetector.in/status/cloudflare/
1•nine_minutes•1h ago•0 comments

Show HN: Simulating the vacuum as a superfluid to derive Alpha = 1/137

https://github.com/moseszhu999/geometric-vacuum-sim
3•moseszhu•1h ago•1 comments