I built Skyulf because I kept encountering two specific problems that existing tools (like MLflow or standard Scikit-learn pipelines) didn't quite solve for me: silent data leakage and monolithic pickles.
## The Problems
1. Data Leakage is Silent: You compute mean imputation on the full dataset, then split. Your model looks great in dev but fails in production. It happens to the best of us (a concrete demonstration follows this list).
2. Deployment Hell (The Pickle Problem): Standard pipelines pickle everything (data schema, logic, and third-party library versions) into one opaque blob. To run a simple inference, you need the same heavy environment you used for training.
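To make problem 1 concrete, here is a tiny self-contained demonstration (plain NumPy, nothing Skyulf-specific) of how a statistic computed before the split leaks test information into training:

```python
import numpy as np

# Toy feature with a missing value; rows 0-2 are "train", row 3 is "test".
age = np.array([20.0, 30.0, np.nan, 90.0])
train = age[:3]

# WRONG: the imputation value is computed on the full dataset before splitting.
leaky_mean = np.nanmean(age)    # ~46.7 -- includes the 90.0 from the future test row

# RIGHT: the imputation value is computed on the training split only.
clean_mean = np.nanmean(train)  # 25.0 -- the test row is never seen

print(leaky_mean, clean_mean)   # the two imputed values differ materially
```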
## The Solution: Distinct Calculator & Applier
Skyulf enforces a strict separation of concerns using a Calculator / Applier pattern (inspired by modern engine design).
1. Calculator (Fit): Consumes data (`X`, `y`), learns the state (means, vocabularies, coefficients), and outputs a lightweight, JSON-serializable Artifact.
2. Applier (Predict): A pure function. Consumes the Artifact + new data -> output.
Why this matters: You can train on a massive GPU cluster, save just the lightweight JSON artifacts (the state), and run the Applier on a tiny CPU instance. The Applier is stateless. (A simplified sketch of the pattern follows the config example below.)
3. Structural Leakage Prevention: We use a `SplitDataset` abstraction. Transformers receive train/test/val as a single object but are forced, by construction, to compute statistics on `.train` only.
```python
from skyulf import SkyulfPipeline

# df is assumed to be an already-loaded DataFrame with "age", "income",
# and "target" columns.
config = {
    "preprocessing": [
        # Split happens FIRST. Leakage is structurally impossible.
        {"name": "split", "transformer": "TrainTestSplitter", "params": {"test_size": 0.2}},
        {"name": "impute_age", "transformer": "SimpleImputer", "params": {"columns": ["age"], "strategy": "mean"}},
        {"name": "scale_income", "transformer": "StandardScaler", "params": {"columns": ["income"]}},
    ],
    "modeling": {"type": "random_forest_classifier", "params": {"n_estimators": 100}},
}

pipeline = SkyulfPipeline(config)
pipeline.fit(df, target_column="target")
pipeline.save("model.pkl")
```
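Conceptually, the Calculator/Applier split plus the `SplitDataset` guard boil down to something like the following simplified sketch. To be clear, this is an illustration of the idea, not the actual skyulf-core API; a mean imputer stands in for any transformer.

```python
import json
from dataclasses import dataclass

import polars as pl

# A SplitDataset-style container (illustrative): transformers see one object,
# but the Calculator only ever reads statistics off the .train split.
@dataclass
class SplitDataset:
    train: pl.DataFrame
    test: pl.DataFrame

# --- Calculator (fit): consumes data, learns state, emits a JSON artifact ---
def fit_mean_imputer(ds: SplitDataset, column: str) -> dict:
    # Statistics come from ds.train only; the test split is never touched.
    return {
        "transformer": "SimpleImputer",
        "column": column,
        "strategy": "mean",
        "value": ds.train[column].mean(),
    }

# --- Applier (predict): a pure, stateless function: artifact + data -> output ---
def apply_mean_imputer(artifact: dict, data: pl.DataFrame) -> pl.DataFrame:
    # No learning, no hidden state: everything it needs is in the artifact.
    return data.with_columns(
        pl.col(artifact["column"]).fill_null(artifact["value"])
    )

ds = SplitDataset(
    train=pl.DataFrame({"age": [20.0, 30.0, None]}),
    test=pl.DataFrame({"age": [None, 40.0]}),
)
artifact = fit_mean_imputer(ds, "age")
print(json.dumps(artifact))                   # lightweight, JSON-serializable state
print(apply_mean_imputer(artifact, ds.test))  # the same pure function works on any new data
```

Because the artifact is plain JSON, the Applier side carries no dependency on the training environment at all.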
## Features
1. Polars-First (~3.5x Faster): We migrated the core engine from Pandas to Polars. Lazy evaluation means we can scan CSV/Parquet files of any size instantly for EDA.
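For context on what lazy evaluation buys you here, this is plain Polars, not Skyulf-specific: `scan_csv` builds a query plan without touching the file, so only the columns and aggregates a profile actually needs ever get materialized.

```python
import polars as pl

# scan_csv returns a LazyFrame: nothing is read from disk yet.
lazy = pl.scan_csv("data.csv")

# Only the referenced columns are read; aggregations are pushed into the scan.
stats = lazy.select(
    pl.col("income").mean().alias("income_mean"),
    pl.col("age").null_count().alias("age_missing"),
).collect()  # execution happens here, once, against the optimized plan

print(stats)
```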
2. One-Liner EDA: Generates a comprehensive profile (quality, outliers, VIF, causal graphs) in seconds.
```python
import polars as pl

from skyulf.profiling.analyzer import EDAAnalyzer
from skyulf.profiling.visualizer import EDAVisualizer

df = pl.read_csv("data.csv")
profile = EDAAnalyzer(df).analyze(target_col="churn")

viz = EDAVisualizer(profile, df)
viz.summary()  # Terminal dashboard
viz.plot()     # Matplotlib distributions & correlations
```
3. Visual ML Canvas (Local-First): A React-based drag-and-drop UI (running locally via FastAPI) that lets you visually debug pipelines. You can click any node to see data stats at that exact point in the pipeline.
## Why Another Tool?
- vs MLflow: We focus on the construction and execution of the pipeline itself, not just tracking metrics.
- vs Scikit-learn Pipelines: We separate state (Artifacts) from logic (Appliers) and enforce leakage checks.
- vs Cloud Platforms: Skyulf is self-hosted. Your data never leaves your machine.
## Current Status
The library `skyulf-core` is stable on PyPI. The visual platform is functional but still being polished.
I'm a solo dev building this in public, and I'd love your feedback. If you find this interesting, a star on GitHub would mean a lot! I'm also looking for contributors: if you're into Python, React, or MLOps, check out the issues.
---
*Links*:
- Repo: https://github.com/flyingriverhorse/Skyulf
- PyPI: https://pypi.org/project/skyulf-core
- Docs: https://www.skyulf.com