Show HN: I built a tool to version control datasets (like Git, but for data)

https://shodata.com
2•aliefe04•3mo ago
Hey everyone,

As a founder, I've been frustrated for years with how my team manages datasets for ML. It always ends up as data_final_v3_fixed.csv in an S3 bucket or a massive Git LFS file that nobody understands.

So, I built Shodata. It’s an open platform (like GitHub) but built specifically for dataset workflows.

The core idea is simple: you upload a file. When you upload a new file with the same name, a new version (v2, v3, etc.) is created automatically. Every dataset gets a discussion board and a complete history, and every version gets clean previews and statistics.
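For anyone curious what the same-name rule looks like in practice, here is a rough sketch. It assumes a simple in-memory store and is not Shodata's actual code or API: `DatasetStore`, `Version`, and `upload` are hypothetical names made up for illustration, and recording a content hash is my own addition.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    """Metadata recorded for one upload (hypothetical structure)."""
    number: int
    sha256: str
    size: int
    uploaded_at: datetime


@dataclass
class DatasetStore:
    """Hypothetical in-memory store: file name -> ordered version history."""
    history: dict[str, list[Version]] = field(default_factory=dict)

    def upload(self, name: str, content: bytes) -> Version:
        # Uploading under an existing name appends a new version
        # instead of overwriting the previous one.
        versions = self.history.setdefault(name, [])
        version = Version(
            number=len(versions) + 1,
            sha256=hashlib.sha256(content).hexdigest(),
            size=len(content),
            uploaded_at=datetime.now(timezone.utc),
        )
        versions.append(version)
        return version


store = DatasetStore()
first = store.upload("hallucinations.csv", b"prompt,answer\n")
second = store.upload("hallucinations.csv", b"prompt,answer\nq1,a1\n")
print(first.number, second.number)  # 1 2
```

The file name is the only key here, so renaming a file starts a fresh history. A real system would likely need a stable dataset ID on top of that.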

To show how it works, I seeded it with a dataset I'm tracking: a log of LLM hallucinations. When I find new ones, I just upload the new file and it versions the dataset.

The platform is an MVP. It has a generous free tier (includes 3 personal private datasets & 10GB storage) and a single Pro plan that unlocks team/organization features (like Org creation and shared private datasets).

I’m looking for feedback from fellow engineers and ML folks on the workflow. Is this useful? What’s missing?

You can check out the platform here: https://shodata.com

And the LLM log dataset: https://shodata.com/shodata/llm-hallucinations

Comments

vmykyt•3mo ago
That is a good start.

In the (big) data world, the idea of data versioning has been floating around for decades. The current consensus is to treat information about your files, which is effectively data itself, as metadata.

That said, while trying to build your own solution is always good, maybe you could look at other solutions, like Apache Iceberg [1] (free and open source).

In particular, they have the concept of a Catalog.

While from the documentation it may look like adopting Iceberg requires a lot of other moving parts, in reality you can start from Docker Compose [2] and then manage your data using plain old SQL syntax.

It may look like overkill for your specific needs, but it is still a good source to steal some ideas from.

P.S. There are plenty of such systems in various form factors.

[1] https://iceberg.apache.org/
[2] https://iceberg.apache.org/spark-quickstart/

aliefe04•3mo ago
Thanks for the feedback!

Shodata aims to solve a different problem: lightweight versioning for small-to-medium datasets with zero infrastructure setup. Think "GitHub for CSV files" rather than a full data lakehouse. Iceberg is excellent for production data lakes with Spark/Trino, but it requires running catalogs, configuring S3/Glue, and SQL knowledge. For many ML teams working with <100GB datasets, that's overkill. Our sweet spot is teams who need:

- Drag-and-drop versioning (no CLI/SDK required)
- Instant previews and diff visualization
- Collaboration features (comments, access control)
- Public sharing (like the LLM hallucinations dataset)

I'll definitely look at Iceberg's catalog design for inspiration on metadata management. Appreciate the pointer!