frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: pqry – A fast, lightweight CLI tool to diagnose Parquet datasets

https://github.com/symblic/pqry
4•setzeno•1w ago
Hi HN,

I’ve spent a lot of time debugging large Parquet datasets on S3 where “something is wrong”, but figuring out what usually means either accessing each file individually or even spinning up Spark just to inspect metadata.

In practice, it’s often things like:

- schema drift across partitions

- columns silently disappearing

- timestamp precision changes

- files written by different pipeline versions

- row groups with bad stats or empty data

By the time you notice, the dataset is already messy and hard to reason about.

So I built pqry, a Rust-based CLI tool that scans Parquet metadata at the dataset/prefix level and surfaces issues like schema drift, unstable columns, partition hotspots, and row-group health.

It works entirely from metadata, so you can point it at tens of thousands of files and get results fast.

Example:

- pqry drift s3://bucket/events/

- pqry columns s3://bucket/events/

- pqry quality s3://bucket/events/

Repo: https://github.com/symblic/pqry

I originally built this for debugging production pipelines where writers and schemas evolved over time and problems only showed up weeks later.

Would love feedback from anyone working with large Parquet datasets in production.

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
173•isitcontent•9h ago•21 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
286•vecti•11h ago•129 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
232•eljojo•12h ago•142 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
14•denuoweb•1d ago•1 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
59•phreda4•8h ago•11 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
83•antves•1d ago•60 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
45•nwparker•1d ago•11 comments

Show HN: Fitspire – a simple 5-minute workout app for busy people (iOS)

https://apps.apple.com/us/app/fitspire-5-minute-workout/id6758784938
2•devavinoth12•2h ago•0 comments

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
16•NathanFlurry•17h ago•6 comments

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

https://github.com/artifact-keeper
148•bsgeraci•1d ago•62 comments

Show HN: I built a RAG engine to search Singaporean laws

https://github.com/adityaprasad-sudo/Explore-Singapore
4•ambitious_potat•3h ago•4 comments

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
23•JoshPurtell•1d ago•5 comments

Show HN: Daily-updated database of malicious browser extensions

https://github.com/toborrm9/malicious_extension_sentry
14•toborrm9•14h ago•5 comments

Show HN: FastLog: 1.4 GB/s text file analyzer with AVX2 SIMD

https://github.com/AGDNoob/FastLog
5•AGDNoob•5h ago•1 comments

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

https://rahuljaguste.github.io/Nethack_Falcons_Eye/
4•rahuljaguste•8h ago•1 comments

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
23•dchu17•13h ago•12 comments

Show HN: I built a directory of $1M+ in free credits for startups

https://startupperks.directory
4•osmansiddique•6h ago•0 comments

Show HN: A Kubernetes Operator to Validate Jupyter Notebooks in MLOps

https://github.com/tosin2013/jupyter-notebook-validator-operator
2•takinosh•6h ago•0 comments

Show HN: Micropolis/SimCity Clone in Emacs Lisp

https://github.com/vkazanov/elcity
171•vkazanov•1d ago•49 comments

Show HN: A password system with no database, no sync, and nothing to breach

https://bastion-enclave.vercel.app
11•KevinChasse•14h ago•11 comments

Show HN: 33rpm – A vinyl screensaver for macOS that syncs to your music

https://33rpm.noonpacific.com/
3•kaniksu•8h ago•0 comments

Show HN: Local task classifier and dispatcher on RTX 3080

https://github.com/resilientworkflowsentinel/resilient-workflow-sentinel
25•Shubham_Amb•1d ago•2 comments

Show HN: GitClaw – An AI assistant that runs in GitHub Actions

https://github.com/SawyerHood/gitclaw
9•sawyerjhood•15h ago•0 comments

Show HN: Chiptune Tracker

https://chiptunes.netlify.app
3•iamdan•8h ago•1 comments

Show HN: An open-source system to fight wildfires with explosive-dispersed gel

https://github.com/SpOpsi/Project-Baver
2•solarV26•12h ago•0 comments

Show HN: Craftplan – I built my wife a production management tool for her bakery

https://github.com/puemos/craftplan
567•deofoo•5d ago•166 comments

Show HN: Agentism – Agentic Religion for Clawbots

https://www.agentism.church
2•uncanny_guzus•12h ago•0 comments

Show HN: Disavow Generator – Open-source tool to defend against negative SEO

https://github.com/BansheeTech/Disavow-Generator
5•SurceBeats•18h ago•1 comments

Show HN: Total Recall – write-gated memory for Claude Code

https://github.com/davegoldblatt/total-recall
10•davegoldblatt•1d ago•6 comments

Show HN: BPU – Reliable ESP32 Serial Streaming with Cobs and CRC

https://github.com/choihimchan/bpu-stream-engine
2•octablock•14h ago•0 comments