Show HN: CSV GB+ by Data.olllo – Open and Process CSVs Locally

https://apps.microsoft.com/detail/9pfcrwp46v22?hl=en-US&gl=US

51•olllo•8mo ago

I built CSV GB+ by Data.olllo, a local data tool that lets you open, clean, and export gigabyte-sized CSVs (even billions of rows) without writing code.

Most spreadsheet apps choke on big files. Coding in pandas or Polars works—but not everyone wants to write scripts just to filter or merge CSVs. CSV GB+ gives you a fast, point-and-click interface built on dual backends (memory-optimized or disk-backed) so you can process huge datasets offline.

Key Features:

Handles massive CSVs with ease: merge, split, dedup, filter, batch export

Smart engine switch: disk-based "V Core" or RAM-based "P Core"

All processing is offline – no data upload or telemetry

Supports CSV, XLSX, JSON, DBF, Parquet and more

Designed for data pros, students, and privacy-conscious users

Register for a free 7-day Pro trial. Pro versions remove row limits and unlock full features. I’m a solo dev building Data.olllo as a serious alternative to heavy coding or bloated enterprise tools.

Download for Windows: https://apps.microsoft.com/detail/9PFR86LCQPGS

User Guide: https://olllo.top/articles/article-0-Data.olllo-UserGuide

Would love feedback! I’m actively improving it based on real use cases.

Comments

xnx•8mo ago

Is this better than Tad (https://www.tadviewer.com/), which seems to do similar things for free?

rad_gruchalski•8mo ago

And on operating systems other than Windows...

olllo•8mo ago

Tad is a great tool—very clean and useful for quick exploration.

Data.olllo is focused more on local data processing, not just viewing: things like filtering, transforming, merging, and even running Python code (with AI assistance coming). It’s built for both small and large files with performance in mind, using several engines under the hood, including Polars.

Also, good news: the macOS version is in the works and will be submitted to the Mac App Store soon!

dangerlibrary•8mo ago

It is 2025 and CSVs still dominate data interchange between organizations.

https://graydon2.dreamwidth.org/193447.html

esafak•8mo ago

parquet is also popular.

olllo•8mo ago

Absolutely—CSVs are still everywhere, especially for simple interchange between teams and tools. I designed Data.olllo with that in mind.

That said, I also plan to add support for Parquet and other formats soon—definitely agree it's gaining traction for larger, structured datasets.

paddy_m•8mo ago

Do you have a demo video?

What are you using for processing (polars)?

Marketing note: I'm sure you're proud of P Core/V Core, but that doesn't matter to your users, it's an implementation detail. At a maximum I'd write "intelligent execution that scales from small files to large files".

As an implementation note, I would make it simple to operate on just the first 1000 (10k or 100k) rows so responses are super quick, then once the users are happy about the transform, make it a single click to operate on the entire file with a time estimate.
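A minimal sketch of that sample-first workflow, using only Python's standard `csv` module. Everything named here (`read_rows`, `keep_paid`, `preview`, `run_full`, the `status` column) is hypothetical and for illustration only; it is not how Data.olllo implements anything, and the time estimate is left out.

```python
import csv
import io
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Optional, TextIO

Row = Dict[str, str]
Transform = Callable[[Iterable[Row]], Iterator[Row]]


def read_rows(source: TextIO, limit: Optional[int] = None) -> Iterator[Row]:
    """Stream rows from an open CSV file, stopping after `limit` rows if given."""
    return islice(csv.DictReader(source), limit)


def keep_paid(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical transform: keep rows whose `status` column is 'paid'."""
    return (row for row in rows if row["status"] == "paid")


def preview(source: TextIO, transform: Transform, sample_size: int = 1000) -> List[Row]:
    """Apply the transform to the first `sample_size` rows only, for a fast response."""
    return list(transform(read_rows(source, sample_size)))


def run_full(source: TextIO, transform: Transform) -> int:
    """Apply the same transform to every row, streaming; return the surviving row count."""
    return sum(1 for _ in transform(read_rows(source)))


# Tiny in-memory stand-in for a large file.
data = "id,status\n1,paid\n2,open\n3,paid\n4,paid\n"
sample = preview(io.StringIO(data), keep_paid, sample_size=2)
total = run_full(io.StringIO(data), keep_paid)
print(sample)  # [{'id': '1', 'status': 'paid'}]
print(total)   # 3
```

The point of the split is that the transform is written once and reused unchanged; only the row limit differs between the quick preview and the full run.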

Another feature I'd like in this vein is execute on a small subset, then if you find an error with a larger subset, try to reduce the larger subset to a small quick to reproduce version. Especially for deduping.
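One way to approach that reduction is a greedy, delta-debugging-style shrink: repeatedly drop chunks of rows and keep the smaller set whenever the error still reproduces. The sketch below assumes the error can be expressed as a predicate over a list of rows; `shrink`, `buggy_dedup` and `has_leftover_duplicates` are hypothetical names, and the "bug" is an invented case-sensitive dedup.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def shrink(rows: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily remove chunks of rows while `fails` keeps returning True."""
    current = list(rows)
    chunk = len(current) // 2
    while chunk >= 1:
        start = 0
        while start < len(current):
            candidate = current[:start] + current[start + chunk:]
            if candidate and fails(candidate):
                current = candidate  # the removed chunk was irrelevant
            else:
                start += chunk       # the chunk is needed; move past it
        chunk //= 2
    return current


def buggy_dedup(emails: List[str]) -> List[str]:
    """Invented bug: compares case-sensitively, so 'Ann@' and 'ann@' both survive."""
    seen, out = set(), []
    for email in emails:
        if email not in seen:
            seen.add(email)
            out.append(email)
    return out


def has_leftover_duplicates(emails: List[str]) -> bool:
    """True if the dedup output still contains case-insensitive duplicates."""
    out = buggy_dedup(emails)
    return len({e.lower() for e in out}) < len(out)


rows = ["bob@x.com", "Ann@x.com", "cy@x.com", "dee@x.com", "ann@x.com", "eve@x.com"]
minimal = shrink(rows, has_leftover_duplicates)
print(minimal)  # ['Ann@x.com', 'ann@x.com']
```

The result is not guaranteed to be the smallest possible reproducer, only one from which no single row can be removed without losing the error.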

marcellus23•8mo ago

> Marketing note: I'm sure you're proud of P Core/V Core, but that doesn't matter to your users, it's an implementation detail. At a maximum I'd write "intelligent execution that scales from small files to large files".

Speaking personally, "intelligent execution that scales from small files to large files" sounds like marketing buzz that could mean absolutely nothing. I like that it mentions specifically switching between RAM and disk-powered engines, because that suggests it's not just marketing speak, but was actually engineered. Maybe P vs V Core is not the best way to market it, but I think it's worth mentioning that design.

olllo•8mo ago

Thanks for the thoughtful take—really appreciate both perspectives.

You're right that terms like "intelligent execution" can feel vague without concrete backing. My goal with mentioning P Core/V Core was to hint at the underlying design—switching between in-memory and disk-based engines like Polars and Vaex—without overwhelming with technical detail.

I’ll look for a better way to explain the idea clearly and briefly. Thanks again!

gopher_space•8mo ago

I wish every product had an engineer-only landing page I could set as a default in my browser. The number of companies that assume I'm familiar with their offering is astounding, and I'm usually looking for implementation docs just to figure out what it actually does.

I'm not saying we need a morlock/eloi toggle.

olllo•8mo ago

Thanks for the thoughtful feedback!

Yes, Data.olllo uses several engines under the hood, including Polars, for fast and efficient processing. A demo video is in the works and should be up soon.

Good point about the "P Core/V Core" naming—I'll simplify that to focus more on the user benefit, like scaling from small to large files smoothly.

I also like your idea of running transformations on a sample first with a one-click full run—very aligned with the vision. And subset reproduction for errors is a great suggestion, especially for things like deduping. Appreciate it!

paddy_m•8mo ago

Feel free to get in touch. We are building similar tools

TheTaytay•8mo ago

Thank you for this. I find myself increasingly using CSVs (TSVs actually) as the data format of choice. I confess I wish this was written for Mac too, but I like the trend of (once again) moving data processing down to our super computers on our desk...

hilti•8mo ago

… I'm trying to use the super computers in our pockets, like an iPhone ;-) But I'm still struggling with how to present CSV data effectively on a small screen, although it's huge in terms of pixels compared to computer screens from the 90s.

It's interesting to research how capable applications like Lotus 1-2-3 were even at low resolutions like 800x600 pixels compared to today’s standard.

RyanHamilton•8mo ago

QStudio allows querying CSV on Mac via DuckDB: https://www.timestored.com/qstudio/csv-file-viewer

I've been improving the Mac version a lot lately: key bindings, an icon, and an app package to download. So if you find any problems, please raise a GitHub issue.

hermitcrab•8mo ago

If you are wrangling CSV/TSV files on Mac, it might be worth taking a look at Easy Data Transform.

paddy_m•8mo ago

Ok, if we are all tagging and promoting our own projects, check out mine.

I created Buckaroo to provide a better table viewing experience inside of notebooks. I also built a low code UI and auto cleaning to expedite the rote data cleaning tasks that take up a large portion of data analysis. Autocleaning is heuristically powered - no LLMs, so it's fast and your data stays local. You can apply different autocleaning strategies and visually inspect the results. When you are happy with the cleaning, you can copy and paste the Python code as a reusable function.

All of this is open source, and it's extensible/customizable.

Here's a video walking through autocleaning and how to extend it https://youtu.be/A-GKVsqTLMI

Here's the repo: https://github.com/paddymul/buckaroo

olllo•8mo ago

Thank you! I completely agree—TSVs/CSVs are such a simple yet powerful format, and it's great to hear you're making good use of them. I'm also a big fan of doing as much as possible locally—our machines are incredibly capable these days. Good news: I'm currently working on the macOS version of Data.olllo and plan to submit it to the Mac App Store soon. Stay tuned!

crashabr•8mo ago

How does it compare to OpenRefine https://github.com/OpenRefine

bitbasher•8mo ago

What are "massive" CSVs? I have CSVs in the terabytes that need to be deduped by a specific column. Can it handle that? What if I want to run a function on the column to normalize it before the deduping?
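For context, a generic way to dedupe by a normalized column when the file does not fit in RAM is to hash-partition rows into bucket files on disk, then dedupe each bucket in memory. The sketch below uses only the Python standard library and says nothing about how Data.olllo handles this; `dedupe_by_column`, its parameters and the bucket count are assumptions for illustration, and output row order is not preserved.

```python
import csv
import os
import tempfile
import zlib
from typing import Callable


def dedupe_by_column(src_path: str, dst_path: str, column: str,
                     normalize: Callable[[str], str], buckets: int = 64) -> int:
    """Keep the first row for each normalized value of `column`; return rows kept."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"{i}.csv") for i in range(buckets)]

        # Pass 1: route each row to a bucket chosen by the hash of its normalized key,
        # so all rows sharing a key land in the same (much smaller) file.
        with open(src_path, newline="", encoding="utf-8") as src:
            reader = csv.DictReader(src)
            fields = reader.fieldnames
            if fields is None:
                raise ValueError("input has no header row")
            files = [open(p, "w", newline="", encoding="utf-8") for p in paths]
            try:
                writers = [csv.DictWriter(f, fieldnames=fields) for f in files]
                for row in reader:
                    key = normalize(row[column])
                    writers[zlib.crc32(key.encode("utf-8")) % buckets].writerow(row)
            finally:
                for f in files:
                    f.close()

        # Pass 2: dedupe one bucket at a time; only that bucket's keys are in memory.
        kept = 0
        with open(dst_path, "w", newline="", encoding="utf-8") as dst:
            out = csv.DictWriter(dst, fieldnames=fields)
            out.writeheader()
            for path in paths:
                seen = set()
                with open(path, newline="", encoding="utf-8") as part:
                    for row in csv.DictReader(part, fieldnames=fields):
                        key = normalize(row[column])
                        if key not in seen:
                            seen.add(key)
                            out.writerow(row)
                            kept += 1
        return kept
```

A call such as `dedupe_by_column("in.csv", "out.csv", "email", lambda s: s.strip().lower())` would treat ` Ann@x.com ` and `ann@x.com` as the same key. The bucket count trades open file handles against per-bucket memory, and it needs temporary disk space roughly equal to the input size.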
