Show HN: CSV GB+ by Data.olllo – Open and Process CSVs Locally

https://apps.microsoft.com/detail/9pfcrwp46v22?hl=en-US&gl=US
40•olllo•8h ago
I built CSV GB+ by Data.olllo, a local data tool that lets you open, clean, and export gigabyte-sized CSVs (even billions of rows) without writing code.

Most spreadsheet apps choke on big files. Coding in pandas or Polars works—but not everyone wants to write scripts just to filter or merge CSVs. CSV GB+ gives you a fast, point-and-click interface built on dual backends (memory-optimized or disk-backed) so you can process huge datasets offline.
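For comparison, the kind of script the post is referring to can be quite short. The following is a minimal sketch using only Python's standard `csv` module (not pandas or Polars, and not how CSV GB+ works internally); the function name `filter_rows` and the sample columns are made up for illustration. It streams rows one at a time, so memory use does not grow with file size.

```python
import csv
import io

def filter_rows(src, dst, column, predicate):
    """Copy rows from src to dst where predicate(row[column]) is true.

    Rows are streamed one at a time, so only a single row is held in memory.
    Returns the number of rows kept.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        if predicate(row[column]):
            writer.writerow(row)
            kept += 1
    return kept

# Hypothetical sample data; with real files, pass open(path, newline="") handles.
sample = io.StringIO("id,country,amount\n1,US,10\n2,DE,25\n3,US,40\n")
out = io.StringIO()
kept = filter_rows(sample, out, "country", lambda value: value == "US")
print(kept)
print(out.getvalue())
```

The same loop works on multi-gigabyte files because nothing is loaded up front; the trade-off is that operations needing the whole dataset at once (sorting, global dedup) need a different approach.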

Key Features: Handles massive CSVs with ease — merge, split, dedup, filter, batch export

Smart engine switch: disk-based "V Core" or RAM-based "P Core"

All processing is offline – no data upload or telemetry

Supports CSV, XLSX, JSON, DBF, Parquet and more

Designed for data pros, students, and privacy-conscious users

Register for a 7-day free trial of Pro. Pro versions remove row limits and unlock full features. I’m a solo dev building Data.olllo as a serious alternative to heavy coding or bloated enterprise tools.

Download for Windows: https://apps.microsoft.com/detail/9PFR86LCQPGS

User Guide: https://olllo.top/articles/article-0-Data.olllo-UserGuide

Would love feedback! I’m actively improving it based on real use cases.

Comments

xnx•7h ago
Is this better than Tad (https://www.tadviewer.com/), which seems to do similar things for free?
rad_gruchalski•7h ago
And on operating systems other than Windows...
dangerlibrary•7h ago
It is 2025 and CSVs still dominate data interchange between organizations.

https://graydon2.dreamwidth.org/193447.html

esafak•7h ago
parquet is also popular.
paddy_m•7h ago
Do you have a demo video?

What are you using for processing (polars)?

Marketing note: I'm sure you're proud of P Core/V Core, but that doesn't matter to your users, it's an implementation detail. At a maximum I'd write "intelligent execution that scales from small files to large files".

As an implementation note, I would make it simple to operate on just the first 1000 (10k or 100k) rows so responses are super quick, then once the users are happy about the transform, make it a single click to operate on the entire file with a time estimate.
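A rough sketch of that preview-then-full-run idea, using only the Python standard library. All names here (`apply_transform`, `uppercase_name`, `estimate_full_seconds`) are hypothetical, and the linear time estimate is an assumption that will be off when cost per row varies.

```python
import csv
import io
from itertools import islice

def apply_transform(lines, transform, limit=None):
    """Apply transform to each parsed row; stop after `limit` rows if given."""
    reader = csv.DictReader(lines)
    rows = islice(reader, limit) if limit is not None else reader
    return [transform(row) for row in rows]

def estimate_full_seconds(preview_seconds, preview_rows, total_rows):
    """Linearly extrapolate full-run time from the preview; a rough guess only."""
    return preview_seconds * total_rows / preview_rows

def uppercase_name(row):
    """Example transform: upper-case the `name` column."""
    return {**row, "name": row["name"].upper()}

# Hypothetical in-memory data standing in for a large file.
data = "id,name\n" + "".join(f"{i},user{i}\n" for i in range(5000))

# Fast feedback on the first 1000 rows while the user tunes the transform.
preview = apply_transform(io.StringIO(data), uppercase_name, limit=1000)

# Once the preview looks right, the same transform runs on everything.
full = apply_transform(io.StringIO(data), uppercase_name)
```

Because the preview and the full run share one code path, whatever the user approved on the sample is exactly what gets applied to the whole file.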

Another feature I'd like in this vein: execute on a small subset, then, if an error shows up on a larger subset, try to reduce that larger subset to a small, quick-to-reproduce version. This would be especially useful for deduping.
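One way to approach that reduction is a simplified form of delta debugging: repeatedly drop chunks of rows and keep the smaller input whenever the problem still reproduces. The sketch below is illustrative only. `shrink`, `buggy_dedup`, `expected_dedup`, and the sample emails are all invented for the example, and the result is small but not guaranteed to be the smallest possible.

```python
def shrink(rows, fails):
    """Greedily drop chunks of rows while fails(rows) remains true."""
    if not fails(rows):
        raise ValueError("input does not reproduce the failure")
    chunk = len(rows) // 2
    while chunk >= 1:
        i = 0
        while i < len(rows):
            candidate = rows[:i] + rows[i + chunk:]
            if candidate and fails(candidate):
                rows = candidate  # chunk was not needed; retry at the same index
            else:
                i += chunk  # chunk is needed; move past it
        chunk //= 2
    return rows

def buggy_dedup(rows):
    """Dedup that wrongly treats differently-cased emails as distinct."""
    seen, out = set(), []
    for r in rows:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out

def expected_dedup(rows):
    """Reference dedup that compares emails case-insensitively."""
    seen, out = set(), []
    for r in rows:
        key = r.lower()
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

def fails(rows):
    return buggy_dedup(rows) != expected_dedup(rows)

# Hypothetical data: 500 unique emails plus one pair differing only in case.
rows = [f"user{i}@example.com" for i in range(500)]
rows.insert(120, "Alice@example.com")
rows.insert(400, "alice@example.com")

minimal = shrink(rows, fails)
print(minimal)
```

Here the 502-row input shrinks to just the two rows that trigger the discrepancy, which is much easier to inspect than the original file.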

marcellus23•2h ago
> Marketing note: I'm sure you're proud of P Core/V Core, but that doesn't matter to your users, it's an implementation detail. At a maximum I'd write "intelligent execution that scales from small files to large files".

Speaking personally, "intelligent execution that scales from small files to large files" sounds like marketing buzz that could mean absolutely nothing. I like that it mentions specifically switching between RAM and disk-powered engines, because that suggests it's not just marketing speak, but was actually engineered. Maybe P vs V Core is not the best way to market it, but I think it's worth mentioning that design.

TheTaytay•6h ago
Thank you for this. I find myself increasingly using CSVs (TSVs actually) as the data format of choice. I confess I wish this was written for Mac too, but I like the trend of (once again) moving data processing down to our super computers on our desk...
hilti•6h ago
… I'm trying to use our super computers in our pockets, like an iPhone ;-) But I'm still struggling with how to present CSV data effectively on a small screen, although it's huge in terms of pixels compared to computer screens from the 90s.

It's interesting to research how capable applications like Lotus123 were, even at low resolutions like 800x600 pixels, compared to today’s standard.

RyanHamilton•5h ago
QStudio allows querying CSV on Mac via DuckDB: https://www.timestored.com/qstudio/csv-file-viewer. I've been improving the Mac version a lot lately: key bindings, icon, an App package to download. So if you find any problems, please raise a GitHub issue.
hermitcrab•2h ago
If you are wrangling CSV/TSV files on Mac, it might be worth taking a look at Easy Data Transform.
paddy_m•2h ago
Ok, if we are all tagging and promoting our own projects, check out mine.

I created Buckaroo to provide a better table viewing experience inside of notebooks. I also built a low-code UI and autocleaning to expedite the rote data cleaning tasks that take up a large portion of data analysis. Autocleaning is heuristically powered, with no LLMs, so it's fast and your data stays local. You can apply different autocleaning strategies and visually inspect the results. When you are happy with the cleaning, you can copy and paste the Python code as a reusable function.

All of this is open source, and it's extensible and customizable.

Here's a video walking through autocleaning and how to extend it https://youtu.be/A-GKVsqTLMI

Here's the repo: https://github.com/paddymul/buckaroo
