Most Java file processing solutions either involve a lot of boilerplate or don’t handle concurrency, backpressure, or metrics well out of the box. I needed something fast, clean, and production-friendly — so I built this.
Key features:
Multi-threaded line/batch processing using a configurable thread pool
Producer/consumer model with built-in backpressure (see the sketch after this list)
Buffered, asynchronous writing with optional auto-flush
Live metrics: memory usage, throughput, thread times, queue stats
Simple builder API — minimal setup to get going
Output metrics to JSON, CSV, or human-readable format
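
A minimal sketch of the producer/consumer pattern those bullets describe, built only from JDK primitives (a bounded BlockingQueue for backpressure plus a fixed thread pool). This illustrates the idea rather than the library's actual internals; the file name, queue capacity, and batch size are placeholders.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.stream.Stream;

    public class BoundedPipeline {
        public static void main(String[] args) throws Exception {
            Path input = Path.of("big.log");          // placeholder input file
            int workers = Runtime.getRuntime().availableProcessors();
            List<String> POISON = List.of();          // sentinel telling workers to stop

            // The bounded queue is the backpressure: the reader blocks on put()
            // whenever the workers fall behind, so memory use stays flat.
            BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(100);

            ExecutorService pool = Executors.newFixedThreadPool(workers);
            for (int i = 0; i < workers; i++) {
                pool.submit(() -> {
                    List<String> batch;
                    while ((batch = queue.take()) != POISON) {
                        for (String line : batch) {
                            // filter / transform / write the line here
                        }
                    }
                    queue.put(POISON);                // pass the sentinel to the next worker
                    return null;
                });
            }

            // Producer: read the file lazily and hand off batches of 1,000 lines.
            try (Stream<String> lines = Files.lines(input)) {
                List<String> batch = new ArrayList<>(1_000);
                for (String line : (Iterable<String>) lines::iterator) {
                    batch.add(line);
                    if (batch.size() == 1_000) {
                        queue.put(List.copyOf(batch));
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) queue.put(List.copyOf(batch));
            }
            queue.put(POISON);
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }

The point of the builder API above is that you shouldn't have to write this plumbing yourself.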
Use cases:
Large CSV or log file parsing
ETL pre-processing
Line-by-line filtering and transformation
Batch preparation before ingestion
I’d really appreciate your feedback — feature ideas, performance improvements, critiques, or whether this solves a real problem for others. Thanks for checking it out!
gavinray•5h ago
Have the OS handle memory paging and buffering for you and then use Java's parallel algorithms to do concurrent processing.
Create a "MappedByteBuffer" and mmap the file into memory.
If the file is too large to map in one go, use an "AsynchronousFileChannel" and asynchronously read and process segments of the file.
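
A rough sketch of that approach with plain JDK classes: map the file once, carve the mapping into per-core slices, and let a parallel stream do the concurrent work. The file name is a placeholder, and counting newline bytes stands in for real per-chunk processing.

    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.ArrayList;
    import java.util.List;

    public class MmapLineCount {
        public static void main(String[] args) throws Exception {
            Path path = Path.of("big.log");               // placeholder input file
            try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
                long size = channel.size();
                // A single mapping is limited to ~2 GB; beyond that you need several
                // mappings or an AsynchronousFileChannel, as noted above.
                MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);

                // Carve the mapping into one read-only slice per core (done on a
                // single thread, since creating ByteBuffer views is not thread-safe).
                int chunks = Runtime.getRuntime().availableProcessors();
                long chunkSize = Math.max(1, size / chunks);
                List<ByteBuffer> slices = new ArrayList<>();
                for (long offset = 0; offset < size; offset += chunkSize) {
                    int len = (int) Math.min(chunkSize, size - offset);
                    slices.add(mapped.slice((int) offset, len));   // Java 13+ overload
                }

                // A parallel stream does the concurrent processing; each slice just
                // counts newline bytes, so chunk boundaries don't matter here.
                long lines = slices.parallelStream()
                        .mapToLong(slice -> {
                            long n = 0;
                            for (int i = 0; i < slice.limit(); i++) {
                                if (slice.get(i) == '\n') n++;
                            }
                            return n;
                        })
                        .sum();

                System.out.println(lines + " lines in " + path);
            }
        }
    }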
gavinray•4h ago
https://gavinray97.github.io/blog/panama-not-so-foreign-memo...
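
(The linked post appears to cover memory-mapping through the Panama java.lang.foreign API. For context, a rough sketch of that route on Java 22+, where a mapped MemorySegment is long-indexed and therefore not capped at 2 GB the way a single MappedByteBuffer is; the file name is a placeholder.)

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class FfmMappedFile {
        public static void main(String[] args) throws Exception {
            Path path = Path.of("big.log");               // placeholder input file
            try (Arena arena = Arena.ofConfined();
                 FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
                // Maps the whole file as a MemorySegment; segments use long offsets,
                // so files larger than 2 GB can be mapped in one go.
                MemorySegment segment =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size(), arena);

                long newlines = 0;
                for (long i = 0; i < segment.byteSize(); i++) {
                    if (segment.get(ValueLayout.JAVA_BYTE, i) == '\n') newlines++;
                }
                System.out.println(newlines + " lines in " + path);
            }   // closing the Arena unmaps the file
        }
    }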
switchbak•3h ago
Then again, if you're in Java/JVM land you're probably not building bleeding-edge DBs à la ScyllaDB. But I'm somewhat surprised at the lack of projects in this space. One would think this would pair well with some of the reactive stream implementations so that you wouldn't have to reimplement things like backpressure, etc.
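
For reference, the JDK itself ships a Reactive Streams-compatible API (java.util.concurrent.Flow). A minimal sketch of demand-driven backpressure with it, using a stand-in string producer instead of real file I/O:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.Flow;
    import java.util.concurrent.SubmissionPublisher;

    public class FlowBackpressure {
        public static void main(String[] args) throws InterruptedException {
            CountDownLatch done = new CountDownLatch(1);

            // SubmissionPublisher keeps a bounded buffer per subscriber (256 items
            // by default); submit() blocks the producer once that buffer is full,
            // which is the backpressure.
            try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {

                publisher.subscribe(new Flow.Subscriber<String>() {
                    private Flow.Subscription subscription;

                    @Override public void onSubscribe(Flow.Subscription s) {
                        subscription = s;
                        s.request(10);                    // initial demand
                    }
                    @Override public void onNext(String line) {
                        // filter / transform the line here, then ask for one more
                        subscription.request(1);
                    }
                    @Override public void onError(Throwable t) {
                        t.printStackTrace();
                        done.countDown();
                    }
                    @Override public void onComplete() {
                        done.countDown();
                    }
                });

                // Stand-in producer; in a real pipeline these would be file lines.
                for (int i = 0; i < 10_000; i++) {
                    publisher.submit("line " + i);        // blocks when demand runs out
                }
            }                                             // close() delivers onComplete
            done.await();
        }
    }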
threeseed•10m ago
b) ScyllaDB is not bleeding edge. It uses DPDK, which is relatively old by now.
c) There are countless reactive stream implementations, e.g. https://vertx.io/docs/vertx-reactive-streams/java/