Contraband

https://www.asymco.com/2025/05/17/contrabrand/
1•carrotsalad•27s ago•0 comments

Gottlob Frege

https://en.wikipedia.org/wiki/Gottlob_Frege
1•vram22•31s ago•1 comments

Salesforce Is Back in Talks to Acquire Informatica

https://www.bloomberg.com/news/articles/2025-05-23/salesforce-is-said-back-in-talks-to-acquire-informatica
1•mfiguiere•54s ago•0 comments

Curiosity: The Hidden Superpower Behind Tech Success

https://tushardadlani.com/curiosity-the-hidden-superpower-behind-tech-success
1•tush726•5m ago•0 comments

A Formal Proof of Complexity Bounds on Diophantine Equations

https://arxiv.org/abs/2505.16963
3•badmonster•5m ago•1 comments

Why 3D-Printing an Untraceable Ghost Gun Is Easier

https://www.wired.com/story/uncanny-valley-3d-printed-untraceable-ghost-guns/
1•Ozarkian•7m ago•0 comments

Freedom and Its Limits: Edward Wilmot Blyden's Black Republicanism

https://www.jhiblog.org/2025/04/30/freedom-and-its-limits-edward-wilmot-blydens-black-republicanism/
2•Traces•7m ago•0 comments

Show HN: Advanced Chunking in JavaScript/TypeScript with Chonkie

2•snyy•11m ago•0 comments

Lidar Can Permanently Damage Your Phone's Camera

https://www.jalopnik.com/1866994/lidar-permanently-damage-phone-camera/
6•rntn•16m ago•2 comments

Speedata Publisher – a professional database Publishing system

https://github.com/speedata/publisher
2•Tomte•16m ago•0 comments

Ask HN: Is anyone working on a ROS/ROS2 successor?

2•paulmist•17m ago•0 comments

GNU Emacs Configuration

https://protesilaos.com/emacs/dotemacs
2•Tomte•17m ago•0 comments

Ask HN: How to start selling my Restaurant AI automation software?

1•Nayak_S1991•17m ago•1 comments

Shine a spotlight on your open source project

https://github.blog/open-source/shine-a-spotlight-on-your-open-source-project/
1•belter•17m ago•0 comments

The Transwedge Product

https://terathon.com/blog/transwedge-product.html
1•ibobev•18m ago•0 comments

The Startup Dictionary of Received Ideas

https://www.thestartupdictionary.org/
1•tnolet•20m ago•0 comments

Using Claude 4 and the File API from a Microcontroller

https://bsky.app/profile/danielmangum.com/post/3lptl3x37q22z
1•hasheddan•23m ago•0 comments

US banana giant Chiquita fires thousands over Panama strike

https://www.aljazeera.com/news/2025/5/23/us-banana-giant-chiquita-fires-thousands-over-panama-strike
3•mooreds•25m ago•0 comments

NixOS 25.05 Released

https://nixos.org/blog/announcements/2025/nixos-2505/
5•todsacerdoti•25m ago•0 comments

A lens on poverty and the environment: Sebastião Salgado is dead at age 81

https://www.aljazeera.com/news/2025/5/23/a-lens-on-poverty-and-the-environment-sebastiao-salgado-is-dead-at-age-81
1•Qem•25m ago•1 comments

Chinese College Gives Harvard International Students 'Unconditional Offers'

https://www.newsweek.com/harvard-hkust-china-college-international-students-offer-2076257
6•doener•27m ago•1 comments

Avoiding becoming the lone dependency peg with load-bearing anime

https://xeiaso.net/blog/2025/avoiding-becoming-peg-dependency/
12•cratermoon•27m ago•0 comments

Show HN: I built an AI food detection app that serves as Shazam for your meals

https://whatthefood.io
1•OdehAhwal•27m ago•1 comments

A Conversation with Dave Grossman

https://spillhistorie.no/2025/05/23/a-conversation-with-dave-grossman/
2•doener•28m ago•0 comments

Sam and Jony and Skepticism

https://sixcolors.com/post/2025/05/sam-and-jony-and-skepticism/
3•danaris•30m ago•0 comments

Are developers falling out of love with Apple?

https://brucelawson.co.uk/2025/are-developers-falling-out-of-love-with-apple/
2•freediver•30m ago•0 comments

Flowise – Build AI Agents, Visually

https://flowiseai.com/
1•zhengiszen•30m ago•0 comments

Sam Altman is a visionary with a trustworthiness problem

https://www.economist.com/culture/2025/05/20/sam-altman-is-a-visionary-with-a-trustworthiness-problem
2•rurp•31m ago•2 comments

Steal this idea: Re-insuring usage-based SaaS billing so buyers pay a flat fee

1•cjcjameson•32m ago•0 comments

Trump calls for 50% tariff on EU, says he's 'not looking for a deal' with bloc

https://www.cnbc.com/2025/05/23/trump-recommends-50percent-tariff-on-european-union-starting-june-1.html
7•consumer451•32m ago•0 comments

Show HN: Samchika – A Java Library for Fast, Multithreaded File Processing

https://github.com/MayankPratap/Samchika
49•mprataps•6h ago
Hi HN, I built a Java library called SmartFileProcessor to make high-performance, multi-threaded file processing simpler and more maintainable.

Most Java file processing solutions either involve a lot of boilerplate or don’t handle concurrency, backpressure, or metrics well out of the box. I needed something fast, clean, and production-friendly — so I built this.

Key features:

Multi-threaded line/batch processing using a configurable thread pool

Producer/consumer model with built-in backpressure

Buffered, asynchronous writing with optional auto-flush

Live metrics: memory usage, throughput, thread times, queue stats

Simple builder API — minimal setup to get going

Output metrics to JSON, CSV, or human-readable format
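For readers unfamiliar with the producer/consumer-with-backpressure idea in the feature list, here is a minimal sketch using only the JDK. It is not Samchika's code or API: the class name `BackpressureSketch`, the method `totalLength`, and the "sum the line lengths" workload are all made up for illustration. The point it tries to show is that a bounded `BlockingQueue` makes the producer wait whenever the workers fall behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; none of these names come from the Samchika library.
final class BackpressureSketch {
    // Sentinel batch that tells a worker to stop; compared by identity, never by equals().
    private static final List<String> POISON = new ArrayList<>();

    /** Batches the lines, processes batches on `workers` threads, returns the total character count. */
    static long totalLength(List<String> lines, int batchSize, int workers, int queueCapacity)
            throws InterruptedException {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicLong total = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        List<String> batch = queue.take(); // blocks while the queue is empty
                        if (batch == POISON) {
                            return;
                        }
                        long sum = 0;
                        for (String line : batch) {
                            sum += line.length(); // stand-in for real per-line work
                        }
                        total.addAndGet(sum);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer side: put() blocks while the queue is full, which is the backpressure.
        for (int i = 0; i < lines.size(); i += batchSize) {
            int end = Math.min(i + batchSize, lines.size());
            queue.put(new ArrayList<>(lines.subList(i, end)));
        }
        for (int w = 0; w < workers; w++) {
            queue.put(POISON); // one stop signal per worker
        }

        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> lines = List.of("alpha", "beta", "gamma", "delta");
        System.out.println(totalLength(lines, 2, 2, 1));
    }
}
```

With a queue capacity of 1, the producer can be at most one batch ahead of the workers, so memory use stays bounded no matter how large the input is.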

Use cases:

Large CSV or log file parsing

ETL pre-processing

Line-by-line filtering and transformation

Batch preparation before ingestion

I’d really appreciate your feedback — feature ideas, performance improvements, critiques, or whether this solves a real problem for others. Thanks for checking it out!

Comments

gavinray•5h ago
Please don't do this.

Have the OS handle memory paging and buffering for you and then use Java's parallel algorithms to do concurrent processing.

Create a "MappedByteBuffer" and mmap the file into memory.

If the file is too large, use an "AsynchronousFileChannel" and asynchronously read + process segments of the buffer.
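A rough sketch of the memory-mapping approach described above, assuming a file below the 2 GiB `MappedByteBuffer` limit. The class `MmapNewlineCount`, the method `countNewlines`, and the choice of counting newline bytes are illustrative placeholders, not anything from the library under discussion. The file is mapped once, cut into independent slices, and a parallel stream scans the slices.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "mmap the file, then process it in parallel".
final class MmapNewlineCount {
    /** Counts '\n' bytes in a file smaller than 2 GiB, scanning roughly `slices` regions in parallel. */
    static long countNewlines(Path file, int slices) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size == 0) {
                return 0;
            }
            if (size > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("file too large for a single MappedByteBuffer");
            }
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);

            // Build independent views up front so each task has its own position and limit.
            long chunk = Math.max(1, (size + slices - 1) / slices);
            List<ByteBuffer> parts = new ArrayList<>();
            for (long start = 0; start < size; start += chunk) {
                long end = Math.min(size, start + chunk);
                ByteBuffer view = mapped.duplicate();
                view.position((int) start);
                view.limit((int) end);
                parts.add(view.slice());
            }

            return parts.parallelStream().mapToLong(MmapNewlineCount::countIn).sum();
        }
    }

    private static long countIn(ByteBuffer buf) {
        long count = 0;
        while (buf.hasRemaining()) {
            if (buf.get() == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countNewlines(Path.of(args[0]), Runtime.getRuntime().availableProcessors()));
    }
}
```

Counting bytes sidesteps the question of lines that straddle a slice boundary; real line-oriented processing would need to handle those boundaries explicitly.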

90s_dev•4h ago
Knowing nothing about Java or compsci, I am very curious to see the in depth discussion by all you Java/compsci experts that your comment invites.
papercrane•4h ago
If you're using a newer JVM you can also map a "MemorySegment", which doesn't have the 2GiB limit that byte buffers have.
gavinray•4h ago
Good point, have written about this in the past

https://gavinray97.github.io/blog/panama-not-so-foreign-memo...

switchbak•3h ago
Memory mapping is fun, but shouldn't we have some kind of async IO / uring support by now? If you're looking at really high-perf I/O, mmapping isn't really state of the art right now.

Then again, if you're in Java/JVM land you're probably not building bleeding edge DBs ala ScyllaDB. But I'm somewhat surprised at the lack of projects in this space. One would think this would pair well with some of the reactive stream implementations so that you wouldn't have to reimplement things like backpressure, etc.

exabrial•1h ago
try not to be a dick
threeseed•10m ago
a) There have been libraries supporting io_uring on the JVM for many years now.

b) ScyllaDB is not bleeding edge. It uses the now relatively old DPDK.

c) There are countless reactive stream implementations e.g. https://vertx.io/docs/vertx-reactive-streams/java/

SillyUsername•3h ago
Better caveat that with, "but watch memory consumption, given the nature of the likes of CopyOnWriteArrayList". GC will be a bitch.
mprataps•1h ago
Thanks for this comment. This will be an interesting aspect to explore.
codetiger•4h ago
Do you have a benchmark comparison with other similar tools?
sureglymop•4h ago
Perhaps I misunderstand something but doesn't reading from a file require a system call? And when there is a system call, the context switches? So wouldn't using multiple threads to read from a file mean that they can't really read in parallel anyway because they block each other when executing that system call?
mike_hearn•3h ago
System calls aren't context switches. They flip a permission bit in the CPU but don't do the work a context switch involves like modifying the MMU, flushing the TLBs, modifying kernel structures, doing scheduling etc.

Also, modern filing systems are all thread safe. You can have multiple threads reading and even writing in parallel on different CPU cores.
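To illustrate the point about parallel reads, here is a small sketch, assuming nothing beyond the JDK. `FileChannel.read(ByteBuffer, long)` takes an explicit position and does not move the channel's own position, so several threads can share one channel. The class `ParallelPositionalRead`, the method `byteSum`, and the "sum every byte" workload are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.stream.LongStream;

// Hypothetical sketch: several threads read disjoint segments of one file through one channel.
final class ParallelPositionalRead {
    /** Sums the unsigned value of every byte in the file, one segment per parallel task. */
    static long byteSum(Path file, int segmentSize) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            long segments = (size + segmentSize - 1) / segmentSize;
            return LongStream.range(0, segments)
                    .parallel()
                    .map(i -> sumSegment(channel, i * segmentSize, segmentSize))
                    .sum();
        } catch (UncheckedIOException e) {
            throw e.getCause(); // unwrap the IOException raised inside the stream
        }
    }

    private static long sumSegment(FileChannel channel, long offset, int length) {
        ByteBuffer buf = ByteBuffer.allocate(length);
        try {
            long pos = offset;
            while (buf.hasRemaining()) {
                int n = channel.read(buf, pos); // positional read; channel position is untouched
                if (n < 0) {
                    break; // end of file
                }
                pos += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buf.flip();
        long sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.get() & 0xFF;
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(byteSum(Path.of(args[0]), 1 << 20));
    }
}
```

Whether this is actually faster than a single sequential reader depends on the storage device and the page cache, so it is worth measuring before relying on it.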

bionsystem•3h ago
If you open() read-only I don't think it blocks (some other process writing to it might block though).
porridgeraisin•1h ago
> system call, the context switches

No, there is no separate kernel "executing". When you do a syscall, your thread becomes kernel mode and it executes the function behind the syscall, then when it's done, your thread reverts to user mode.

A context switch is when one thread is being swapped out for another. Now the syscall could internally spawn a thread and context switch to that, but I'm not sure if this happens in read() or any syscall for that matter.

sidcool•4h ago
It would be even more amazing if it had tests. It's already pretty good.
DannyB2•4h ago
Should the tests include some 10 GB files?
VWWHFSfQ•3h ago
Should include a script for generating 10GB files maybe
sidcool•1h ago
Naah. I meant unit tests. Not load tests.
mprataps•1h ago
I will add unit tests next.
VWWHFSfQ•4h ago
Am I wrong in thinking that this is duplicating lines in memory repeatedly when buffering lines into batches, and then submitting batches to threads? And then again when calling the line processor? Seems like it might be a memory hog
Calzifer•4h ago

        for(int i=0;i<10000; ++i){

            // do nothing just compute hash again and again.
            hash = str.hashCode();
        }
https://github.com/MayankPratap/Samchika/blob/ebf45acad1963d...

"do nothing" is correct, "again and again" not so much. Java caches the hash code for Strings and since the JIT knows that (at least in recent versions[1]) it might even remove this loop entirely.

[1] https://news.ycombinator.com/item?id=43854337

hyperpape•2h ago
Even in older versions, if the compiler can see that there are no side-effects, it is free to remove the loop and simply return the value from the first iteration.

I'm actually pretty curious to see what this method does on versions that don't have the optimization to treat hashCodes as quasi-final.

A quick test using Java 17 shows it's not being optimized away _completely_, but it's taking...~1 ns per iteration, which is not enough to compute a hash code.

Edit: I'm being silly. It will just compute the hashcode the first time, and then repeatedly check that it's cached and return it. So the JIT doesn't have to do any _real_ work to make this skip the hash code calculation.

So most likely, the effective code is:

    computeHashCode();
    for (int i = 0; i < 10000; i++) {
        if (false) { // pretend this wouldn't have dead code elimination, and the boolean is actually checked
            computeHashCode();
        }
    }
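If the goal of the sample was simply to burn CPU per line, one possible alternative is a loop in which every round consumes the previous round's result, so the cached `String.hashCode()` value cannot stand in for the work. This is only a sketch: `DependentHashLoop` and `busyHash` are made-up names, and the JIT may still optimize the loop in ways that are hard to predict, so a harness such as JMH remains the reliable way to measure it.

```java
// Hypothetical CPU-bound placeholder; not code from the Samchika repository.
final class DependentHashLoop {
    /**
     * Applies the same polynomial step as String.hashCode (h = 31 * h + c) over the
     * characters `rounds` times, carrying the running value from one round into the next.
     */
    static int busyHash(String str, int rounds) {
        int hash = 0;
        for (int i = 0; i < rounds; i++) {
            for (int j = 0; j < str.length(); j++) {
                hash = 31 * hash + str.charAt(j); // depends on the previous iteration
            }
        }
        return hash;
    }

    public static void main(String[] args) {
        // The result is printed so the computation has an observable effect.
        System.out.println(busyHash("some line from a file", 10_000));
    }
}
```

With `rounds` set to 1, the result equals `str.hashCode()`; with more rounds, each pass starts from the previous pass's value instead of from zero.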
rzzzt•1h ago
JMH, the microbenchmark harness, has an example that highlights this: https://github.com/openjdk/jmh/blob/master/jmh-samples/src/m...
mprataps•1h ago
You are right. This code does not recalculate the hash. However, it was written just as a sample. In practice, the user will provide their own method to process the file.
SillyUsername•3h ago
An ArrayList for huge numbers of add operations is not performant. LinkedList will see your list throughput performance at least double. There are other optimisations you can do but in a brief perusal this stood out like a sore thumb.
fedsocpuppet•2h ago
Huh? It'll be slower and eat a massive amount of memory too.
pkulak•2h ago
I've literally never seen a linked list be faster than an array list in a real application, so if you're right, this is kinda huge for me.
ldjkfkdsjnv•2h ago
I could write this library with an llm in a few hours
mprataps•1h ago
Maybe. I just started this with the intention of learning about multithreading. I learnt a lot of concepts which I had earlier only learnt in theory. I learnt how to use VisualVM to see my thread performance. I learnt to use the builder design pattern. No LLM can take away this learning.

And this project is just a start.

bogeholm•1h ago
I could probably do an Ironman if I really wanted to
threeseed•4m ago
No different to cheating off someone at school.

You didn't learn anything. You didn't accomplish anything. And no one including you respects it.

sieve•1h ago
A note on the name.

The nasal "m" takes on the form of the nasal in the row/class of the letter that follows it. As "ñ" is the nasal of the "c" class, the "m" becomes "ñ"

Writing Sanskrit terms using the roman script without using something like IAST/ISO-15919 is a pain in the neck. They are going to be mispronounced one way or the other. I try to get the ISO-15919 form and strip away everything that is not a-z.

So, सञ्चिका (sañcikā) = sancika

You probably want to keep the "ch," as the average English speaker is not going to remember that the "c" is the "ch" of "cheese" and not "see."

arnsholt•1h ago
It’s been ages since I did Sanskrit last, but wouldn’t sam-cika typically have the m realized as an anusvara rather than ñ?
sieve•56m ago
Not unless it precedes a classless letter or it is actually "m."

All nasals becoming anusvaras is something Hindi/Marathi and other languages using the Devanagari script do. Sanskrit uses the specific form of the nasal when available.

mprataps•1h ago
Guys. I love you all. I did not expect such quality feedback.

I will try to incorporate most of your feedback. Your comments have given me much to learn.

This project was started to just learn more about multithreading in a practical way. I think I succeeded with that.