I'm trying to understand why all command line tools don't use io_uring.
As an example, all my NVMe drives on USB 3.2 Gen 2 only reach 740MB/s peak.
If I use tools with aio or io_uring I get 1005MB/s.
I know I may not be copying many files simultaneously every time, but I guess the queue-depth strategies and the fewer locks also help.
But I agree. It would be cool if it were transparent, but this is actually what a bunch of io_uring runtimes do, using epoll as a fallback (e.g. monoio in Rust).
Besides, io_uring is not yet stable, and for all we know it may be replaced in 10 years by yet another mechanism that takes advantage of even newer hardware. So simply waiting for io_uring to prove it is here to stay is a very viable strategy. Besides, in 10 years we may have tools/AI that will do the rewrite automatically...
The *context() family of formerly-POSIX functions (clownishly deprecated as “use pthreads instead”) is essentially a full implementation of stackful coroutines. Even the arguable design botch of them preserving the signal mask (the reason why they aren’t the go-to option even on Linux) is theoretically fixable on the libc level without system calls, it’s just a lot of work and very few can be bothered to do signals well.
As far as stackless coroutines, there’s a wide variety of libraries used in embedded systems and such (see the recent discussion[1] for some links), which are by necessity awkward enough that I don’t see any of them becoming broadly accepted. There were also a number of language extensions, among which I’d single out AC[2] (from the Barrelfish project) and CPC[3]. I’d love for, say, CPC to catch on, but it’s been over a decade now.
[1] https://news.ycombinator.com/item?id=44546640
Because it's fairly new. The coreutils package which contains the ls command (and the three earlier packages which were merged to create it) is decades old; io_uring appeared much later. It will take time for the "shared ring buffer" style of system call to win over traditional synchronous system calls.
strace -c ls gave me this (columns: % time, seconds, usecs/call, calls, errors):

    100.00    0.002709    13    198     5    total

strace -c eza gave me this:

    100.00    0.006125    12    476    48    total

strace -c lsr gave me this:

    100.00    0.001277    33     38          total
So, looking at the number of syscalls in the calls column:
198 : ls
476 : eza
38 : lsr
A meaningful difference indeed!
Over 12x fewer system calls = others wait less for the kernel to handle their system calls
That isn't how it works. There isn't a fixed syscall budget distributed among running programs. Internally, the kernel is taking many of the same locks and resources to satisfy io_uring requests as ordinary syscall requests.
Also, more fs-related system calls mean fewer available kernel threads to process these system calls, e.g. XFS can parallelize mutations only up to its number of allocation groups (agcount).
Again, this just isn't true. The same "stat" operations are being performed one way or another.
> Also, more fs-related system calls mean less available kernel threads to process these system calls.
Generally speaking sync system calls are processed in the context of the calling (user) thread. They don't consume kernel threads generally. In fact the opposite is true here -- io_uring requests are serviced by an internal kernel thread pool, so to the extent this matters, io_uring requests consume more kernel threads.
Again, it just is true.
More fs-related operations mean fewer kthreads available for others. More syscalls mean more OS overhead. It's that simple.
Theoretically "intr" mounts allowed signals to interrupt operations waiting on a hung remote server, but Linux removed the option long ago[1] (FreeBSD still supports it)[2]. "soft" might be the only workaround on Linux.
[1]: https://man7.org/linux/man-pages/man5/nfs.5.html
[2]: https://man.freebsd.org/cgi/man.cgi?query=mount_nfs&sektion=...
I don't know how io_uring solves this - does it return an error if the underlying NFS call times out? How long do you wait for a response before giving up and returning an error?
I don't agree that it was a reasonable tradeoff. Making an unreliable system emulate a reliable one is the very thing I find to be a bad idea. I don't think this is unique to NFS, it applies to any network filesystem you try to present as if it's a local one.
> What does vi do when the server hosting the file you're editing stop responding? None of these tools have that kind of error handling.
That's exactly why I don't think it's a good idea to just pretend a network connection is actually a local disk. Because tools aren't set up to handle issues with it being down.
Contrast it with approaches where the client is aware of the network connection (like HTTP/GRPC/etc)... the client can decide for itself how long it should retry failed requests, whether it should bubble up failures to the caller, or work "offline" until it gets an opportunity to resync, etc. With NFS the syscall just hangs forever by default.
Distributed systems are hard, and NFS (and other similar network filesystems) just pretend it isn't hard at all, which is great until something goes wrong, and then the abstraction leaks.
(Also I didn't say io_uring solves this, but I'm curious as to whether its performance would be any better than blocking calls.)
Yes, lsr also colors the output, but it doesn't know as many things as eza does.
For example, .opus will show up as a music icon and with the right color (green-ish in my case?) in eza, whereas it shows up as any normal file in lsr.
Really no regrets though; it's quite easy to patch, I think, but yes, this is rock solid and really fast, I must admit.
Can you please create more such things for cat and other system utilities too?
Also love that it's using tangled.sh, which uses atproto; kinda interesting too.
I also like that it's written in Zig, which imo feels much easier for me to touch as a novice than Rust (sry Rustaceans).
Then all the software wanting to use io_uring wouldn't need to write their low-level things twice.
... no. It's just not interesting or particularly valuable to optimize ls, and Jens probably just used it as a demo and didn't want to keep it around.
These are actual discovered vulnerabilities, typically assigned CVEs and often exploited in sandbox escapes or privilege escalations:

1. CVE-2021-3491 (kernel 5.11+)
Type: Privilege escalation
Mechanism: Failure to check CAP_SYS_ADMIN before registering io_uring restrictions allowed unprivileged users to bypass sandboxing.
Impact: Bypass of security policy mechanisms.

2. CVE-2022-29582
Type: Use-after-free (UAF)
Mechanism: io_uring allowed certain memory structures to be freed and reused improperly.
Impact: Local privilege escalation.

3. CVE-2023-2598
Type: Race condition
Mechanism: A race in the io_uring timeout code could lead to memory corruption.
Impact: Arbitrary code execution or kernel crash.

4. CVE-2022-2602, CVE-2022-1116, etc.
Type: UAF and out-of-bounds access
Impact: Escalation from containers or sandboxed processes.

5. Exploit Tooling
Tools like io_uring_shock and custom kernel exploits often target io_uring in container-escape scenarios (esp. with Docker or LXC).
Implicit Vulnerabilities (Architectural and Latent Risks)
These are not necessarily exploitable today, but reflect deeper systemic design risks or assumptions.
1. Shared Memory Abuse
io_uring uses shared rings (memory-mapped via mmap) between kernel and user space.
Risk: If ring buffer memory management has reference count bugs, attackers could force races, data corruption, or misuse stale pointers.
2. User-Controlled Kernel Pointers
Some features allow user-specified buffers, SQEs, and CQEs to reference arbitrary memory (e.g. via IORING_OP_PROVIDE_BUFFERS, IORING_OP_MSG_RING).
Risk: Incomplete validation could allow crafting fake kernel structures or triggering speculative attacks.
3. Speculative Execution & Side Channels
Since io_uring relies on pre-submitted work queues and long-lived kernel threads, it opens timing side channels.
Risk: Predictable scheduling or timing leaks, esp. combined with hardware speculation (Spectre-class).
4. Bypassing seccomp or AppArmor Filters
io_uring operations can effectively batch or obscure syscall behavior.
Example: A program restricted from calling sendmsg() directly might still use io_uring to perform similar actions.
Risk: Policy enforcement tools become less effective, requiring explicit io_uring filtering.
5. Poor Auditability
The batched and asynchronous nature makes logging or syscall audit trails incomplete or confusing.
Risk: Harder for defenders or monitoring tools to track intent or detect misuse in real time.
6. Ring Reuse + Threaded Offload
With IORING_SETUP_SQPOLL or IORING_SETUP_IOPOLL, I/O workers can run in kernel threads detached from user context.
Risk: Desynchronized security context can lead to privileged operations escaping sandbox context (e.g., post-chroot but pre-fork).
7. File Descriptor Reuse and Lifecycle Mismatch
Some operations in io_uring rely on fixed file descriptors or registered files. Race conditions with FD reuse or closing can cause inconsistencies.
Risk: UAF, type confusion, or logic bombs triggered by kernel state confusion.
Emerging Threat Vectors
eBPF + io_uring
Some exploits chain io_uring with eBPF to do arbitrary memory reads or writes. e.g., io_uring to perform controlled allocations, then eBPF to read or write memory.
io_uring + userfaultfd
Combining userfaultfd with io_uring allows very fine-grained control over page faults during I/O — great for fuzzing, also for exploit primitives.
A bit off-topic too, but I'm new to Zig and curious. This here:

```
const allocator = sfb.get();
var cmd: Command = .{ .arena = allocator };
```

means that all allocations need to be written with an allocator in mind? I.e. does one have to pick an allocator for each memory allocation? Or is there a default one?

There are cases where you do want to change your code based on the expectation that you will be provided a special kind of allocator (e.g. arenas), but that's a more niche thing, and in any case it all comes together pretty well in practice.
But yes, there is a default allocator, std.heap.page_allocator
You should basically only use the page allocator if you're writing another allocator.
Nit: an allocator is not a "memory model", and I very much want the memory model to not change under my feet.
In libraries. If you're just writing a final product, it's totally fine to pick one and use it everywhere.
> std.heap.page_allocator
Strongly recommend against using this allocator as a "default": it will take a trip to kernel-land on each allocation.
https://www.gnu.org/prep/maintain/maintain.html#Copyright-Pa...
Good luck getting that upstreamed and accepted. The more foundational the tools (and GNU coreutils definitely is foundational), the more difficult that process will be.
Releasing a standalone utility makes iteration much faster, partially because one is not bound to the release cycles of distributions.
Which certainly is a valid way of prioritizing. Similarly, distros/users may prioritize stability, which means the theoretical improvement would now be stuck in not-used-land. The value of software appears when it's run, not when it's written.
Have you ever tried to contribute to open source projects?
The question was why someone writing software would take the route likely to end in rejection/failure. I don't know about you, but if I write software, I am not going to write it for a project whose managers will make it difficult for my PR to be accepted, and where it's 99% likely it never will be.
I will always contribute to the project likely to appreciate my work and incorporate it.
I'll share an anecdote: I got involved with a project, filed a couple PRs that were accepted (slowly), and then I talked about refactoring something so it could be tested better and wasn't so fragile and tightly coupled to IO. "Sounds great" was the response.
So I did the refactor. Filed a PR and asked for code review. The response was (after a long time waiting) "thanks but no, we don't want this." PR closed. No feedback, nothing.
I don't even use the software anymore. I certainly haven't tried to fix any bugs. I don't like being jerked around by management, especially when I'm doing it for free.
(For the record, I privately forked the code and run my own version that is better because by refactoring and then writing tests, I discovered a number of bugs I couldn't be arsed to file with the original project.)
Yes, and it was often painful enough to make me consider very carefully whether I want to bother contributing. I can only imagine how terrible the experience must be at a core utility such as ls.
> The question was why wouldn't someone writing software not take the route likely to end in rejection/failure
Obviously they wouldn't - in my comment I assumed that the lsr author aimed for providing a better ls for people and tried to offer a perspective with a different definition of what success is.
> I don't like being jerked around by management, especially when I'm doing it for free
I get that. The older OSS projects become, the more they fossilize too, and that makes it more annoying to contribute. But you can try to see it from the maintainers' perspective too: they have actual people relying on the program being stable and are often also not paid. No one is forcing you to contribute to their project, but if you don't want to deal with existing maintainers, you won't have their users enjoying your patchset. Know what you want to achieve and act accordingly, is all I'm trying to say.
Newer ones can be just as braindead, if they came out of some commercial entity. CLAs and such.
How `more` became `less`.
The name of 'more' was from paging - rather than having text scroll off the screen, it would show you one page, then ask if you wanted to see 'more' and scroll down.
'less' is a joke by the less authors. 'less is more' etc.
* https://freshports.org/sysutils/most/
* https://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/misc/most/i...
* https://packages.debian.org/sid/most
One can even get pg still, with Illumos-based systems, even though it was actually taken out of the SUS years ago. This goes to show that what's standard is not the same as what exists, of course.
* https://illumos.org/man/1/pg
* https://pubs.opengroup.org/onlinepubs/9699919799.2008edition...
Plus, since I actually took stevie and screen and others from comp.sources.unix and worked on them, and wasn't able to even send my improvements to M. Salz or the original authors at all, from my country, I can attest that contributing improvements had hurdles just as large to overcome back then as there exist now. They're just different.
Still, I have yet to come across tests that simulate a typical real-life application workload.
I've heard of fio but have yet to check how exactly it works and whether it can simulate a real-life application workload.
it's a good first approximation to test the cartesian product of
- sequential/random
- reads/writes
- in arbitrary sizes
- with arbitrarily many workers
- with many different backends to perform such i/o including io_uring
and its reporting is solid and thorough
implementing the same for your specific workload is often not trivial at all
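To give a flavor of those knobs, a minimal fio job file for a sequential-read test through io_uring might look like this (the file path, size, block size, and queue depth here are arbitrary example values, not recommendations):

```ini
[seqread]
ioengine=io_uring        ; use the io_uring backend instead of sync/libaio
rw=read                  ; sequential reads (randread for random)
bs=128k                  ; block size per request
size=1g                  ; total I/O per job
iodepth=32               ; outstanding requests kept in flight
numjobs=1                ; number of worker threads/processes
filename=/tmp/fio-testfile
```

Run it with `fio jobfile.fio`; swapping `ioengine` between `sync`, `libaio`, and `io_uring` on the same job is an easy way to see how much the submission mechanism alone matters.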
I remember getting into a situation during the ext2 and spinning-rust days where production directories had 500k files. ls processes were slow enough to overload everything. ls -f saved me there.
And filesystems got a lot better at lots of files. What filesystem was used here?
It's interesting how well busybox fares, it's written for size not speed iirc?
Two points are not enough to say it's sublinear. It might very well be some constant factor that becomes less and less important the bigger the linear factor becomes.
Or in other words 10000n+C < 10000(n+C)
But for lsr, it's 9.34. The other tools have factors close to 10.09 or higher. Since ls has to sort its output (unless -f is specified), I'd not be too surprised by a little superlinearity.
https://docs.google.com/spreadsheets/d/1EAYua3B3UeTGBtAejPw2...
Most of the coreutils are not fast enough to actually utilize modern SSDs.
Thanks for the comment, didn't know that!
Yes I know uring is an async interface, but it’s trivial to implement sync behavior on top of a single chain of async send-wait pairs, like doing a simple single threaded “conversational” implementation of a network protocol.
It wouldn’t make a difference in most individual cases but overall I wonder how big a global speed boost you’d get by removing a ton of syscalls?
Or am I failing to understand something about the performance nuances here?
- Start some sort of async executor thread to service the io_uring requests/responses
- Make it so every call to "normal" syscalls causes the calling thread to sleep until the result is available (that's 1 syscall)
- When the executor thread gets a result, have it wake up the original thread (that's another syscall)
So you're basically turning 1 syscall into 2 in order to emulate the legacy syscalls.
io_uring only makes sense if you're already async. Emulating sync on top of async is nearly always a terrible idea.
io_uring requires API changes because you don't call it like the old read(2)'s "please fill this buffer". You maintain a pool of buffers that belong to the ring, and reads take buffers from the pool. You consume the data from the buffer and return it to the pool.
With the older style, you're required to maintain O(pending_reads) buffers. With the io_uring style, you have a pool of O(num_reads_completing_at_once) buffers (I assume with backpressure, but I haven't actually checked).
Locales also bring in a lot more complicated sorting - so that could be a factor also.
I'm curious how lsr compares to bfs -ls for example. bfs only uses io_uring when multiple threads are enabled, but maybe it's worth using it even for bfs -j1
Currently downloading zig to build it.