Strictly speaking, the bottleneck was latency, not bandwidth.
Compressing the kernel makes it load into RAM faster, even though it still has to run the decompression step. Why?
Loading from disk into RAM is a bigger bottleneck than decompressing on the CPU.
The same applies to algorithms: always find the largest bottleneck in your chain of dependent steps and apply changes there, since the rest of the pipeline waits on it. Often picking the right algorithm "solves it", but sometimes the bottleneck is something else, like waiting on IO or coordinating across actors (mutexes, if concurrency is done the traditional way).
That's also part of the counterintuitive take that more concurrency brings more overhead, not necessarily faster execution (a topic widely discussed a few years ago around async concurrency and immutable data structures).
Stupid question: why can't we get a syscall to load an entire directory into an array of file descriptors (minus an array of paths to ignore), instead of calling open() on every individual file in that directory? Seems like the simplest solution, no?
Otherwise you can open a directory and pass its fd to openat() together with a path relative to it, to reduce the kernel overhead of resolving absolute paths for each file.
Notice that this avoids `lstat` calls; for symlinks you may still need to do a stat call if you want to stat the target.
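A minimal sketch of that openat() pattern (hypothetical paths, most error handling trimmed):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Open the directory once; later lookups resolve relative to it. */
    int dirfd = open("/some/deep/path/to/dir", O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) { perror("open dir"); return 1; }

    /* The kernel only has to resolve the final component per file. */
    int fd = openat(dirfd, "file.txt", O_RDONLY);
    if (fd < 0) { perror("openat"); close(dirfd); return 1; }

    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);
    printf("read %zd bytes\n", n);

    close(fd);
    close(dirfd);
    return 0;
}
```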
You mean like a range of file descriptors you could use if you want to save files in that directory?
You could use io_uring, but IMO that API is annoying and I remember hitting limitations. One thing you can do with io_uring is use openat (the op, not the syscall) with the directory fd (which you get from the regular syscall), so you can asynchronously open and read files. However, you couldn't open directories that way for some reason. There's a chance I'm remembering wrong.
https://man7.org/linux/man-pages/man3/io_uring_prep_open.3.h...
https://man7.org/linux/man-pages/man2/readdir.2.html
Note that the prep open man page is a (3) page. You could of course construct the SQEs yourself.
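For reference, a rough liburing sketch of that async openat (build with -luring; the file names are hypothetical, and this doesn't probe the directory-opening limitation mentioned above):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int dirfd = open(".", O_RDONLY | O_DIRECTORY);  /* dir fd from the plain syscall */
    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);

    /* Queue an asynchronous openat relative to the directory fd. */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_openat(sqe, dirfd, "file.txt", O_RDONLY, 0);
    io_uring_submit(&ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);
    int fd = cqe->res;                 /* new fd on success, -errno on failure */
    io_uring_cqe_seen(&ring, cqe);
    printf("openat result: %d\n", fd);

    if (fd >= 0) close(fd);
    close(dirfd);
    io_uring_queue_exit(&ring);
    return 0;
}
```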
Even if you assume it takes 1µs per mode switch, which would be insanely slow, you'd be looking at roughly 0.3s of the 17s for syscall overhead.
It's not obvious to me where the overhead is, but random seeks are still expensive, even on SSDs.
For example, our integration test suite for a particular service has become quite slow, but it's not particularly clear where the time is going. I suspect a decent amount of it is spent talking to Postgres, but I'd like a low-touch way to profile this.
There are a few challenges here:
- Off-CPU time doesn't get the sampling interrupt that on-CPU profilers use to collect stack traces, so you either instrument a full timeline as threads move on and off the CPU, or periodically walk every thread for its stack trace.
- Applications have many idle threads, and waiting for IO is a common threadpool case, so it's harder to tell a thread waiting on a pool doing delegated IO apart from idle worker-pool threads.
Some solutions:
- I've used Nsight Systems for non-GPU work to visualize off-CPU time on equal footing with on-CPU time.
- `gdb thread apply all bt` is slow but does full call-stack walking. In Python, we have `py-spy dump` for supported interpreters.
- Remember that anything you can represent as call stacks and integers can easily be converted to a flamegraph, e.g. taking strace durations by tid (and maybe fd) and aggregating them into a flamegraph.
https://github.com/golang/go/issues/28739#issuecomment-10426...
https://stackoverflow.com/questions/64656255/why-is-the-c-fu...
https://github.com/valhalla/valhalla/issues/1192
https://news.ycombinator.com/item?id=13628320
Not sure what the root cause is, though.
That said, I'm certainly no expert on filesystems or OS kernels, so I wouldn't know if Linux would perform faster or slower... But it would be very interesting to see a comparison, possibly even with a hypervisor adding overhead.
Of course syscalls suck, slurping the whole file at once always wins, and in this case all files at once.
Kernels suck in general. You don't really need one for high perf and low space.
That said, it wouldn't explain why a MacBook (which should already have the SSD on the fastest, dedicated pathway) would be this slow, unless something else in the OS is the bottleneck.
I think we're just scratching the surface here and there is more to this story waiting to be discovered. But yeah, to get the job done: package it into fewer files for the OS, preload into RAM or use mmap, then profit.
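For the mmap route, a minimal sketch (hypothetical file name, assuming the data has already been packed into one file):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("bundle.bin", O_RDONLY);   /* one big packed file */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    /* Map the whole file; the kernel pages it in on demand. */
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    /* Optionally ask the kernel to read ahead aggressively. */
    madvise(data, st.st_size, MADV_WILLNEED);

    printf("first byte: %d, size: %lld\n", data[0], (long long)st.st_size);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
```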
Then I moved to Windows, and Linux. Each had its own idiosyncrasies, like how everything is a file on Linux and you're supposed to write programs by chaining existing executables together; or how, on the desktop, both Win32 and X11 started out with their own versions of UI elements, so X11 or Win32 would know where a 'button' was, and the OS was responsible for event handling and drawing.
Eventually both Windows and Linux programs moved to a model where the OS just gave you the window as a drawing surface, and you were supposed to fill it.
Similarly, all other OS supplied abstractions slowly fell out of use beyond the bare minimum.
Considering this, I wonder if it's time to design a new, much lower-level abstraction, in this case for file systems: a way to mmap an entire directory into the process's address space, where each file is a struct holding a list of pointers to its pages on disk, and each directory is a list of such entries, again stored in a data structure you can access directly. Synchronizing reads and writes would be orchestrated by the kernel somehow (I'm thinking locking/unlocking the pages being written to).
So that way there'd be no difference between traversing an in-memory data structure and reading the disk.
I know this approach isn't super compatible with the async/await style of I/O, but I'm not 100% convinced that's the correct approach either (disk paging is a fundamental feature of all OSes, yet it's essentially inexpressible in programming terms).
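One way to picture that proposal, as purely hypothetical structs rather than any existing API:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical view of a file after mmap-ing a whole directory: a struct
 * whose page pointers the kernel keeps valid, faulting pages in from disk
 * on first access. None of this exists today; it only illustrates the idea. */
struct mapped_file {
    const char *name;
    size_t      size;
    void      **pages;       /* one pointer per on-disk page */
    size_t      page_count;
};

/* A directory is then just a list of such entries. */
struct mapped_dir {
    struct mapped_file *entries;
    size_t              entry_count;
};

/* Traversal looks like walking any in-memory data structure. */
static size_t total_size(const struct mapped_dir *d) {
    size_t sum = 0;
    for (size_t i = 0; i < d->entry_count; i++)
        sum += d->entries[i].size;
    return sum;
}

int main(void) {
    struct mapped_file f = { "README", 4096, NULL, 1 };
    struct mapped_dir  d = { &f, 1 };
    printf("total bytes in dir: %zu\n", total_size(&d));
    return 0;
}
```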
Bring back the "segmented" memory architecture. It was not evil because of segments, but because of segment size. If any segment can be any size the bad aspects fall away.
File handles aren't needed anymore. You open a file, you get back a selector rather than an ID. You reference memory from that selector, the system silently swaps pages in as needed.
You could probably do the same thing with directories but I haven't thought about it.
> You could probably do the same thing with directories but I haven't thought about it.
For example, in the FAT filesystem a directory is just a file with a special flag set in its directory entry, and inside that file there is simply a list of directory entries. Not sure if something so simple would be a good idea, but it certainly works and has worked IRL.
If you follow this model, how do you solve the accessibility issue?
https://199.233.217.201/pub/pkgsrc/distfiles/dictd-1.13.3.ta...
Wikipedia:
"In order to efficiently store dictionary data, dictzip, an extension to the gzip compression format (also the name of the utility), can be used to compress a .dict file. Dictzip compresses file in chunks and stores the chunk index in the gzip file header, thus allowing random access to the data."
So as someone who has watched a long stretch of technological advancement over the years, I can confidently tell you that chips have far outpaced peripheral components.
It's the scenario where compute has to be fast enough anyway to keep up with I/O, so in that sense it always has to be faster; what I'm saying is that it has exceeded those expectations.
marginalia_nu•1w ago
You can still get the compression benefits by serving files with Content-Encoding: gzip or whatever. Though the format has built-in compression, you can simply not use it and apply external compression instead, especially over the wire.
It's pretty widely used, though often dressed up as something else. JAR files or APK files or whatever.
I think the article's complaints about lacking unix access rights and metadata are a bit strange. That seems like a feature more than a bug, as I wouldn't expect this to be something that transfers between machines. I don't want to unpack an archive and have to scrutinize it for files with o+rxst permissions, or have their creation date be anything other than when I unpacked them.
stabbles•1w ago
SquashFS with zstd compression is used by various container runtimes, and is popular in HPC where filesystems often have high latency. It can be mounted natively or with FUSE, and the decompression overhead is not really felt.
1718627440•1w ago
> I don't want to unpack an archive and have to scrutinize it for files with o+rxst permissions, or have their creation date be anything other than when I unpacked them.
I'm the opposite: when I pack and unpack something, I want the files to be identical, including attributes. Why should I throw away all the timestamps just because the files were temporarily in an archive?
rustyhancock•1w ago
If your archive drops it you can't get it back.
If you don't want it you can just chmod -R u=rw,go=r,a-x
1718627440•1w ago
Hence, the common archive format is tar, not zip.
password4321•1w ago
In case anyone is unaware, you don't have to throw away all the timestamps when using "zip with no compression". The metadata for each zipped file includes one timestamp (originally rounded to an even number of seconds, in local time).
I am a big fan of last-modified timestamps and am often discouraged that scp, git, and even many zip utilities are not (at least by default).
nh2•1w ago
ZIP retains timestamps. This makes sense because timestamps are a global concept: consider them an attribute that depends only on the file itself, similar to the file's name.
Owners and permissions, by contrast, also depend on the computer the files are stored on. User "john" might have a different user ID on another computer, or not exist there at all, or be a different John. So there isn't one obvious way to handle them, while there is with timestamps. Archiving tools have to pick a particular way of handling it, so you need to pick the tool that implements the specific way you want.
pwg•1w ago
It does, but unless the zip archive creator being used makes use of the extensions for high-resolution timestamps, the basic ZIP format retains only old MS-DOS style timestamps (rounded to the nearest two seconds). So one may lose some precision in one's timestamps when passing files through a zip archive.
zahlman•1w ago
I'm not aware of standards language mandating it, but build tools generally do compress wheels and sdists.
If you're thinking of zipapps, those are not actually common.
Dylan16807•1w ago
I would expect modified dates to stay the same, and other dates to change, similar to copying a directory. I think this is the normal experience with zip?
For creation dates, Linux usually doesn't even track those at all. There's partial support on BTRFS and ZFS, and on ext4 there's nowhere to store it at all.
marginalia_nu•1w ago
With tar you need to scan the entire file start-to-finish before you know where the data is located, as it's literally a tape archiving format, designed for a storage medium with no random access reads.
st_goliath•1w ago
The format used by `ar` is quite simple, somewhat like tar: files glued together, with a short header in between and no index.
Early Unix eventually introduced a program called `ranlib` that generates an index for libraries (also containing the extracted symbols) and appends it, to speed up linking. The index is simply embedded as a file with a special name.
The GNU version of `ar`, as well as some later Unix descendants, supports doing that directly instead.
zahlman•1w ago
JAR files generally do/did use compression, though. I imagine you could forgo it, but I didn't see it being done. (But maybe that was specific to the J2ME world where it was more necessary?)
paulddraper•1w ago
Maybe non-UNIX machines I suppose.
But I 100% need executable files to be executable.
dwattttt•1w ago
On the other hand, tie the container structure to your OS metadata structure, and your (hopefully good) container format is now stuck with portability issues between other OSes that don't have the same metadata layout, as well as your own OS in the past & future.
paulddraper•1w ago
Just an id,blob format?
The purpose of tar (or competitors) is to serialize files and their metadata.
dwattttt•1w ago
Tar's purpose was to serialise files and metadata in 1979, accounting for tape foibles such as fixed or variable data block size.
hinkley•1w ago
The real one-two punch is make your parser faster and then spend the CPU cycles on better compression.
spwa4•1w ago
Zip has two tricks. First, compression is per-file, allowing extraction of single files without decompressing anything else.
Second, the "directory" is at the end, not the beginning, and the file ends with the offset of the start of the directory. That means two disk seeks (which matter even on SSDs) and you can show the user all the files.
From there you know exactly which bytes belong to which file, and everything's fast. It also means you can lop the directory off the end of the zip and append new files without modifying the rest of the file, which can be extended to allow arbitrary modification of the contents, although you may need to "defragment" the file.
And I believe encryption is also per-file, meaning that to decrypt a file you need both the password and the directory entry. So if you delete a file and rewrite just the directory, the data is unrecoverable, without requiring a total rewrite of the bytes.
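A sketch of how a reader typically finds that trailing directory: read the last chunk of the file, scan backwards for the end-of-central-directory signature, and pull the central-directory offset out of it (hypothetical archive name, little-endian host assumed, error handling for truncated archives omitted):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* End-of-central-directory signature: "PK\x05\x06" (little-endian). */
#define EOCD_SIG 0x06054b50u

int main(void) {
    FILE *f = fopen("archive.zip", "rb");
    if (!f) { perror("fopen"); return 1; }

    fseek(f, 0, SEEK_END);
    long size = ftell(f);

    /* The EOCD record is at most 22 bytes plus a 64 KiB comment from the end. */
    long tail = size < 22 + 65536 ? size : 22 + 65536;
    unsigned char *buf = malloc(tail);
    fseek(f, size - tail, SEEK_SET);
    fread(buf, 1, tail, f);

    /* Scan backwards for the signature (this read is effectively seek #1). */
    for (long i = tail - 22; i >= 0; i--) {
        uint32_t sig;
        memcpy(&sig, buf + i, 4);
        if (sig == EOCD_SIG) {
            uint32_t cd_offset;
            memcpy(&cd_offset, buf + i + 16, 4);  /* central directory offset */
            printf("central directory starts at byte %u\n", cd_offset);
            /* Seek #2 would be: fseek(f, cd_offset, SEEK_SET) and read entries. */
            break;
        }
    }
    free(buf);
    fclose(f);
    return 0;
}
```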
thaumasiotes•1w ago
How do you access a particular file without seeking through the entire file? You can't know where anything is without first seeking through the whole file.
thaumasiotes•1w ago
Where does that begin?
> Read the last block
You mean the last 4KB chunk defined by the file system, or what? The comment can be up to 64KB long.
Dylan16807•1w ago
Okay, the last 65KB.
Are you nitpicking now that you learned about the directory, or did you know about it before your first comment and pretended not to for some reason?