But mmap() was implemented in C because C is the natural language for exposing Unix system calls and mmap() is a syscall provided by the OS. And this is true up and down the stack. Best language for integrating with low level kernel networking (sockopts, routing, etc...)? C. Best language for async I/O primitives? C. Best language for SIMD integration? C. And it goes on and on.
Obviously you can do this stuff (including mmap()) in all sorts of runtimes. But it always appears first in C and gets ported elsewhere. Because no matter how much you think your language is better, if you have to go into the kernel to plumb out hooks for your new feature, you're going to integrate and test it using a C rig before you get the other ports.
[1] Given that the pedantry bottle was opened already, it's worth pointing out that you'd have gotten more points by noting that it appeared in 4.2BSD.
The underlying syscall doesn't use the C ABI; you need to wrap it to use it from C in the same way you need to wrap it to use it from any language, which is exactly what glibc and friends do.
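To make the wrapping point concrete, here is a rough sketch assuming Linux on x86_64 or aarch64, where `SYS_mmap` is defined (some 32-bit ports only have `mmap2`). The function names `map_page_libc` and `map_page_raw` are made up for illustration. Note that `syscall()` is itself a small libc shim that moves the arguments into the registers the kernel expects, so even the "raw" path goes through a wrapper.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: anonymous mapping through the ordinary libc wrapper. */
void *map_page_libc(size_t len) {
    assert(len > 0);
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Hypothetical helper: the same request through the generic syscall() shim.
   Assumes a Linux target that defines SYS_mmap. */
void *map_page_raw(size_t len) {
    assert(len > 0);
#ifdef SYS_mmap
    long r = syscall(SYS_mmap, (long)0, (long)len,
                     (long)(PROT_READ | PROT_WRITE),
                     (long)(MAP_PRIVATE | MAP_ANONYMOUS), (long)-1, (long)0);
    return (void *)r; /* syscall() returns -1 on error, which equals MAP_FAILED */
#else
    return MAP_FAILED; /* no SYS_mmap on this target */
#endif
}
```

Both helpers should hand back a usable page on such a system; the only difference is who does the register shuffling.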
Moral of the story is mmap belongs to the platform, not the language.
https://github.com/AdaCore/florist/blob/master/libsrc/posix-...
C has those too, and I am glad that it does. This is what allows one to do other things while the buffer gets filled, without the need for multithreading.
Yes, easier standardized portable async interfaces would have been nice; I'm not sure how well supported they are.
The other palatable way is to register consumer coroutines on a system-provided event loop. In C one does so with macro magic, or using stack switching with the help of a tiny bit of inline assembly.
Take a look at Simon Tatham's page on coroutines in C.
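For anyone who hasn't seen the macro trick, a minimal sketch in that style follows. The macro names (`CR_BEGIN`, `CR_YIELD`, `CR_END`) and the `counter` struct are invented here, and unlike the version on that page, which keeps state in static variables, this one stores the resume point in a caller-owned struct so several instances can coexist.

```c
#include <assert.h>
#include <stddef.h>

/* The resume point is saved as a line number and re-entered by jumping
   into the middle of a switch statement (the Duff's device idea). */
#define CR_BEGIN(state) switch (state) { case 0:
#define CR_YIELD(state, value) \
    do { (state) = __LINE__; return (value); case __LINE__:; } while (0)
#define CR_END(state) } (state) = -1

/* Everything that must survive a yield lives here, not in locals. */
struct counter {
    int state; /* 0 = not started, -1 = finished, otherwise a line number */
    int i;
    int limit;
};

/* Yields 0, 1, ..., limit-1 on successive calls, then -1 forever. */
int counter_next(struct counter *c) {
    assert(c != NULL);
    CR_BEGIN(c->state);
    for (c->i = 0; c->i < c->limit; c->i++) {
        CR_YIELD(c->state, c->i);
    }
    CR_END(c->state);
    return -1;
}
```

Starting from `struct counter c = {0, 0, 3};`, repeated calls to `counter_next(&c)` give 0, 1, 2 and then -1. The cost of the trick is that ordinary local variables do not survive a yield, which is why the loop index sits in the struct.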
To get really fast you may need to bypass the kernel. Or have more control on the event loop / scheduler. Database implementations would be the place to look.
All these methods are in the standard library, i.e. they work on all platforms. The C code is specific to POSIX; Windows supports memory mapped files too but the APIs are quite different.
https://learn.microsoft.com/en-us/dotnet/standard/io/memory-...
So if you wanted to handle file read/write errors you would need to implement signal handlers.
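A rough sketch of what that could look like on POSIX, assuming the failure shows up as SIGBUS (which is what you get when touching a mapped page that has no file behind it). `guarded_read` and `demo_fault_past_eof` are hypothetical names, and the single static jump buffer makes this unsafe for multithreaded use.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_jmp; /* process-wide: not thread-safe */

static void on_sigbus(int sig) {
    (void)sig;
    siglongjmp(bus_jmp, 1); /* unwind back into guarded_read */
}

/* Reads map[idx] into *out. Returns 0 on success, -1 if the access
   raised SIGBUS or the handler could not be installed. */
int guarded_read(const volatile unsigned char *map, size_t idx,
                 unsigned char *out) {
    struct sigaction sa, old;
    int rc;
    assert(map != NULL && out != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0) return -1;
    if (sigsetjmp(bus_jmp, 1) == 0) {
        *out = map[idx]; /* may fault */
        rc = 0;
    } else {
        rc = -1;         /* arrived here via the handler */
    }
    sigaction(SIGBUS, &old, NULL); /* restore the previous handler */
    return rc;
}

/* Demo: map two pages over a one-page temp file, so the second page has
   no backing store. Returns 0 if the first read works and the second is
   reported as a fault, -1 otherwise, -2 on setup failure. */
int demo_fault_past_eof(void) {
    char path[] = "/tmp/mmap_sigbus_XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0) return -2;
    unlink(path);
    if (ftruncate(fd, page) != 0) { close(fd); return -2; }
    unsigned char *map = mmap(NULL, (size_t)(2 * page), PROT_READ,
                              MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED) return -2;
    unsigned char b = 0;
    int first = guarded_read(map, 0, &b);
    int second = guarded_read(map, (size_t)page, &b);
    munmap(map, (size_t)(2 * page));
    return (first == 0 && second == -1) ? 0 : -1;
}
```

Compare that with checking the return value of `read()`: every access that might fault has to be wrapped this way, which is a fair illustration of why error handling with mmap is awkward.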
https://stackoverflow.com/questions/6791415/how-do-memory-ma...
My interpretation always was that mmap should only be used for immutable and local files. You may still run into issues with those types of files but it’s very unlikely.
When I was first taught C formally, they definitely walked us through all the standard FILE* manipulators and didn't mention mmap() at all. And when I first heard about mmap() I couldn't imagine personally having a reason to use it.
It's simple, I'll give it that.
> Look inside
> Platform APIs
Ok.
I agree platform APIs are better than most generic language APIs at least. I disagree on mmap being the "best".
Also using mmap is not as simple as the article lays out. For example, what happens when another process modifies the file and now your process's mapped memory consists of parts of 2 different versions of the file at the same time? You also need to build a way to know how to grow the mapping if you run out of room. You also want to be able to handle failures to read or write. This means you pretty much will need to reimplement a fread and fwrite, going back to the approach the author didn't like: "This works, but is verbose and needlessly limited to sequential access." So it turns out "It ends up being just a nicer way to call read() and write()" is only true if you ignore the edge cases.
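To give a feel for the growing part alone, here is a minimal portable sketch: extend the file with `ftruncate()`, map it again at the new size, then drop the old mapping. The `mapped_file` struct and both function names are hypothetical. Linux also has `mremap()`, which this sketch deliberately avoids.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one growable shared mapping. */
struct mapped_file {
    int fd;
    unsigned char *base; /* NULL while nothing is mapped */
    size_t len;
};

/* Grow the file and its mapping to new_len bytes. Returns 0 on success.
   On failure the old mapping is left in place. The base address may
   change, so pointers into the old mapping must not be kept. */
int mapped_file_grow(struct mapped_file *m, size_t new_len) {
    assert(m != NULL);
    if (new_len <= m->len) return 0;
    if (ftruncate(m->fd, (off_t)new_len) != 0) return -1;
    void *p = mmap(NULL, new_len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   m->fd, 0);
    if (p == MAP_FAILED) return -1;
    if (m->base != NULL) munmap(m->base, m->len);
    m->base = p;
    m->len = new_len;
    return 0;
}

/* Create (or truncate) path and map len bytes of it. */
int mapped_file_open(struct mapped_file *m, const char *path, size_t len) {
    assert(m != NULL && path != NULL);
    m->base = NULL;
    m->len = 0;
    m->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (m->fd < 0) return -1;
    return mapped_file_grow(m, len);
}
```

Even this small piece forces every caller to re-derive its pointers after a grow, and it still does nothing about concurrent writers or I/O errors.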
No it doesn't. If you have a file that's 2^36 bytes and your address space is only 2^32, it won't work.
On a related digression, I've seen so many cases of programs that could've handled infinitely long input in constant space instead implemented as some form of "read the whole input into memory", which unnecessarily puts a limit on the input length.
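As a small illustration of the constant-space style, here is a sketch that counts newlines using one fixed 4 KiB buffer, so memory use stays the same regardless of how long the input is. The function name `count_lines` is just for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Counts '\n' bytes in the stream using a fixed-size buffer.
   Returns the count, or -1 if a read error occurred. */
long count_lines(FILE *in) {
    char buf[4096];
    size_t n;
    long lines = 0;
    assert(in != NULL);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == '\n') lines++;
        }
    }
    return ferror(in) ? -1 : lines;
}
```

Calling `count_lines(stdin)` works the same on a pipe that never fits in memory as on a tiny file; the only remaining limit is the range of the counter.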
I’ve seen otherwise competent developers use compile time flags to bypass memmap on 32-bit systems even though this always worked! I dealt with database engines in the 1990s that used memmap for files tens of gigabytes in size.
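Presumably the way this works is by mapping a window of the file rather than the whole thing. A minimal sketch, assuming POSIX and hypothetical names (`window`, `window_map`, `window_unmap`, `demo_window_byte`): the offset passed to `mmap()` has to be page-aligned, so the helper rounds down and remembers the slack. On a 32-bit build you would also want `-D_FILE_OFFSET_BITS=64` so that `off_t` can address the whole file.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle for one read-only window onto a file. */
struct window {
    void *map_base;      /* what mmap returned (page-aligned) */
    size_t map_len;      /* length actually mapped */
    unsigned char *data; /* points at the requested offset */
};

/* Map len bytes of fd starting at an arbitrary byte offset. */
int window_map(int fd, off_t offset, size_t len, struct window *w) {
    assert(w != NULL && offset >= 0 && len > 0);
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = offset - (offset % page); /* round down to a page */
    size_t slack = (size_t)(offset - aligned);
    void *p = mmap(NULL, len + slack, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (p == MAP_FAILED) return -1;
    w->map_base = p;
    w->map_len = len + slack;
    w->data = (unsigned char *)p + slack;
    return 0;
}

void window_unmap(struct window *w) {
    munmap(w->map_base, w->map_len);
}

/* Demo: write n bytes (byte i holds i % 251) to a temp file, then read
   the byte at offset `at` through a one-byte window. Returns the byte,
   or -1 on failure. */
int demo_window_byte(size_t n, off_t at) {
    FILE *f = tmpfile();
    if (f == NULL) return -1;
    for (size_t i = 0; i < n; i++) fputc((int)(i % 251), f);
    fflush(f);
    struct window w;
    int result = -1;
    if (window_map(fileno(f), at, 1, &w) == 0) {
        result = w.data[0];
        window_unmap(&w);
    }
    fclose(f);
    return result;
}
```

Sliding such a window along the file keeps address-space use bounded by the window size rather than the file size.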
I'm not sure what the author really wants to say. mmap is available in many languages (e.g. Python) on Linux (and many other *nix I suppose). C provides you with raw memory access, so using mmap is sort-of-convenient for this use case.
But if you use Python then, yes, you'll need a bytearray, because Python doesn't give you raw access to such memory - and I'm not sure you'd want to mmap a PyObject anyway?
Then, writing and reading this kind of raw memory can be kind of dangerous and non-portable - I'm not really sure that the pickle analogy even makes sense. I very much suppose (I've never tried) that if you mmap-read malicious data in C, a vulnerability would be _quite_ easy to exploit.
https://docs.oracle.com/javase/8/docs/api/java/nio/MappedByt...
https://learn.microsoft.com/en-us/dotnet/api/system.io.memor...
https://learn.microsoft.com/en-us/windows/win32/memory/creat...
Memory mapping is very common.
FrankWilhoit•2h ago