ld --relocatable --whole-archive crappy-regular-static-archive.a -o merged.o
objcopy --localize-hidden merged.o merged.o
This should (?) then solve most issues in the article, except that including the same library twice still results in an error. For instance, if two libraries have a source file named foo.c, you can end up with two foo.o files, and when you extract them they overwrite each other. So you might think to rename them, but this nonsense can actually happen with two foo.o objects in the same archive.
The errors you get when running into these are not fun to debug.
It took a few minutes, probably has a few edge cases I haven't banged out yet, and now I get to just `-l` things and deploy with `rsync` instead of fucking Docker or something.
I take that deal.
but for your trouble: https://gist.github.com/b7r6/0cc4248e24288551bcc06281c831148...
If there's interest in this I can make a priority out of trying to get it open-sourced.
I feel like we really need better toolchains in the first place. None of this is intrinsically complex; it's all a lack of proper support in the standard tools.
The problem is misaligned incentives: CMake is bad, but it was sort of in the right place at the right time and became a semi-standard, and it's not in the interests of people who work in the CMake ecosystem to emit correct standard artifact manifests.
Dynamic linking by default is bad, but the gravy train on that runs from Docker to AWS to insecure-by-design TLS libraries.
The fix is for a few people who care more about good computing than money or fame to do simple shit like I'm doing above and make it available. CMake will be very useful in destroying CMake: it already encodes the same information that correct `pkg-config` needs.
This is a contrived example akin to "what if I only know the name of the function at runtime and have to dlsym()"?
Have a macro that "enables use of" the logger, which the API user must place at global scope so it can emit a reference like "extern ctor_name;". Or have library-specific additions to LDFLAGS to add --undefined=ctor_name.
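A rough sketch of the macro approach, assuming GCC/Clang and made-up names (`mylib_logger_ctor`, `MYLIB_USE_LOGGER`); taking the constructor's address is what creates the undefined reference that drags the right object out of the archive:

    /* in the library's logger.c: the logger registers itself from a constructor */
    __attribute__((constructor))
    void mylib_logger_ctor(void)
    {
        /* register the logger with the framework */
    }

    /* in the public header: the macro the API user places at global scope.
     * The global pointer creates a relocation against mylib_logger_ctor,
     * so the linker has to pull logger.o out of the .a even though nothing
     * calls it directly. */
    #define MYLIB_USE_LOGGER() \
        extern void mylib_logger_ctor(void); \
        void (*mylib_logger_keep_)(void) = mylib_logger_ctor

    /* in the API user's code: */
    MYLIB_USE_LOGGER();

The LDFLAGS route does the same thing from outside the code: `-Wl,--undefined=mylib_logger_ctor` tells the linker to treat the symbol as referenced.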
There are workarounds for this niche case, and it doesn't add up to ".a files were a bad idea"; that's just clickbait. You'll appreciate static linkage more on the day after your program survives a dynamic linker exploit.
> Every non-static function in the SDK is suddenly a possible cause of naming conflict
Has this person never written a C library before? Step 1: make all globals/functions static unless they're for export. Step 2: give all exported symbols and public header definitions a prefix, like "mylibname_", because linkage has a global namespace. C++ namespaces are just a formalisation of this.
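Roughly what that convention looks like (library name is made up):

    /* mylibname.h -- the public surface: every exported name wears the prefix */
    #ifndef MYLIBNAME_H
    #define MYLIBNAME_H

    int mylibname_parse(const char *input);

    #endif

    /* mylibname.c */
    #include "mylibname.h"

    /* internal helper: static, so it never enters the global symbol
     * namespace and cannot collide with anything else at link time */
    static const char *skip_ws(const char *s)
    {
        while (*s == ' ' || *s == '\t')
            s++;
        return s;
    }

    int mylibname_parse(const char *input)
    {
        return *skip_ws(input) != '\0';
    }

`nm -g mylibname.o` then shows exactly one exported symbol, `mylibname_parse`.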
Well, you just do what the standard Linux loader does: iterate through the .so's in your library path, loading them one by one and doing dlsym() until it succeeds :)
Okay, the dynamic loader actually only tries the .so's whose names are explicitly mentioned as DT_NEEDED in the .dynamic section, but it still is an interesting design choice that the functions being imported are not actually bound to the libraries; you just have a list of shared objects, and a list of functions that those shared objects, in totality, should provide you with.
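A toy version of that naive scheme, for flavor (plain POSIX `dlopen`/`dlsym`; again, the real loader only walks the DT_NEEDED list):

    #include <dlfcn.h>
    #include <stddef.h>

    /* try each candidate .so in order until one of them provides the symbol */
    void *find_symbol(const char *name, const char *const libs[], size_t nlibs)
    {
        for (size_t i = 0; i < nlibs; i++) {
            void *handle = dlopen(libs[i], RTLD_NOW);
            if (!handle)
                continue;
            void *sym = dlsym(handle, name);
            if (sym)
                return sym;       /* found: keep this library mapped */
            dlclose(handle);
        }
        return NULL;
    }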
And prefix everything in your library with a unique string.
This all works as long as libraries are “flat”, but doesn’t scale very well once libraries are built on top of each other and want to hide implementation details.
Also, some managers object to a prefix on non-API functions, and frankly I can understand them.
But it's full of strawmen and falsehoods, the most notable being the claims about the deficiencies of pkg-config. pkg-config works great; it is just very rarely produced correctly by CMake.
I have tooling and a growing set of libraries that I'll probably open source at some point for producing correct pkg-config from packages that only do lazy CMake. It's glorious. Want abseil? -labsl.
Static libraries have lots of game-changing advantages, but performance, security, and portability are the biggest ones.
People with the will and/or resources (FAANGs, HFT) would laugh in your face if you proposed DLL hell as standard operating procedure. That shit is for the plebs.
It's like symbol stripping: do you think maintainers trip an assert and see a wall of inscrutable hex? They do not.
Vendors like things good for vendors. They market these things as being good for users.
The implementation of Docker is proof of how much money you're expected to pay Bezos to run anything in 2025.
There are use cases for dynamic linking. It's just user-hostile as a mandatory default for a bunch of boring and banal reasons: Kitware doesn't want `pkg-config` to work, because who would use CMake if they had straightforward alternatives? The Docker industrial complex has no reason to exist in a world where Linus has been holding the line on ABI compatibility for 30 years.
Dynamic linking is fine as an option, I think it's very reasonable to ship a `.so` alongside `.a` and other artifacts.
Forcing it on everyone by keeping `pkg-config` and `musl` broken is a more costly own goal for computing than Tony Hoare's famous billion-dollar mistake.
No idea how you come to that conclusion, as they are definitively no more secure than shared libraries. Rather the opposite is true, given that you (as end user) are usually able to replace a shared library with a newer version, in order to fix security issues. Better portability is also questionable, but I guess it depends on your definition of portable.
Portability is to any fucking kernel in a decade at the ABI level. You don't sound stupid, which means you're being dishonest. Take it somewhere else before this gets old-school Linus.
I have no fucking patience when it comes to either Drepper and his goons or the useful idiots parroting that tripe at the expense of less technical people.
edit: I don't like losing my temper anywhere, especially in a community where I go way back. I'd like to clarify that I see this very much in terms of people with power (technical sophistication) and their relationship to people who are more vulnerable (those lacking that sophistication) in matters of extremely high stakes. The stakes at the low end are the cost and availability of computing. The high end is as much oppressive regime warrantless wiretap Gestapo shit as you want to think about.
Hackers have a responsibility to those less technical.
Obviously everything has some reason it was ever invented, and so there is a reason dynamic linking was invented too, and so congratulations, you have recited that reason.
A trivial and immediate counter example though is that a hacker is able to replace your awesome updated library just as easily with their own holed one, because it is loaded on the fly at run-time and the loading mechanism has lots of configurability and lots of attack surface. It actually enables attacks that wouldn't otherwise exist.
And a self contained object is inherently more portable than one with dependencies that might be either missing or incorrect at run time.
There is no simple single best idea for anything. There are various ideas with their various advantages and disadvantages, and you use whichever best serves your priorities of the moment. The advantages of dynamic libs and the advantages of static both exist, and sometimes you want one and sometimes you want the other.
The question is, do you want it to happen under your control in an organized way that produces fast, secure, portable artifacts, or do you want it to happen in some random way controlled by other people at some later date that will probably break or be insecure or both.
There's an analogy here to systems like `pip` and systems with solvers in them like `uv`: yeah, sometimes you can call `pip` repeatedly and get something that runs in that directory on that day. And neat, if you only have to run it once, fine.
But if you ship that, you're externalizing the costs to someone else, which is a dick move. `uv` tells you on the spot that there's no solution, and so you have to bump a version bound to get a "works here and everywhere and pretty much forever" guarantee that's respectful of other people.
There's a new standard being developed by some industry experts that aims to address this, called CPS. You can read the documentation on the website: https://cps-org.github.io/cps/ . There's a section with some examples of what they are trying to fix and how.
Here's Bazel consuming it with zero problems, and if you have a nastier problem than a low-latency network system calling `liburing` on specific versions of the kernel built with Bazel? Stop playing.
The last thing we need is another failed standard further balkanizing an ecosystem that has worked fine if used correctly for 40+ years. I don't know what industry expert means, but I've done polyglot distributed builds at FAANG scale for a living, so my appeal to authority is as good as anyone's, and I say `pkg-config` as a base for the vast majority of use cases, with some special path for, like, compiling `nginx` with its zany extension mechanism, is just fine.
https://gist.github.com/b7r6/316d18949ad508e15243ed4aa98c80d...
If pkg-config was never meant to be consumed directly, and was always meant to be post-processed, then we are missing this post-processing tool. Reinventing it in every compilation technology again and again is suboptimal, and at least Make and CMake do not have this post-processing support.
binutils implemented this with `libdep`, it's just that it's done poorly. You can put a few flags like `-L /foo -lbar` in a file `__.LIBDEP` as part of your static library, and the linker will use this to resolve dependencies of static archives when linking (i.e. extend the link line). This is much like DT_RPATH and DT_NEEDED in shared libraries.
It's just that it feels a bit half-baked. With dynamic linking, symbols are resolved and dependencies recorded as you create the shared object. That's not the case when creating static libraries.
But even if tooling for static libraries with the equivalent of DT_RPATH and DT_NEEDED was improved, there are still the limitations of static archives mentioned in the article, in particular related to symbol visibility.
Do people really not know about `-ffunction-sections -fdata-sections` & `-Wl,--gc-sections` (doesn't require LTO)? Why is it used so little when doing statically-linked builds?
> Let’s say someone in our library designed the following logging module: (...)
Relying on static initialization order, and on runtime static initialization at all, is never a good idea IMHO
> they can be counter-productive
Rarely[1]. The only side effect this can have is the constant pools (for ldr rX, [pc, #off] kind of stuff) not being merged, but the negative impact is absolutely minimal (different functions usually use different constants after all!)
([1] assuming elf file format or elf+objcopy output)
There are many other upsides too: you can combine these options with -Wl,-wrap to e.g. prune exception symbols from already-compiled libraries and make the resulting binaries even smaller (depending on platform)
The question is, why are function-sections and data-sections not the default?
It is quite annoying to have to deal with static libs (including standard libraries themselves) that were compiled with neither these flags nor LTO.
We already had scripting engines for those languages in the 1990s, and the fact that they are hardly available nowadays kind of tells you about their commercial success, with the exception of ROOT.
It's the first thing Google and LLMs 'tell' you when you ask about reducing binary size with static libraries. Also LTO does most of the same.
However, the semantics of inline are different between C and C++. To put it simply, C is restricted to static inline and, for variables, static const, whereas C++ has no such limitations (making it a superset); and static inline/const can sometimes lead to binary-size bloat.
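The usual C header idiom looks like this, and every translation unit that includes it gets its own private copy of both the function and the table, which is where the bloat comes from (names are illustrative):

    /* util.h -- header-only C: static inline + static const */
    static const unsigned char lut[256] = { ['0'] = 1, ['1'] = 2 /* ... */ };

    static inline unsigned char lookup(unsigned char c)
    {
        return lut[c];
    }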
...or build -flto for the 'modern' catch-all feature to eliminate any dead code.
...apart from that, none of the problems outlined in the blog post apply to header only libraries anyway since they are not distributed as precompiled binaries.
Do you know what you sound like?
https://github.com/kraj/musl/tree/kraj/master/src/stdio
...but these days -flto is simply the better option to get rid of unused code and data - and enable more optimizations on top. LTO is also exactly why static linking is strictly better than dynamic linking, unless dynamic linking is absolutely required (for instance at the operating system boundary).
The author proposes introducing a new kind of file that solves some of the problems with .a files - but we already have a perfectly good compiled library format for shared libraries! So why can't we make gcc sufficiently smart to allow linking against those statically and drop this distinction?
Amusingly, other (even MSVC-compatible) toolchains never had such problem; e.g. Delphi could straight up link against a DLL you tell it to use.
[0] https://learn.microsoft.com/en-us/cpp/build/reference/using-...
Yes, but it's an artificially created remarkableness. "Dynamic library" should just be "library", and then it's not remarkable at all.
It does seem obvious, as your Delphi example and the other comment's wcc example show, that if an executable can be assembled from .so files at run time, then the same thing can also be done at any other time. All the pieces are just sitting there, wondering why we're not using them.
Personally I think leaving the binding of libraries to runtime opens up a lot of room for problems, and maybe the savings of having a single copy of a library loaded into memory vs N specialized copies isn't important anymore either.
Why are things that are solved in other programming ecosystems impossible in the C/C++ world, like a sane build system?
Note that ISO C and ISO C++ ignore the existence of compilers, linkers and build tools; as far as the legalese is concerned, there is some magic way the code gets turned into machine code. The standards don't even consider the existence of filesystems for header files and translation-unit locations; they are talked about in the abstract, and could in a fully standard-compliant way be stored in a SQL database.
This is such an ignorant comment.
Most other natively compiled languages have exactly the same concepts behind them: object files, shared libraries, collections of objects, and some kind of configuration describing the compilation pipeline.
Even high-level languages like Rust have that (to some extent).
The fact that it is buried and hidden under 10 layers of abstraction and fancy tooling for your language does not mean it does not exist. Most languages currently rely on the LLVM infrastructure (written in C++) for their linker and object model anyway.
The fact that you (probably) never had to manipulate it directly just means your higher-level work never brought you deep enough for it to become a problem.
Did you just agree with me that other prog. ecosystems solved the building system challenge?
What should C do to solve it? Add another layer of abstraction on top of it? CMake does that and people complain about the extra complexity.
Putting the crap in a box with a user-friendly handle on it to make it look 'friendlier' is never 'solving the problem'.
It is just hiding the dust under the carpet.
Chesterton’s Fence yada yada?
1. We have libshared. It's got logging and other general stuff. libshared has static "Foo foo;" somewhere.
2. We link libshared into libfoo and libbar.
3. libfoo and libbar then go into application.
If you do this statically, what happens is that the Foo constructor gets invoked twice, once from libfoo and once from libbar. And also gets destroyed twice.
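Here's a C sketch of the effect, using GCC/Clang constructor attributes as a stand-in for the C++ static object (the C++ `Foo foo;` ends up as an `.init_array` entry in each .so the same way):

    #include <stdio.h>

    /* imagine this is libshared's global, linked statically into both
     * libfoo.so and libbar.so; with default visibility the dynamic linker
     * binds every reference to a single copy of foo_state... */
    int foo_state;

    /* ...but each .so still carries its own .init_array/.fini_array entry,
     * so the one object gets "constructed" and "destroyed" twice */
    __attribute__((constructor))
    static void foo_ctor(void)
    {
        printf("Foo constructed, count=%d\n", ++foo_state);
    }

    __attribute__((destructor))
    static void foo_dtor(void)
    {
        printf("Foo destroyed, count=%d\n", --foo_state);
    }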
Is there something missing from .so files that wouldn’t allow them to be used as a basis for static linking? Ideally, you’d only distribute one version of the library that third parties can decide to either link statically or dynamically.
When a bunch of .o files are presented to the linker, it has to consider references in every direction. The last .o file could have references to the first one, and the reverse could be true.
This is not so for .a files. Every successive .a archive presented on the linker command line in left-to-right order is assumed to satisfy references only in material to the left of it. There cannot be circular dependencies among .a files and they have to be presented in topologically sorted order. If libfoo.a depends on libbar.a then libfoo.a must be first, then libbar.a.
(The GNU Linker has options to override this: you can demarcate a sequence of archives as a group in which mutual references are considered.)
This property of archives (or of the way they are treated by linking) is useful enough that at some point when the Linux kernel reached a certain size and complexity, its build was broken into archive files. This reduced the memory and time needed for linking it.
Before that, Linux was linked as a list of .o files, same as most programs.
Relics are really old things that are revered and honored.
I think they just want "archaic", which describes old things that are likely obsolete.
Namely we should:
- make -l and -rpath options in .a generation do something: record that metadata in the .a
- make link-edits use that metadata recorded in .a files in the previous item
I.e., start recording dependency metadata in .a files so we can stop flattening dependency trees onto the final link-edit. This will allow static linking to have the same symbol-conflict-resolution behaviors as dynamic linking.
(I bet that .a/.lib files were originally never really meant for software distribution, but only as intermediate file format between a compiler and linker, both running as part of the same build process)
If you have spontaneously called initialization functions as part of an initialization system, then you need to ensure that the symbols are referenced somehow. For instance, a linker script which puts them into a table that is in its own section. Some start-up code walks through the table and calls the functions.
This problem has been solved; take a look at how U-boot and similar projects do it.
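Something in the spirit of the U-Boot linker-list trick, with made-up names: each module drops a function pointer into a dedicated section, GNU ld synthesizes `__start_`/`__stop_` symbols for any section whose name is a valid C identifier, and startup code walks the range (with `--gc-sections` you'd also want a KEEP() in the linker script):

    typedef void (*init_fn)(void);

    /* registering a module adds one pointer-sized entry to the table */
    #define REGISTER_INIT(fn) \
        static init_fn const fn##_entry \
            __attribute__((used, section("init_table"))) = (fn)

    /* provided automatically by GNU ld for the "init_table" section */
    extern init_fn const __start_init_table[];
    extern init_fn const __stop_init_table[];

    /* called once from startup code */
    static void run_all_inits(void)
    {
        for (init_fn const *f = __start_init_table; f < __stop_init_table; f++)
            (*f)();
    }

    /* a module registers itself like this, with no central list to edit */
    static void logger_init(void) { /* ... */ }
    REGISTER_INIT(logger_init);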
This is not an archive problem because the linker will remove unused .o files even if you give it nothing but a list of .o files on the command line, no archives at all.
tux3•8h ago
I never really advertised it, but what it does is take all the objects inside your static library and tell the linker to make a static library that contains a single merged object.
https://github.com/tux3/armerge
The huge advantage is that with a single object, everything works just like it would for a dynamic library. You can keep a set of public symbols and hide your private symbols, so you don't have pollution issues.
Objects that aren't needed by any public symbol (recursively) are discarded properly, so unlike --whole-archive you still get the size benefits of static linking.
And all your users don't need to handle anything new or to know about a new format, at the end of the day you still just ship a regular .a static library. It just happens to contain a single object.
I think the article's suggestion of a new ET_STAT is a good idea, actually. But in the meantime the closest to that is probably to use ET_REL, a single relocatable object in a traditional ar archive.
stabbles•7h ago
tux3•7h ago
However, the ELF format does support complex symbol resolution, even for static objects. You can have weak and optional symbols, ELF interposition to override a symbol, and so forth.
But I feel like for most libraries it's best to keep it simple, unless you really need the complexity.
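For instance, a weak default that any strong definition elsewhere on the link line quietly overrides (GCC/Clang syntax; `mylib_log` is a made-up name):

    #include <stdio.h>

    /* the library ships a weak default... */
    __attribute__((weak)) void mylib_log(const char *msg)
    {
        fprintf(stderr, "[mylib] %s\n", msg);
    }

    /* ...and an application that defines its own (non-weak) mylib_log()
     * wins, with no build-system ceremony at all:
     *
     *     void mylib_log(const char *msg) { my_fancy_logger(msg); }
     */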
amluto•7h ago
For that matter, I’ve occasionally wondered if there’s any real reason you can’t statically link an ET_DYN (.so) file other than lack of linker support.
tux3•7h ago
I would also be very happy to have one less use of the legacy ar archive format. A little-known fact is that this format is actually not standardized at all; there are several variants floating around that are sometimes incompatible (Debian ar, BSD ar, GNU ar, ...).