There are a lot of things I don't like about C++, and close to the top of the list is the lack of standardization for name-mangling, or even a way to mangle or de-mangle names at compile-time. Sepples is a royal pain in the ass to target for a dynamic FFI because of that. It would be really nice to have some way to get symbol names and calling semantics as constexpr const char* and not have to deal with generating (or writing) a ton of boilerplate and extern "C" blocks.
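For anyone who hasn't hit this, the boilerplate in question looks roughly like the sketch below. `geometry::area` and the `geometry_area` wrapper are made-up names for illustration. The point is only that the `extern "C"` wrapper gets a predictable, unmangled symbol that a dynamic FFI can look up by name, while the namespaced C++ function gets a compiler-specific mangled one.

```cpp
// Hypothetical library function: its symbol name is mangled and varies by ABI
// (e.g. _ZN8geometry4areaEdd under the Itanium ABI used by GCC and Clang).
namespace geometry {
double area(double width, double height) { return width * height; }
}  // namespace geometry

// Hand-written wrapper: exported under the plain symbol "geometry_area",
// so dlsym/GetProcAddress can find it without knowing any mangling scheme.
extern "C" double geometry_area(double width, double height) {
    return geometry::area(width, height);
}
```

Multiply that by every overload, member function, and template instantiation you want to expose and you get the "ton of boilerplate".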
It's absolutely possible, but it's not low-hanging fruit so the standards committee will never put it in. Just like they'll never add a standardized equivalent for alloca/VLAs. We're not allowed to have basic, useful things. Only more ways to abuse type deduction. Will C++26 finally give us constexpr dynamic allocations? Will compilers ever actually implement one of the three (3) compile-time reflection standards? Stay tuned to find out!
they're not embedding LLVM - they're embedding clang. if you look at my comment below, you'll see LLVM is not currently sufficient.
> [C++] is a royal pain in the ass to target for a dynamic FFI because of that
name mangling is by far the easiest part of cpp FFI - the hard part is the rest of the ABI. anyone curious can start here
They're embedding both, according to the article. But it's also just sloppy semantics on my part; when I say LLVM, I don't make a distinction between the frontend and any other part of it. I'm fully relying on context to include all relevant bits of software being used. In the same way I might use "Windows" to refer to any part of the Windows operating system like dwm.exe, explorer.exe, command.com, ps.exe, etc. LLVM is a generic catch-all for me; I don't say "LLI", I say "the LLVM VM", for example. I can't really consider clang to be distinct from that ecosystem, though I know it's a discrete piece of software.
> name mangling is by the easiest part of cpp FFI
And it still requires a lot of work, and increases in effort when you have multiple compilers, and if you're on a tiny code team that's already understaffed, it's not really something you can worry about.
https://en.m.wikiversity.org/wiki/Visual_C%2B%2B_name_mangli...
You're right, writing platform specific code to handle this is more than possible. But it takes manhours that might just be better spent elsewhere. And that's before we get to the part where embedding a C++ compiler is extremely inappropriate when you just want a symbol name and an ABI.
But this is beside the point: The fact that it's not a problem solved by the gargantuan standard is awful. I also consider the ABI to be the exact same issue, that being absolutely awful support of runtime code loading, linking and interoperation. There's also no real reason for it, other than the standards committee being incompetent.
He compiled C with some builtins for syscalls, and then translated that to his own stack machine. But, he also had a target for native DLLs, so same safe syscall interface, but they can segv so you have to trust them.
Crazy to think that in one computer program (that still reads better than high-concept FAANG C++ from elite legends, truly unique) this wasn't even the most dramatic innovation. It was the third* most dramatic revolution in one program.
If you're into this stuff, call in sick and read the plan files all day. Gives me goosebumps.
Like many things, this isn't a C++ problem. There is a standard and almost every target uses it ... and then there's what Microsoft does. Only if you have to deal with the latter is there a problem.
Now, standards do evolve, and this does give room for different system libraries/tools to have a different view of what is acceptable/correct (I still have nightmares of trying to work through `I...E` vs `J...E` errors) ... but all the functionality does exist and work well if you aren't on the bleeding edge (fortunately, C++11 provided the bits that are truly essential; everything since has been merely nice-to-have).
The fact that the standard doesn't specify a name mangling scheme leads to the completely predictable result that different implementations use different name mangling schemes.
The fact that the standard doesn't specify a mechanism to mangle and demangle names (be it at runtime or at compile time) leads to the completely predictable result that different implementations provide different mechanisms to mangle and demangle names, and that some implementations don't provide such a mechanism.
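To make "different mechanisms" concrete, here's a minimal sketch of the one GCC and Clang provide: `abi::__cxa_demangle` from `<cxxabi.h>`, which is part of the Itanium C++ ABI support library and not of the C++ standard. The `demangle` helper name is my own. None of this exists under MSVC, where you'd reach for something like `UnDecorateSymbolName` instead.

```cpp
// Sketch: runtime demangling on Itanium-ABI toolchains (GCC, Clang).
// <cxxabi.h> is an implementation header, not standard C++.
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`, or the
// input unchanged if the demangler rejects it.
inline std::string demangle(const char* mangled) {
    int status = 0;
    // With no output buffer supplied, __cxa_demangle mallocs the result,
    // so ownership is handed to a unique_ptr that calls free().
    std::unique_ptr<char, decltype(&std::free)> out(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), &std::free);
    return (status == 0 && out) ? std::string(out.get()) : std::string(mangled);
}
```

Note that this only goes one way. As far as I know there is no equivalent library call for going from a declaration to its mangled name, short of asking the compiler.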
These issues could, and should, have been fixed in the only place they can be fixed -- the standard. ISO is the mechanism through which different implementation vendors collaborate and find common solutions to problems.
When you have one implementation you have a standard. When you have two implementations and a standard you don’t actually have a standard in practice. You just have two implementations that kind of work similarly in most cases.
While the major compilers do a fantastic job, they still frequently disagree about even “well defined” behavior because the standard was interpreted differently or different decisions were made.
This simply isn't true. Plenty of standardized things are interchangeable, from internet RFCs followed by zillions of players and implementations of various RFCs, medical device standards, encryption standards, weights and measures, currency codes, country codes, time zones, date and time formats, tons of file formats, compression standards, the ISO 9000 series, ASCII, testing standards, and on and on.
The poster above you is absolutely correct - if something is not in the standard, it can vary.
No standard stands alone in its own universe; complementary standards must necessarily always exist.
Besides, even if the C++ standard suddenly did incorporate ABI standards by reference, Microsoft would just refuse to follow them, and nothing would actually be improved.
I don't see the point of standardizing name mangling. Imagine there is a standard: now you need to standardize the memory layout of every single class found in the standard library. Without that, instead of failing at link-time, your hypothetical program would break in ugly ways while running because, e.g., two functions that invoke one another have differing opinions about where exactly the length of a std::string can be found in memory.
The real way, and the way befitting the role of the standards committee, is actually putting effort into standardizing a way to talk to and understand the interfaces and structure of a C++ binary at load-time. That's exactly what linking is for. It should be the responsibility of the software using the FFI to move its own code around and adjust it to conform with information provided by the main program as part of the dynamic linking/loading process... which is already what it's doing. You can mitigate a lot of the edge cases by making interaction outside of this standard interface undefined behavior.
The canonical way to do your example is to get the address of std::string::length() and ask how to appropriately call it (to pass "this", for example).
Clojure uses the JVM, jank uses LLVM. I imagine we'd need _something_ to handle the JIT runtime, as well as jank's compiler back-end (for IR optimization and target codegen). If it's not LLVM, jank would embed something else.
Having to build both of these things myself would make an already gargantuan project insurmountable.
Far from being standardized but it's possible today on GCC and Clang. You just abuse __PRETTY_FUNCTION__.
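A rough sketch of the trick, assuming GCC or Clang in C++17 mode. `type_name` is a name I made up, and the parsing depends on the exact text each compiler puts in `__PRETTY_FUNCTION__`, which is not guaranteed to stay stable across versions. It also gives you the human-readable type name at compile time, not the mangled symbol.

```cpp
#include <string_view>

// Extracts the spelling of T from the compiler-generated function signature.
// GCC emits something like   "... type_name() [with T = int; std::string_view = ...]"
// Clang emits something like "... type_name() [T = int]"
template <typename T>
constexpr std::string_view type_name() {
    std::string_view fn = __PRETTY_FUNCTION__;
    std::string_view key = "T = ";
    auto start = fn.find(key) + key.size();
    // GCC terminates the entry with ';', Clang with the closing ']'.
    auto end = fn.find(';', start);
    if (end == std::string_view::npos) {
        end = fn.rfind(']');
    }
    return fn.substr(start, end - start);
}
```

Since the whole thing is `constexpr`, something like `static_assert(type_name<int>() == "int")` is checked by the compiler. MSVC has `__FUNCSIG__` with yet another format, which is sort of the point of this thread.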
Any chance of Jank eventually settling on reference counting? It checks so many boxes in my book: Simple, predictable, few edge cases, fast. I guess it really just depends on how much jank programs thrash memory, I remember Clojure having a lot of background churn.
almostgotcaught•2h ago
https://github.com/Mr-Anyone/abi
or this if/when it comes to fruition
https://discourse.llvm.org/t/llvm-introduce-an-abi-lowering-...
to generate ABI compliant calls/etc for cpp libs.
note, i say all this with maximum love in my heart for a language that would have first class cpp interop - i would immediately become jank's biggest proponent/user if its cpp interop were robust.
EDIT: for people wanting/needing receipts, you can skim through https://github.com/compiler-research/CppInterOp/blob/main/li...
wk_end•2h ago
So, I agree that this sounds janky as heck. My question is: besides sounding janky as heck, is there something wrong with this? Is it slow/unreliable?
almostgotcaught•2h ago
refulgentis•2h ago
I'm a bit surprised I've seen two articles about jank here the last 2 days if these are exemplars of the technical approach and communication style. Seems like that wouldn't be enough to get on people's radars.
actionfromafar•2h ago
Jeaye•1h ago
refulgentis•1h ago
On re-read, I recognize where it is used in the article:
"jank is C++. There is no runtime reflection, no guess work, and no hints. If the compiler can't find a member, or a function, or a particular overload, you will get a compiler error."
I assume other interop scenarios don't pull this off*, thus it is distinctive. Additionally, I'm not at all familiar with Clojure, sadly, but it also sounds like there's some special qualities there ("I think that this is an interesting way to start thinking about jank, Clojure, and static types")
Now I'll riff and just write out the first 3-5 titles that come to mind with that limited understanding:
- Implementing compile-time verifiable C++ interop in jank
- Sparks of C++ interop: jank, Clojure, & verifying interop before runtime
- jank's progress on C++ interop
- Safe C++ interop lessons from jank
* for example, I write a lot of Dart day to day and rely on Dart's "FFI" implementation to call C++, which, now that I'm thinking about it, only works because there's a code generator that creates "Dart headers" (my term) for the C++ libraries. I could totally footgun and call arbitrary functions that don't exist.
Jeaye•1h ago
jank is written in C++. Its compiler and runtime are both in C++. jank can compile to C++ directly (or LLVM IR). jank can reach into C++ seamlessly, which includes reaching into its own compiler/runtime. Thus, the boundary between what is C++ and what is Clojure is gone, which leaves jank as being both Clojure and C++.
Achieving this singularity is a milestone for jank and, I think, is worthy of the title.
Jeaye•24m ago
This is misleading. Having done a great deal of both (as jank also supports C++ codegen as an alternative to IR), if the input is a fully analyzed AST, generating IR is significantly more error prone than generating C++. Why? Well, C++ is statically typed and one can enable warnings and errors for all sorts of issues. LLVM IR has a verifier, but it doesn't check that much. Handling references, pointers, closures, ABI issues, and so many more things ends up being a huge effort for IR.
For example, want to access the `foo.bar` member of a struct? In IR, you'll need to access foo, which may require loading it if it's a reference. You'll need to calculate the offset to `bar`, using GEP. You'll need to then determine if you're returning a reference to `bar` or if a copy is happening. Referencing will require storing a pointer, whereas copying may involve a lot more code. If we're generating C++, though, we just take `foo` and add a `.bar`. The C++ compiler handles the rest and will tell us if we messed anything up.
If you're going to hand wave and say anything that's building strings is error prone and unsafe, regardless of how richly typed and thoroughly analyzed the input is, the stance feels much less genuine.
Jeaye•2h ago
I completely agree that Clang could solve this by actually supporting my use case. Unfortunately, Clang is very much designed for standalone AOT compilation, not intertwined with another IR generating mechanism. Furthermore, Clang struggles to handle some errors gracefully which can get it into a bad state.
I have grown jank's fork of CppInterOp quite significantly, in the past quarter, with the full change list being here: https://gist.github.com/jeaye/f6517e52f1b2331d294caed70119f1... Hoping to get all of this upstreamed, but it's a lot of work that is not high priority for me right now.
I think, based on my experience in the guts of CppInterOp, that the largest issue is not the C++ code generation. Basically any code generation is some form of string building. You linked to a part of CppInterOp which is constructing C++ functions. What's _actually_ wrong with that, in terms of robustness? The strings are generated not based on arbitrary user input, but based on Clang QualTypes and Decls. i.e. you need valid Clang values to actually get there anyway. Given that the ABI situation is an absolute mess, and that jank is already using Clang's JIT C++ compiler, I think this is a very viable solution.
However, in terms of robustness, I go back to Clang's error handling, lack of grace, and poor tooling for use cases like this. Based on my experience, _that_ is what will cause robustness issues.
Please don't take my response as unreceptive or defensive. I really do appreciate the discussion and if I'm saying something wrong, or if you want to explain further, please do. For alternatives, you linked to https://github.com/Mr-Anyone/abi which is 3 months old and has 0 stars (and so I assume 0 users and 0 years of battle testing). You also linked to https://discourse.llvm.org/t/llvm-introduce-an-abi-lowering-... which I agree would be great, _if/when it becomes available_.
So, out of all of the options, I'll ask clearly and sincerely: is there really a _better_ option which exists today?
CppInterOp is an implementation detail of jank. If we can replace C++ string generation with more IR generation and a portable ABI mechanism, _and_ if Clang can provide sufficient libraries so that I don't need to rely on C++ strings to be certain that my template specializations get the correct instantiation, I am definitely open to replacing CppInterOp. From all I've seen, we're not there yet.
almostgotcaught•1h ago
ah my bad i meant to link to this one https://github.com/scrossuk/llvm-abi
which inspired the gsoc.
> is there really a _better_ option which exists today?
today the "best in class" approach is swift's which fully (well tries to) model cpp AST and do what i suggested (emitting code directly):
https://github.com/swiftlang/swift/blob/c09135b8f30c0cec8f5f...
Jeaye•1h ago
However, the huge downside to this approach, which cannot be overlooked, is that Clang (not libclang) is not designed to be a library. It doesn't have the backward compatibility of a library. Swift (i.e. Apple) is already deep into developing Clang, and so I'm sure they can afford the cost of keeping up with the breaking changes that happen on every Clang release. For a solo dev, I'm not yet sure this is actually viable, but I will give it more consideration.
However, I think that raising alarms at C++ codegen is unwarranted. As I said before, basically any query builder or codegen takes some form of string generation. The way we make those safe is to add types in front of them, so we're not just formatting user strings into other strings. That's exactly what CppInterOp does, where the types added are Clang QualTypes and Decls.
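To illustrate what I mean by putting types in front of string generation, here's a toy sketch. It is not CppInterOp's code: `Identifier` and `emit_member_access` are hypothetical stand-ins, playing the role that Clang QualTypes and Decls play in the real thing. The emitter can't be handed a raw string, only a value that has already been validated.

```cpp
#include <cctype>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical wrapper: can only be constructed from text that is shaped like
// a C++ identifier. (It does not reject keywords; a real analyzer would.)
class Identifier {
public:
    static std::optional<Identifier> make(std::string_view text) {
        if (text.empty() || std::isdigit(static_cast<unsigned char>(text[0]))) {
            return std::nullopt;
        }
        for (unsigned char c : text) {
            if (!std::isalnum(c) && c != '_') {
                return std::nullopt;
            }
        }
        return Identifier(std::string(text));
    }

    const std::string& str() const { return text_; }

private:
    explicit Identifier(std::string text) : text_(std::move(text)) {}
    std::string text_;
};

// Emits the C++ source for `object.member`. Because both inputs are
// Identifiers, arbitrary text can never be spliced into the output.
inline std::string emit_member_access(const Identifier& object,
                                      const Identifier& member) {
    return object.str() + "." + member.str();
}
```

The output is still a string, but the only way to reach the emitter is through values that were checked first, which is the same shape of argument as with a query builder.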
rjsw•1h ago