But you can strip Python down to a minimal build and statically compile it without too much difficulty.
I tend to prefer Tcl because it has what I feel is the perfect amount of functionality by default, at a relatively small size. Tcl also has the best C API of the bunch if you’re working more in C.
Lua is very “pushy” and “poppy” due to its stack-based approach, but that can be fun too if you enjoy programming RPN calculators haha :)
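If you haven't seen it, the flavor is roughly this (a minimal sketch against the standard Lua 5.x C API; error handling omitted):

    #include <stdio.h>
    #include <lua.h>
    #include <lauxlib.h>
    #include <lualib.h>

    int main(void) {
        lua_State *L = luaL_newstate();
        luaL_openlibs(L);
        luaL_dostring(L, "function add(a, b) return a + b end");

        lua_getglobal(L, "add");  /* push the function onto the stack */
        lua_pushnumber(L, 2);     /* push argument 1 */
        lua_pushnumber(L, 3);     /* push argument 2 */
        lua_call(L, 2, 1);        /* pop function + 2 args, push 1 result */

        printf("2 + 3 = %g\n", lua_tonumber(L, -1));
        lua_pop(L, 1);            /* pop the result */

        lua_close(L);
        return 0;
    }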
Doing the same with Python is a lot harder. Python is designed first and foremost to run on its own. If you embed Python, you are essentially running it alongside your own code, with a bunch of hooks in both directions. Running hostile Python code? Probably not a good idea.
I agree though: the biggest reason is probably the C API. Lua's is so much simpler to embed and integrate with your codebase than Python's. The language is also optimized for "quick compiling", and it's very lightweight.
These days, however, one might argue that you gain so much from embedding either Python or JavaScript, it might be worth the extra pain on the C/C++ side.
Another big reason: the Lua interpreter does not have any global variables (and therefore also no GIL) so you can have multiple interpreters that are completely independent from each other.
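A sketch of what that buys you (real Lua C API calls; each state is a self-contained world, so these could just as well live on separate threads with no shared lock):

    #include <stdio.h>
    #include <lua.h>
    #include <lauxlib.h>
    #include <lualib.h>

    int main(void) {
        lua_State *a = luaL_newstate();    /* interpreter #1 */
        lua_State *b = luaL_newstate();    /* interpreter #2, fully independent */

        luaL_dostring(a, "x = 'state A'");
        luaL_dostring(b, "x = 'state B'"); /* does not clobber a's global x */

        lua_getglobal(a, "x");
        lua_getglobal(b, "x");
        printf("%s / %s\n", lua_tostring(a, -1), lua_tostring(b, -1));

        lua_close(a);
        lua_close(b);
        return 0;
    }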
The main reason is the endless fragility of Python's garbage collector. To be clear, the API is trying to be as helpful as possible, but it's still a whole bunch of complexity that's easy to mess up. Incidentally, this discussion is left out of the linked article, which makes it less than useless. In my experience with many a third-party C/Python interface, memory leaks are incredibly common in such code.
Lua of course also has a garbage collector, but it essentially only knows two types of values: POD and tables, with only the latter needing much consideration. The interaction model is based on a stack-based virtual machine, which is more complex than Python's full-function abstraction, but conveniently hides most of the garbage collector complexity. So long as you're just reshuffling things on the stack (i.e. most of the time), you don't need to worry about the garbage collector at all.
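For contrast, this is the kind of CPython bookkeeping I mean (a sketch, not from the article; the helper is made up but the API calls are real). One forgotten decref on one error path and the object leaks:

    #include <Python.h>

    /* Hypothetical helper: build the tuple (a, b). */
    static PyObject *make_pair(long a, long b) {
        PyObject *first = PyLong_FromLong(a);
        if (first == NULL)
            return NULL;

        PyObject *second = PyLong_FromLong(b);
        if (second == NULL) {
            Py_DECREF(first);   /* forget this line and "first" leaks */
            return NULL;
        }

        /* PyTuple_Pack takes its own references, so ours must go. */
        PyObject *pair = PyTuple_Pack(2, first, second);
        Py_DECREF(first);
        Py_DECREF(second);
        return pair;            /* NULL on failure, which is fine here */
    }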
Mind you I've programmed in all the mentioned languages. ;)
Were I to write a scripting language, trivial export to .so files would be a primary design goal.
It basically forces your language to be very similar to C.
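For comparison, the bar C sets here is just an unmangled symbol and a compiler flag (a sketch):

    /* plugin.c -- build with: cc -shared -fPIC plugin.c -o plugin.so
       The exported interface is just the symbol "scale": no name
       mangling, no runtime to initialize, no startup hook. That is
       what such a scripting language would have to match. */
    double scale(double x, double factor) {
        return x * factor;
    }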
Ruby, Python, and Perl all had similarly good package ecosystems in the late 1990s, and I think any of them could have ended up as the dominant scripting language. Then Google chose Python as its main scripting language, invested hundreds of millions of dollars, and here we are. It's not as suitable as Matlab, R, or Julia for numerical work, but money made it good enough.
(Sort of like how Java and later JavaScript VMs came to dominate: you can always compensate for poor upfront design with enough after-the-fact money.)
I made that switch before I’d ever heard of Google, or Ruby for that matter. My experience was quite common at the time.
Not for everything. For mobile apps it's still very poor, even if you only plan to prototype rather than distribute. Same for frontend and desktop. For desktop you do have PyQt and PySide, but I'd say the experience is not as good (you'd still be better off doing at least the UI in QML), and end-user distribution still sucks.
I wish Python's mobile story would improve. Python 3.13 tries to improve support for Android and iOS, and BeeWare is also working on it. But right now the ecosystem of pip wheels built for mobile is very minimal.
Yes. Because C and C++ are never going to have a comparable package ecosystem, it almost makes sense for people to distribute such library projects as Python packages, simply because that handles all the packaging.
I really hope we are at the end game with poetry or uv. I can't take it anymore.
Arguably it could be a little easier to automatically start up a virtual environment if you call pip outside of one… but, I dunno, default behavior that papers over too many errors is not great. If they don’t get a hard error, confused users might become even more confused when they don’t learn they need to load a virtual environment to get things working.
It reads from a database. Generates embeddings. Writes them to a vector database.
- ~10X faster
- ~10X lower memory usage
The only problem is that you have to spend a lot of time figuring out how to do it. All the instructions on the Internet, and even in the vector database documentation, are in Python.
I also wrote a RAG pipeline in Go, using OpenSearch for hybrid search (full-text + semantic) and the OpenAI API. I reused OpenSearch because our product was already using it for other purposes, and it supports vector search.
For me, the hardest part was figuring out all the additional settings and knobs in OpenSearch to achieve around 90% successful retrieval, as well as determining the right prompt and various settings for the LLM. I've found that these settings can be very sensitive to the type of data you're applying RAG to. I'm not sure there's a Python library that solves this out of the box without requiring manual tuning either.
There are Python libraries that will simplify the task by giving a better structure to your problem. The knobs will be fewer and more high-level.
Could? Yes. Easily? No.
People write their business logic in Python because they don't want to code in those lower-level languages unless they absolutely have to. The article neatly shows the kind of additional coding overhead you'd have to deal with - and you're not getting anything back in return.
Python is successful because it's a high-level language which has the right tooling to create easy-to-use wrappers around low-level high-performance libraries. You get all the benefits of a rich high-level language for the cold path, and you only pay a small penalty over using a low-level language for the hot path.
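As a sketch of that split (the module name and function here are made up; the extension API calls are the real CPython ones), the hot loop lives in C and everything else stays in Python:

    /* fastmath.c -- minimal CPython extension module (hypothetical). */
    #include <Python.h>

    /* Hot path: sum of squares over a raw buffer of doubles. */
    static PyObject *sumsq(PyObject *self, PyObject *args) {
        Py_buffer buf;
        if (!PyArg_ParseTuple(args, "y*", &buf))  /* any bytes-like object */
            return NULL;

        const double *x = buf.buf;
        Py_ssize_t n = buf.len / (Py_ssize_t)sizeof(double);
        double acc = 0.0;
        for (Py_ssize_t i = 0; i < n; i++)
            acc += x[i] * x[i];

        PyBuffer_Release(&buf);
        return PyFloat_FromDouble(acc);
    }

    static PyMethodDef methods[] = {
        {"sumsq", sumsq, METH_VARARGS, "Sum of squares of a float64 buffer."},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef moddef = {
        PyModuleDef_HEAD_INIT, "fastmath", NULL, -1, methods
    };

    PyMODINIT_FUNC PyInit_fastmath(void) {
        return PyModule_Create(&moddef);
    }

From Python that's just fastmath.sumsq(arr.tobytes()) on the cold path, with all the looping paid for in C.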
But if that's an abuse of the tools (which I agree with), how does that make it the fault of the language rather than the user or package author? Isn't the language with the "rich library ecosystem" the natural place to glue everything together (including performant extensions in other languages), rather than the other way around? In your example, wouldn't the solution just be to address the abuse in PyTorch rather than throw away the entire universe within which it's already functionally working?
If it avoids excessive copying & supports parallel computation, surely it's fine?
If your model is small enough that the overhead of Python starts dominating the execution time, I mean... does performance even matter that much then? And if it's large enough, surely the things I mentioned outweigh the costs?
I thought the point of numeric processing frameworks and languages in general is that if you can express things as common math equations, then geniuses will go in and implement the hyper-optimal solutions for you, because those equations are extremely common. If anything, it should resemble scripting even more, because you want to match the structured form as much as possible, so the "compiler" (or in this case the backend C libraries) can do the lifting for you.
A single Python interpreter stack frame dispatching into a 10^4 x 10^4 GEMM C BLAS kernel is not a bottleneck, but 10^8 Python interpreter stack frames for a pointwise addition broadcast op would be.
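Back-of-the-envelope, assuming roughly 1 microsecond of interpreter dispatch per call (order of magnitude only, my number):

    one GEMM call:   2 * (10^4)^3 = 2e12 FLOPs behind a single ~1 us dispatch
    pointwise loop:  1e8 calls * ~1 us = ~100 s of dispatch for ~1e8 FLOPs of work

The first amortizes the interpreter to nothing; the second is nearly all interpreter.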
Does PyTorch overload common broadcast operations, though? I was under the impression that it did. I guess this is what `torch.compile` attempts to solve?
Sometimes you end up embedding chunks of javascript directly inside your python
For example the docs for Streamlit implementation of AgGrid contain this: https://staggrid-examples.streamlit.app/Advanced_config_and_...
It's a mistake to assume C is always faster. If you don’t have a deep understanding of memory layout, compiler flags, vectorization, cache behavior, etc., your hand-written C code can easily be slower than high-level Python using well-optimized libraries. See [1] for a good example of that.
Sure, you could call those same libs from C, but then you're reinventing Python's ecosystem with more effort and more chances to shoot yourself in the foot. Python gives you access to powerful, low-level tools while letting you focus on higher-level problems—in a language that’s vastly easier to learn and use.
That tradeoff isn't just convenience—it's what makes modern AI R&D productive at scale.
[1] https://stackoverflow.com/questions/41365723/why-is-my-pytho...
I'm a beginner engineer so please don't judge me if my question is not making perfect sense.
For example, Rust has a drastically simpler syntax in some ways than C++ (ignoring the borrow annotations). In many ways it can look like Python code. Yet its performance is the same as C++ because it’s AOT compiled and has an explicit type system to support that.
TLDR: most of Python’s slowness is not syntactic but comes from architectural design decisions about how the code is run, which is why alternate Python implementations (IronPython, PyPy) typically run faster.
Or the Self workstation environment at Sun, whose research ideas on optimizing JIT compilers eventually landed in V8.
The design of a language, including its syntax, has a great bearing on its speed and efficiency.
Compare C with assembly, for example, and you will see that higher-level languages take complex actions and condense them into a terser syntax.
You will also observe that languages such as Python are far less suited to lower-level tasks like writing operating systems, where C is the much better fit due to its speed.
Languages like Python and Ruby include a higher level of built-in logic to make writing in them more natural and easy, at the cost of efficiency.
The main thing about Python being slower is that in most contexts it is used as an interpreted language running on its own VM in CPython (source is compiled to bytecode, which the VM then interprets).
For example, GCC will outright inline both bsearch() and the provided comparator in cases where it can see the definition of the comparator, such that there are no function calls done to either bsearch() or the comparator. C compilers will do this for a number of standard library functions and even will do it for non-library functions in the same file since they will inline small functions if allowed to do inlining. When the functions are not in the same file, you need LTO to get the inlining, but the same applies to C++.
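For instance (a sketch; whether the calls actually disappear depends on the GCC version and optimization level):

    /* With -O2 and the comparator visible in the same translation unit,
       GCC can inline both bsearch() and cmp_int(), leaving a plain
       compare-and-branch loop with no call instructions at all. */
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void) {
        int keys[] = {1, 3, 5, 7, 9, 11};
        int wanted = 7;
        int *hit = bsearch(&wanted, keys,
                           sizeof keys / sizeof keys[0], sizeof keys[0],
                           cmp_int);
        printf("%s\n", hit ? "found" : "missing");
        return 0;
    }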
That said, I have never seen assembly output from a C++ compiler for C++ using C++ specific language features that was more succinct than equivalent C. I am sure it happens, but the C++ language is just full of bloat. C++ templates usually result in more code than less, since the compiler must work much harder to optimize the result and opportunities are often lost. It is also incredibly easy for overhead to be hiding in the abstractions, especially if you have a thread safe class used in a single threaded context as part of a larger multithreaded program. The compiler will not optimize the thread safety overhead away. You might not believe that C++ language features add bloat, so I will leave you with this tidbit:
https://twitter.com/TimSweeneyEpic/status/122307740466037145...
Tim Sweeney’s team had C++ code that not only did not use exceptions, but explicitly opted out of them with noexcept. They got a 15% performance boost from turning off exceptions in the compiler, for no apparent reason. C, not having exceptions, does not have this phantom overhead.
C++ can be used to write code that generates assembly equivalent to pretty much any C. A lot of standards committee work goes into ensuring that's possible. The trade-off is that it's the closest thing humans have ever produced to a lovecraftian programming language.
In any case, my point is that C++ features often carry overhead that simply does not exist in C. Being able to get C++ code to run as fast as C code (possibly by not using any C++ features) does not change that.
Intel replaced ICC with an LLVM fork, and Microsoft’s compiler is used by only a subset of Windows’ developers. There are few other compilers in widespread use for C and C++. I believe ARM has a compiler, but Linaro has made it such that practically nobody uses it. Pathscale threw in the towel several years ago too. There is the CompCert C compiler, but it is used in only niche applications. I could probably name a few others if I tried, but they are progressively more niche as I continue.
JAX and Triton compile your python code to incredibly fast GPU kernels. If you want to run pure python, then there are JIT based runtimes like Jython or PyPy that run your code faster.
What it boils down to is the fact that CPython is an incredibly slow runtime and CPython dominates due to interoperability with C extensions.
I don't know why, but I've seen a lot of people act as if the C language is some kind of voodoo, as if C being fast were mere superstition: "Everyone knows C is the fastest, therefore C is the fastest." What you're doing is the equivalent of reading tea leaves, or horoscopes, or being an audiophile.
<snip> Datoviz is a relatively low-level visualization library. It focuses on rendering visual primitives like points, lines, images, and meshes — efficiently and interactively.
Unlike libraries such as Matplotlib, Datoviz does not provide high-level plotting functions like plt.plot(), plt.scatter(), or plt.imshow(). Its goal is not to replace plotting libraries, but to serve as a powerful rendering backend for scientific graphics. </snip>
It has far fewer features since it focuses solely on rendering common primitives. For example, it doesn't handle data file loading, except for a few quick loaders like OBJ used mainly for testing.
There's almost no computational geometry, data processing, or signal processing functionality. Datoviz is solely focused on interactive rendering.
Datoviz also supports fast, high-quality 2D vector graphics, scaling efficiently to millions of points. In contrast, VTK is primarily designed for more complex and heavy 3D rendering tasks.
pybind11 is a C++ wrapper that makes the Python API more friendly to use from C++ (e.g. smart pointers instead of manual reference counting)
Compromises needed to compile code on a PDP-11, etc., are not some sacred immutable facts about how a toolchain ought to work. One could learn everything about the crufty compiler toolchains and decide that it's all just a layering of over-abstracted nonsense on top of obsolete nonsense.
I do agree that the UNIX compilation model is not the best example around, though.
Nonetheless, most languages on UNIX-like OSes will for the most part follow the same model, as they need to fit into the ecosystem; otherwise there is always an impedance mismatch that will prevent their adoption in some cases.
That is how you get modern languages, i.e. any wannabe C or C++ replacement, using the same compiler toolchains and linkers, instead of the compiler-driven build and linking systems of languages not born into the UNIX culture.
Thus pretending systems programming languages are like scripting languages will hardly help.
Fun project, and it's really cool to see my little renderer in the interactive viewport in Blender, but I have also learned that I don't particularly enjoy working with non-trivial amounts of Python code.
My general rule (when doing the opposite of passing C values to python as part of an extension) is to use the dedicated function, like PyLong_FromLong, when there is a single return and Py_BuildValue (with its format string for automagic conversion) when the function returns a tuple.
Oh, and if you are checking your return values from Python (you are, right?), using Py_XDECREF isn't all that good of an idea, since it will mask some flawed logic. I pretty much only use it when a PyObject pointer could validly be NULL (like with an optional function argument) and I'm cleaning up right before throwing an exception. Tracking Python reference counts is a whole blog post in itself, since some functions steal references and others don't, and if you get it wrong you can easily crash the interpreter and/or create memory leaks.
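Concretely, that rule of thumb looks like this (a sketch; the two functions are made up, the API calls are real):

    #include <Python.h>

    /* Single return value: use the dedicated constructor. */
    static PyObject *get_count(PyObject *self, PyObject *args) {
        long count = 42;                /* stand-in for real work */
        return PyLong_FromLong(count);
    }

    /* Tuple return: let Py_BuildValue's format string do the conversions.
       "(lds)" builds (int, float, str) in one shot. */
    static PyObject *get_stats(PyObject *self, PyObject *args) {
        return Py_BuildValue("(lds)", 42L, 3.14, "ok");
    }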
This article is about embedding Python scripts inside a C codebase.