But you can strip Python down to a minimal build and statically compile it without too much difficulty.
I tend to prefer Tcl because it has what I feel is the perfect amount of functionality by default, at a relatively small size. Tcl also has the best C API of the bunch if you’re working more in C.
Lua is very “pushy” and “poppy” due to its stack-based approach, but that can be fun too if you enjoy programming RPN calculators haha :)
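If you haven't seen it, the flavor is roughly this (a minimal sketch against the standard Lua 5.x C API; error handling omitted):

    #include <stdio.h>
    #include <lua.h>
    #include <lauxlib.h>
    #include <lualib.h>

    int main(void) {
        lua_State *L = luaL_newstate();
        luaL_openlibs(L);
        luaL_dostring(L, "function add(a, b) return a + b end");

        lua_getglobal(L, "add");  /* push the function onto the stack */
        lua_pushnumber(L, 2);     /* push argument 1 */
        lua_pushnumber(L, 3);     /* push argument 2 */
        lua_call(L, 2, 1);        /* pop function + 2 args, push 1 result */

        printf("2 + 3 = %g\n", lua_tonumber(L, -1));
        lua_pop(L, 1);            /* pop the result */

        lua_close(L);
        return 0;
    }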
Doing the same with Python is a lot harder. Python is designed first and foremost to run on its own. If you embed Python, you are essentially running it alongside your own code, with a bunch of hooks in both directions. Running hostile Python code? Probably not a good idea.
I agree though: the biggest reason is probably the C API. Lua's is so much simpler to embed and integrate with your codebase than Python's. The language is also optimized for "quick compiling", and it's very lightweight.
These days, however, one might argue that you gain so much from embedding either Python or JavaScript, it might be worth the extra pain on the C/C++ side.
Another big reason: the Lua interpreter does not have any global variables (and therefore also no GIL) so you can have multiple interpreters that are completely independent from each other.
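A sketch of what that buys you (real Lua C API calls; each state is a self-contained world, so these could just as well live on separate threads with no shared lock):

    #include <stdio.h>
    #include <lua.h>
    #include <lauxlib.h>
    #include <lualib.h>

    int main(void) {
        lua_State *a = luaL_newstate();    /* interpreter #1 */
        lua_State *b = luaL_newstate();    /* interpreter #2, fully independent */

        luaL_dostring(a, "x = 'state A'");
        luaL_dostring(b, "x = 'state B'"); /* does not clobber a's global x */

        lua_getglobal(a, "x");
        lua_getglobal(b, "x");
        printf("%s / %s\n", lua_tostring(a, -1), lua_tostring(b, -1));

        lua_close(a);
        lua_close(b);
        return 0;
    }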
The main reason is the endless fragility of Python's garbage collector. To be clear, the API is trying to be as helpful as possible, but it's still a whole bunch of complexity that's easy to mess up. Incidentally, this discussion is left out of the linked article, which makes it less than useless. In my experience with many a third-party C/Python interface, memory leaks are incredibly common in such code.
Lua of course also has a garbage collector, but it essentially only knows two types of values: POD and tables, with only the latter needing much consideration. The interaction model is based on a stack-based virtual machine, which is more complex than Python's full-function abstraction, but conveniently hides most of the garbage collector complexity. So long as you're just reshuffling things on the stack (i.e. most of the time), you don't need to worry about the garbage collector at all.
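For contrast, this is the kind of CPython bookkeeping I mean (a sketch, not from the article; the helper is made up but the API calls are real). One forgotten decref on one error path and the object leaks:

    #include <Python.h>

    /* Hypothetical helper: build the tuple (a, b). */
    static PyObject *make_pair(long a, long b) {
        PyObject *first = PyLong_FromLong(a);
        if (first == NULL)
            return NULL;

        PyObject *second = PyLong_FromLong(b);
        if (second == NULL) {
            Py_DECREF(first);   /* forget this line and "first" leaks */
            return NULL;
        }

        /* PyTuple_Pack takes its own references, so ours must go. */
        PyObject *pair = PyTuple_Pack(2, first, second);
        Py_DECREF(first);
        Py_DECREF(second);
        return pair;            /* NULL on failure, which is fine here */
    }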
Mind you I've programmed in all the mentioned languages. ;)
Were I to write a scripting language, trivial export to .so files would be a primary design goal.
It basically forces your language to be very similar to C.
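For comparison, the bar C sets here is just an unmangled symbol and a compiler flag (a sketch):

    /* plugin.c -- build with: cc -shared -fPIC plugin.c -o plugin.so
       The exported interface is just the symbol "scale": no name
       mangling, no runtime to initialize, no startup hook. That is
       what such a scripting language would have to match. */
    double scale(double x, double factor) {
        return x * factor;
    }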
Ruby, Python, and Perl all had similarly good package ecosystems in the late 1990s, and I think any of them could have ended up as the dominant scripting language. Then Google chose Python as its main scripting language, invested hundreds of millions of dollars, and here we are. It's not as suitable as Matlab, R, or Julia for numerical work, but money made it good enough.
(Sort of like how Java and later JavaScript VMs came to dominate: you can always compensate for poor upfront design with enough after-the-fact money.)
I made that switch before I’d ever heard of Google, or Ruby for that matter. My experience was quite common at the time.
Not for everything. For mobile apps it's still very poor, even if you only plan to prototype rather than distribute. Same for frontend and desktop. For desktop you do have PyQt and PySide, but I'd say the experience is not as good (you'd still be better off doing at least the UI in QML), and end-user distribution still sucks.
I wish Python's mobile story would improve. Python 3.13 tries to improve support for Android and iOS, and BeeWare is also working on it. But right now the ecosystem of pip wheels built for mobile is very minimal.
Yes. Because C and C++ are never going to have a comparable package ecosystem, it almost makes sense for people to distribute such library projects as Python packages, simply because that handles all the packaging.
I really hope we are at the end game with poetry or uv. I can't take it anymore.
Arguably it could be a little easier to automatically start up a virtual environment if you call pip outside of one… but, I dunno, default behavior that papers over too many errors is not great. If they don’t get a hard error, confused users might become even more confused when they don’t learn they need to load a virtual environment to get things working.
It reads from a database. Generates embeddings. Writes them to a vector database.
- ~10X faster
- ~10X lower memory usage
The only problem is that you have to spend a lot of time figuring out how to do it. All the instructions on the Internet, and even in the vector database documentation, are in Python.
I also wrote a RAG pipeline in Go, using OpenSearch for hybrid search (full-text + semantic) and the OpenAI API. I reused OpenSearch because our product was already using it for other purposes, and it supports vector search.
For me, the hardest part was figuring out all the additional settings and knobs in OpenSearch to achieve around 90% successful retrieval, as well as determining the right prompt and various settings for the LLM. I've found that these settings can be very sensitive to the type of data you're applying RAG to. I'm not sure there's a Python library that solves this out of the box without requiring manual tuning either.
There are Python libraries that will simplify the task by giving a better structure to your problem. The knobs will be fewer and more high-level.
Could? Yes. Easily? No.
People write their business logic in Python because they don't want to code in those lower-level languages unless they absolutely have to. The article neatly shows the kind of additional coding overhead you'd have to deal with - and you're not getting anything back in return.
Python is successful because it's a high-level language which has the right tooling to create easy-to-use wrappers around low-level high-performance libraries. You get all the benefits of a rich high-level language for the cold path, and you only pay a small penalty over using a low-level language for the hot path.
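As a sketch of that split (the module name and function here are made up; the extension API calls are the real CPython ones), the hot loop lives in C and everything else stays in Python:

    /* fastmath.c -- minimal CPython extension module (hypothetical). */
    #include <Python.h>

    /* Hot path: sum of squares over a raw buffer of doubles. */
    static PyObject *sumsq(PyObject *self, PyObject *args) {
        Py_buffer buf;
        if (!PyArg_ParseTuple(args, "y*", &buf))  /* any bytes-like object */
            return NULL;

        const double *x = buf.buf;
        Py_ssize_t n = buf.len / (Py_ssize_t)sizeof(double);
        double acc = 0.0;
        for (Py_ssize_t i = 0; i < n; i++)
            acc += x[i] * x[i];

        PyBuffer_Release(&buf);
        return PyFloat_FromDouble(acc);
    }

    static PyMethodDef methods[] = {
        {"sumsq", sumsq, METH_VARARGS, "Sum of squares of a float64 buffer."},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef moddef = {
        PyModuleDef_HEAD_INIT, "fastmath", NULL, -1, methods
    };

    PyMODINIT_FUNC PyInit_fastmath(void) {
        return PyModule_Create(&moddef);
    }

From Python that's just fastmath.sumsq(arr.tobytes()) on the cold path, with all the looping paid for in C.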
But if that's an abuse of the tools (which I agree with), how does that make it the fault of the language rather than the user or package author? Isn't the language with the "rich library ecosystem" the natural place to glue everything together (including performant extensions in other languages), rather than the other way around? In your example, wouldn't the solution just be to address the abuse in PyTorch rather than throw away the entire universe within which it's already functionally working?
If it avoids excessive copying & supports parallel computation, surely it's fine?
If your model is small enough that the overhead of Python starts dominating the execution time, I mean... does performance even matter that much then? And if it's large enough, surely the things I mentioned outweigh the costs?
I thought the point of numeric processing frameworks and languages in general is that if you can express things as common math equations, then geniuses will go in and implement the hyper-optimal solutions for you, because those equations are extremely common. If anything, it should resemble scripting even more, because you want to match the structured form as much as possible, so the "compiler" (or in this case the backend C libraries) can do the lifting for you.
A single Python interpreter stack frame dispatching into a 10^4 x 10^4 GEMM C BLAS kernel is not a bottleneck, but 10^8 Python interpreter stack frames for a pointwise addition broadcast op would be.
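Back-of-the-envelope, assuming roughly 1 microsecond of interpreter dispatch per call (order of magnitude only, my number):

    one GEMM call:   2 * (10^4)^3 = 2e12 FLOPs behind a single ~1 us dispatch
    pointwise loop:  1e8 calls * ~1 us = ~100 s of dispatch for ~1e8 FLOPs of work

The first amortizes the interpreter to nothing; the second is nearly all interpreter.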
Does PyTorch overload common broadcast operations, though? I was under the impression that it did. I guess this is what `torch.compile` attempts to solve?
Sometimes you end up embedding chunks of javascript directly inside your python
For example the docs for Streamlit implementation of AgGrid contain this: https://staggrid-examples.streamlit.app/Advanced_config_and_...
It's a mistake to assume C is always faster. If you don’t have a deep understanding of memory layout, compiler flags, vectorization, cache behavior, etc., your hand-written C code can easily be slower than high-level Python using well-optimized libraries. See [1] for a good example of that.
Sure, you could call those same libs from C, but then you're reinventing Python's ecosystem with more effort and more chances to shoot yourself in the foot. Python gives you access to powerful, low-level tools while letting you focus on higher-level problems—in a language that’s vastly easier to learn and use.
That tradeoff isn't just convenience—it's what makes modern AI R&D productive at scale.
[1] https://stackoverflow.com/questions/41365723/why-is-my-pytho...
I'm a beginner engineer so please don't judge me if my question is not making perfect sense.
For example, Rust has a drastically simpler syntax in some ways than C++ (ignoring the borrow annotations). In many ways it can look like Python code. Yet its performance is the same as C++ because it’s AOT compiled and has an explicit type system to support that.
TLDR: most of Python’s slowness is not syntactic but comes from architectural design decisions about how the code is run, which is why alternate Python implementations (IronPython, PyPy) typically run faster.
Or the Self workstation environment at Sun, whose research ideas on optimizing JIT compilers eventually landed in V8.
The design of a language, including its syntax, has a great bearing on its speed and efficiency.
Compare C with assembly, for example, and you will see that higher-level languages take complex actions and condense them into a terser syntax.
You will also observe that languages such as Python are far less suited to lower-level tasks like writing operating systems, where C is the much better fit due to its speed.
Languages like Python and Ruby include a higher level of built-in logic to make writing in them more natural and easy, at the cost of efficiency.
The main thing about Python being slower is that in most contexts it is used as an interpreted language running on its own VM in CPython (source is compiled to bytecode, which the VM then interprets).
For example, GCC will outright inline both bsearch() and the provided comparator in cases where it can see the definition of the comparator, such that there are no function calls done to either bsearch() or the comparator. C compilers will do this for a number of standard library functions and even will do it for non-library functions in the same file since they will inline small functions if allowed to do inlining. When the functions are not in the same file, you need LTO to get the inlining, but the same applies to C++.
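For instance (a sketch; whether the calls actually disappear depends on the GCC version and optimization level):

    /* With -O2 and the comparator visible in the same translation unit,
       GCC can inline both bsearch() and cmp_int(), leaving a plain
       compare-and-branch loop with no call instructions at all. */
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void) {
        int keys[] = {1, 3, 5, 7, 9, 11};
        int wanted = 7;
        int *hit = bsearch(&wanted, keys,
                           sizeof keys / sizeof keys[0], sizeof keys[0],
                           cmp_int);
        printf("%s\n", hit ? "found" : "missing");
        return 0;
    }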
That said, I have never seen assembly output from a C++ compiler for C++ using C++ specific language features that was more succinct than equivalent C. I am sure it happens, but the C++ language is just full of bloat. C++ templates usually result in more code than less, since the compiler must work much harder to optimize the result and opportunities are often lost. It is also incredibly easy for overhead to be hiding in the abstractions, especially if you have a thread safe class used in a single threaded context as part of a larger multithreaded program. The compiler will not optimize the thread safety overhead away. You might not believe that C++ language features add bloat, so I will leave you with this tidbit:
https://twitter.com/TimSweeneyEpic/status/122307740466037145...
Tim Sweeney’s team had C++ code that not only did not use exceptions, but explicitly opted out of them with noexcept. They got a 15% performance boost from turning off exceptions in the compiler, for no apparent reason. C, not having exceptions, does not have this phantom overhead.
C++ can be used to write code that generates assembly equivalent to pretty much any C. A lot of standards committee work goes into ensuring that's possible. The trade-off is that it's the closest thing humans have ever produced to a lovecraftian programming language.
In any case, my point is that C++ features often carry overhead that simply does not exist in C. Being able to get C++ code to run as fast as C code (possibly by not using any C++ features) does not change that.
Intel replaced ICC with an LLVM fork, and Microsoft’s compiler is used by only a subset of Windows’ developers. There are few other compilers in widespread use for C and C++. I believe ARM has a compiler, but Linaro has made it such that practically nobody uses it. Pathscale threw in the towel several years ago too. There is the CompCert C compiler, but it is used in only niche applications. I could probably name a few others if I tried, but they are progressively more niche as I continue.
JAX and Triton compile your python code to incredibly fast GPU kernels. If you want to run pure python, then there are JIT based runtimes like Jython or PyPy that run your code faster.
What it boils down to is the fact that CPython is an incredibly slow runtime and CPython dominates due to interoperability with C extensions.
I don't know why, but I've seen a lot of people act as if the C language is some kind of voodoo, as if C being fast were mere superstition: "Everyone knows C is the fastest, therefore C is the fastest." What you're doing is the equivalent of reading tea leaves, or horoscopes, or being an audiophile.
<snip> Datoviz is a relatively low-level visualization library. It focuses on rendering visual primitives like points, lines, images, and meshes — efficiently and interactively.
Unlike libraries such as Matplotlib, Datoviz does not provide high-level plotting functions like plt.plot(), plt.scatter(), or plt.imshow(). Its goal is not to replace plotting libraries, but to serve as a powerful rendering backend for scientific graphics. </snip>
It has far fewer features since it focuses solely on rendering common primitives. For example, it doesn't handle data file loading, except for a few quick loaders like OBJ used mainly for testing.
There's almost no computational geometry, data processing, or signal processing functionality. Datoviz is solely focused on interactive rendering.
Datoviz also supports fast, high-quality 2D vector graphics, scaling efficiently to millions of points. In contrast, VTK is primarily designed for more complex and heavy 3D rendering tasks.
pybind11 is a C++ wrapper that makes the Python API more friendly to use from C++ (e.g. smart pointers instead of manual reference counting)
Compromises needed to compile code on a PDP-11, etc., are not some sacred immutable facts about how a toolchain ought to work. One could learn everything about the crufty compiler toolchains and decide that it's all just a layering of over-abstracted nonsense on top of obsolete nonsense.
I do agree that the UNIX compilation model is not the best example around, though.
Nonetheless, most languages on UNIX-like OSes will for the most part follow the same model, as they need to fit into the ecosystem; otherwise there is always an impedance mismatch that will prevent their adoption in some cases.
That is how you get modern languages, i.e. any wannabe C or C++ replacement, using the same compiler toolchains and linkers, instead of the compiler-driven build and linking systems of languages not born into the UNIX culture.
Thus pretending systems programming languages are like scripting languages will hardly help.
Fun project, and it's really cool to see my little renderer in the interactive viewport in Blender, but I have also learned that I don't particularly enjoy working with non-trivial amounts of Python code.
My general rule (when doing the opposite of passing C values to python as part of an extension) is to use the dedicated function, like PyLong_FromLong, when there is a single return and Py_BuildValue (with its format string for automagic conversion) when the function returns a tuple.
Oh, and if you are checking your return values from Python (you are, right?), using Py_XDECREF isn't all that good of an idea, since it will mask some flawed logic. I pretty much only use it when a PyObject pointer could validly be NULL (like with an optional function argument) and I'm cleaning up right before throwing an exception. Tracking Python reference counts is a whole blog post in itself, since some functions steal references and others don't, and if you get it wrong you can easily crash the interpreter and/or create memory leaks.
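Concretely, that rule of thumb looks like this (a sketch; the two functions are made up, the API calls are real):

    #include <Python.h>

    /* Single return value: use the dedicated constructor. */
    static PyObject *get_count(PyObject *self, PyObject *args) {
        long count = 42;                /* stand-in for real work */
        return PyLong_FromLong(count);
    }

    /* Tuple return: let Py_BuildValue's format string do the conversions.
       "(lds)" builds (int, float, str) in one shot. */
    static PyObject *get_stats(PyObject *self, PyObject *args) {
        return Py_BuildValue("(lds)", 42L, 3.14, "ok");
    }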
This article is about embedding Python scripts inside a C codebase.