How/why did the package maintainers start using all these improvements? Some of them sound like a bunch of work, and getting a package ecosystem to move is hard. Was there motivation to speed up installs across the ecosystem? If setup.py was working okay for folks, what incentivized them to start using pyproject.toml?
It wasn't working okay for many people, and many others haven't started using pyproject.toml.
For what I consider the most egregious example: Requests is one of the most popular libraries, under the PSF's official umbrella, which uses only Python code and thus doesn't even need to be "built" in a meaningful sense. It has a pyproject.toml file as of the last release. But that file isn't specifying the build setup following PEP 517/518/621 standards. That's supposed to appear in the next minor release, but they've only done patch releases this year and the relevant code is not at the head of the repo, even though it already caused problems for them this year. It's been more than a year and a half since the last minor release.
"Rewrite" is important as "Rust".
Helpfully, though, uv retains compatibility with newer (but still well-established) standards in the Python community that don't share this insanity!
I think Rust itself has this benefit
Tapping the Rust community is a decent reason to do a project in Rust.
Sometimes I thought our teams would be a terrible fit for more cookie-cutter applications where rapid development and deployment was the primary objective. We got into the weeds all the time (sometimes because of rust itself), but it happened to be important to do so.
Had we built those projects with JavaScript or Python I suspect the outcomes would have been worse for reasons apart from the language choice.
I genuinely can't understand why you suppose that has to do with the implementation language at all.
Languages that attract novice programmers (JS is an obvious one; PHP was one 20 years ago) have a higher noise to signal ratio than one that attracts intermediate and above programmers.
If you grabbed an average Assembly programmer today, and an average JavaScript programmer today, who do you think is more careful about programming? The one who needs to learn arcane shit to do basic things and then has to compile it in order to test it out, or the one who can open up Chrome's console and console.log("i love boobies")
How many embedded systems programmers suck vs full stack devs? I'm not saying full stack devs are inferior. I'm saying that more inferior coders are attracted to the latter because the barriers to entry are SO much easier to bypass.
With that said, Rust was a good language for this in my experience. Like any "interesting" thing, there was a moderate bit of language-nerd side quest thrown in, but overall, a good selection metric. I do think it's one of the best Rewrite it in X languages available today due to the availability of good developers with Rewrite in Rust project experience.
The Haskell commentary is curious to me. I've used Haskell professionally but never tried to hire for it. With that said, the other FP-heavy languages that were popular ~2010-2015 were absolutely horrible for this in my experience. I generally subscribe to a vague notion that "skill in a more esoteric programming language will usually indicate a combination of ability to learn/plasticity and interest in the trade," however, using this concept, I had really bad experiences hiring both Scala and Clojure engineers; there was _way_ too much academic interest in language concepts and way too little practical interest in doing work. YMMV :)
I will bring popcorn on python 4 release date.
This (zero-copy deserialization) is not a rust-specific technique, so I'm not entirely sure why the author describes it as one. Any good low level language (C/C++ included) can do this from my experience.
For example "No interpreter startup" is not specific to Rust either.
(But also, I think Rust can fairly claim that it's made zero-copy deserialization a lot easier and safer.)
(I'm agnostic on whether zero-copy "matters" in every single context. If there's no complexity cost, which is what Rust's abstractions often provide, then it doesn't really hurt.)
Also, aside from memory bandwidth, there’s a latency cost inherent in traversing object graphs - 0 copy techniques ensure you traverse that graph minimally, just what’s needed to actually be accessed which is huge when you scale up. There’s a difference between one network request and fetching 1 MB vs making 100 requests to fetch 10kib and this difference also appears in memory access patterns unless they’re absorbed by your cache (not guaranteed for object graph traversal that a package manager would be doing).
with zipfile.ZipFile(archive_name) as a:
with a.open(file_name) as f, io.BytesIO() as b:
b.write(f.read())
return b.getvalue()
(That does, of course, copy data around within memory, but.)This is not what zero-copy means. Here's a working definition[1].
Specifically, it's not just about keeping things in memory; copying in memory is normal. The goal is to not make copies (or more precisely, what Rust would call "clones"), but to instead convey the original representation/views of that representation through the program's lifecycle where feasible.
> a deserialized version of the data necessarily cannot be the same object as the original data
rust-asn1 would be an example of a Rust library that doesn't make any copies of data unless you explicitly ask it to. When you load e.g. a Utf8String[2] in rust-asn1, you get a view into the original input buffer, not an intermediate owning object created from that buffer.
> (That does, of course, copy data around within memory, but.)
Yes, that's what makes it not zero-copy.
[1]: https://rkyv.org/zero-copy-deserialization.html
[2]: https://docs.rs/asn1/latest/asn1/struct.Utf8String.html
about rust though
some say a nicer language helps finding the right architecture (heard that about cpp veteran dropping it for ocaml, any attempted idea would take weeks in cpp, was a few days in ocaml, they could explore more)
also the parallelism might be a benefit the language orientation
enough semi fanboyism
Unless I've been seeing very different submissions than you, "pet peeve" seems like the exact opposite of what is actually the case?
I feel that sometimes there's a desire on the part of those who use tool X that everyone should use tool X. For some types of technology (car seat belts, antibiotics...) that might be reasonable but otherwise it seems more like a desire for validation of the advocate's own choice.
Isn't assigning out what all made things fast presumptive without benchmarks? Yes, I imagine a lot is gained by the work of those PEPs. I'm more questioning how much weight is put on dropping of compatibility compared to the other items. There is also no coverage for decisions influenced by language choice which likely influences "Optimizations that don’t need Rust".
This also doesn't cover subtle things. Unsure if rkyv is being used to reduce the number of times that TOML is parsed but TOML parse times do show up in benchmarks in Cargo and Cargo/uv's TOML parser is much faster than Python's (note: Cargo team member, `toml` maintainer). I wish the TOML comparison page was still up and showed actual numbers to be able to point to.
We also have the benchmark of "pip now vs. pip years ago". That has to be controlled for pip version and Python version, but the former hasn't seen a lot of changes that are relevant for most cases, as far as I can tell.
> This also doesn't cover subtle things. Unsure if rkyv is being used to reduce the number of times that TOML is parsed but TOML parse times do show up in benchmarks in Cargo and Cargo/uv's TOML parser is much faster than Python's (note: Cargo team member, `toml` maintainer). I wish the TOML comparison page was still up and showed actual numbers to be able to point to.
This is interesting in that I wouldn't expect that the typical resolution involves a particularly large quantity of TOML. A package installer really only needs to look at it at all when building from source, and part of what these standards have done for us is improve wheel coverage. (Other relevant PEPs here include 600 and its predecessors.) Although that has also largely been driven by education within the community, things like e.g. https://blog.ganssle.io/articles/2021/10/setup-py-deprecated... and https://pradyunsg.me/blog/2022/12/31/wheels-are-faster-pure-... .
I blame fixed AI system prompts - they forcibly collapse all inputs into the same output space. Truly disappointing that OpenAI et all have no desire to change this before everything on the internet sounds the same forever.
As you said, reading this stuff is taxing. What's more, this is a daily occurrence by now. If there's a silver lining, it's that the LLM smells are so obvious at the moment; I can close the tab as soon as I notice one.
Fairly easy, in my wife's experience. She repeatedly got accused of using chatgpt in her original writing (she's not a native english speaker, and was taught to use many of the same idioms that LLMs use) until she started actually using chatgpt with about two pages of instructions for tone to "humanize" her writing. The irony is staggering.
It’s really useful for taking my first drafts and cleaning them up ready for a final polish.
> This is concurrency, not language magic.
> This is filesystem ops, not language-dependent.
Duh, you literally told me that the previous sentence and 50 million other times.
Oh well, all I can do is flag.
But still, I'm skeptical.
If it is doable, the best way to prove it is to actually do it.
If no one implements it, was it ever really doable?
Even if there is no technical reason, perhaps there is a social one?
I was going to learn Python for just that (file-conversion utilities and the like), but everybody was so down on the messy ecosystem that I never bothered.
In my view that was by far the biggest issue with Python - a complete deal-breaker really. But uv solves it pretty well.
The remaining big issues are a) performance, and b) the import system. uv doesn't do anything about those.
Performance may not be an issue in some cases, and the import system is ... tolerable if you're writing "a python project". If you're writing some other project and considering using Python for its scripting system, e.g. to wrangle multiple build systems or whatever than the import mess is a bigger issue and I would thing long and hard before picking it over Deno.
Umm…
Edit to add: I use python daily
I do use it on CI/CD pipelines, but I wouldn't dare type uvx commands myself on a daily basis.
A similar, but less drastic speedup applied to docker images.
uv has been a delight to use
I'd characterize that as unusable, for sure.
(also switching python version is so fast)
It is also really nice when working interactivly to have snappy tools that don't take you out of the flow more than absolutely more than necessary. But then I'm quite sensitive to this, I'm one of those people who turn off all GUI animations because they waste my time and make the system feel slow.
But small efficiencies do matter; see e.g. https://danluu.com/productivity-velocity/.
... Okay, after a brief look, there's still lots of room for me to comment. In particular:
> pip’s slowness isn’t a failure of implementation. For years, Python packaging required executing code to find out what a package needed.
This is largely refuted by the fact that pip is still slow, even when installing from wheels (and getting PEP 600 metadata for them). Pip is actually still slow even when doing nothing. (And when you create a venv and allow pip to be bootstrapped in it, that bootstrap process takes in the high 90s percent of the total time used.)
This is kind of fascinating. I've never considered runtime upper bound requirements. I can think of compelling reasons for lower bounds (dropping version support) or exact runtime version requirements (each version works for exact, specific CPython versions). But now that I think about it, it seems like upper bounds solve a hypothetical problem that you'd never run into in practice.
If PSF announced v4 and declared a set of specific changes, I think this would be reasonable. In the 2/3 era it was definitely reasonable (even necessary). Today though, it doesn't actually save you any trouble.
But if we accept that it currently ignores any upper-bounds checks greater than v3, that's interesting. Does that imply that once Python 4 is available, uv will slow down due to needing to actually run those checks?
That said, even if it does happen, I highly doubt that is the main part of the speed up compared to pip.
pip is simply difficult to maintain. Backward compatibility concerns surely contribute to that but also there are other factors, like an older project having to satisfy the needs of modern times.
For example, my employer (Datadog) allowed me and two other engineers to improve various aspects of Python packaging for nearly an entire quarter. One of the items was to satisfy a few long-standing pip feature requests. I discovered that the cross-platform resolution feature I considered most important is basically incompatible [1] with the current code base. Maintainers would have to decide which path they prefer.
No, every code path you don't execute is that. Like
> No .egg support.
How does that explain anything if the egg format is obsolete and not used?
Similar with spec strictness fallback logic - it's only slow if the packages you're installing are malformed, otherwise the logic will not run and not slow you down.
And in general, instead of a list of irrelevant and potentially relevant things would be great to understand some actual time savings per item (at least those that deliver the most speedup)!
But otherwise great and seemingly comprehensive list!
Even in compiled languages, binaries have to get loaded into memory. For Python it's much worse. On my machine:
$ time python -c 'pass'
real 0m0.019s
user 0m0.013s
sys 0m0.006s
$ time pip --version > /dev/null
real 0m0.202s
user 0m0.182s
sys 0m0.021s
Almost all of that extra time is either the module import process and then garbage collection. Even with cached bytecode, the former requires finding and reading from literally hundreds of files, deserializing via `marshal.loads` and then running top-level code, which includes creating objects to represent the functions and classes.It used to be even worse than this; in recent versions, imports related to Requests are deferred to the first time that an HTTPS request is needed.
> Plenty of tools are written in Rust without being notably fast.
This also hasn't been my experience. Most tools written in Rust are notably fast.
Not that I'd bother since uv does venv so well. But, "it's not all rust runtime speed" implies pip could be faster too.
What I want from a package manager is that it just works.
That's what I mostly like about uv.
Many of the changes that made speed possible were to reduce the complexity and thus the likelihood of things not working.
What I don't like about uv (or pip or many other package managers), is that the programmer isn't given a clear mental model of what's happening and thus how to fix the inevitable problems. Better (pubhub) error messages are good, but it's rare that they can provide specific fixes. So even if you get 99% speed, you end up with 1% perplexity and diagnostic black boxes.
To me the time that matters most is time to fix problems that arise.
This is a priority for PAPER; it's built on a lower-level API so that programmers can work within a clear mental model, and I will be trying my best to communicate well in error messages.
This entire AI generated article with lots of text just to just say the obvious.
yjftsjthsd-h•2h ago
Are we losing out on performance of the actual installed thing, then? (I'm not 100% clear on .pyc files TBH; I'm guessing they speed up start time?)
woodruffw•2h ago
yjftsjthsd-h•2h ago
My first cynical instinct is to say that this is uv making itself look better by deferring the costs to the application, but it's probably a good trade-off if any significant percentage of the files being compiled might not be used ever so the overall cost is lower if you defer to run time.
woodruffw•2h ago
Sure, but you pay that hit either way. Real-world performance is always usage based: the assumption that uv makes is that people run (i.e. import) packages more often than they install them, so amortizing at the point of the import machinery is better for the mean user.
(This assumption is not universal, naturally!)
dddgghhbbfblk•1h ago
woodruffw•1h ago
(The key part being that 'less common' doesn't mean a non-trivial amount of time.)
beacon294•2h ago
I just read the thread and use Python, I can't comment on the % speedup attributed to uv that comes from this optimization.
Epa095•1h ago
If it was a optional toggle it would probably become best practice to activate compilation in dockerfiles.
saidnooneever•1h ago
tedivm•39m ago
VorpalWay•19m ago
I would bet on a subset for pretty much any non-trivial package (i.e. larger than one or two user facing modules). And for those trivial packages? Well they are usually small, so the cost is small as well. I'm sure there are exceptions: maybe a single gargantuan module thst consists of autogenerated FFI bindings for some C library or such, but that is likely the minority.
salviati•2h ago
plorkyeran•1h ago
hauntsaninja•1h ago
thundergolfer•57m ago