Invoking the smarter-than-thou effect is not a great starting point.
See e.g. https://www.sciencedirect.com/science/article/abs/pii/S01602...
If we’re considering a library, it would be prudent of us to take a look at the source code to see what exactly we’re pulling in. In the process, we would learn about the lay of the land, the API and the internals, and get at least an overview of the complexity of the problem it solves.
Anyway... I've had a few recurring issues with libraries. Note that these are framed on a case-by-case basis, not as general rules.
1. The essential implementation is a small amount of code, wrapped in structures that exist just to package that essential code. The wrapping code can be larger and more complex than the essential code.
2. There are small differences between what's needed and what's provided, which require workarounds to get the desired outcome. These workarounds muddy the logic and can be pervasive at scale (see the sketch at the end of this comment).
3. There can be dissonance between the app architecture and the library API.
4. Popular libraries in particular create a culture of thinking in terms of the library/framework, leading to resource inefficiencies and to outright dismissal of solutions that are a better match for the domain. In short, the library/framework API frames the problem and solution, which may not match the actual problem and the optimal solution.
5. The library/framework authors are concerned with promoting the library/framework, not with solving your actual problem. Many problems need to be solved; the library/framework may just be the "Golden Hammer" used to pound in your screw.
With all that being said, there are many useful libraries that define and solve problems in their particular domain, particularly those with common, well-defined, appropriately scoped requirements.
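To make point 2 concrete, here is a minimal Python sketch of workaround glue; format_price is a stand-in for a hypothetical third-party formatter that almost fits the requirement:

    def format_price(amount: float) -> str:
        # Stand-in for the hypothetical library function: it returns US style
        # ("$12,345.60"), but the spec calls for "12 345,60 EUR".
        return f"${amount:,.2f}"

    def format_price_eur(amount: float) -> str:
        # Workaround glue: re-shape the library's output to match the spec.
        s = format_price(amount).lstrip("$")       # drop the hardcoded symbol
        s = s.replace(",", " ").replace(".", ",")  # swap separators for EU style
        return s + " EUR"

    print(format_price_eur(12345.6))  # 12 345,60 EUR

Multiply that by every call site and the workaround, not the library, becomes the real implementation.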
Though the addition of pipes to the base language is helping fix that.
I don't think Dunning-Kruger has anything to do with people releasing libraries that nobody should use.
It's kinda like project-specific semantic monomorphization.
> If there is a battle tested, well known package that can help us, then recommend it BEFORE implementing large swaths of custom code.
Monomorphization means taking a generic function and generating a version specific to the type being used, e.g. a Rust function

    fn identity<T>(x: T) -> T {
        x
    }

can be compiled into one version for i32 and one for String, which is more efficient since the compiler knows the concrete types:

    fn identity_i32(x: i32) -> i32 { x }
    fn identity_string(x: String) -> String { x }
Semantic monomorphization could mean extracting the parts of the library that are meaningful, to generate problem-specific concrete code. Instead of importing pandas to do

    import pandas as pd

    df = pd.DataFrame([{"a": 1}, {"a": 2}])
    total = df["a"].sum()

the LLM might skip the import entirely and generate only:

    data = [{"a": 1}, {"a": 2}]
    total = sum(d["a"] for d in data)
If I understood right, the parent found it funny that a comment suggesting we might never need libraries, because we can concretize the specific relevant code, would be responded to with a CLAUDE.md that essentially said "always use libraries instead of concrete relevant code". I missed it because I didn't stop to look up "monomorphization", so I hope this helps anyone else like me get the joke.

Why does this reduce your attack surface? Can the functions in the library, unrelated to the ones you're using, be triggered by user input somehow?
Languages and domains that have leaned too far into package managers and small libraries are prone to fragility and security nightmares.
For any "serious" application of critical code, every library used needs to be vetted and verified to be maintained and secure.
I'd much rather deal with a bug in our code than a deprecated library or a breaking version update.
If we are to use a library outside of standard Unix or the stdlib within my field, better expect a nightmarish code review and a meeting.
Besides being fun, implementing it ourselves improves our skill level for the future, something vibe coding itself goes against as well.
A project only becomes serious once legal is breathing down engineering's neck. Before that, it's usually the Wild West. After, it becomes a security circus trying to patch the technology deficiency (custom registries, complex linting and other analysis tooling, ...).
And yes, I agree.
https://www.npmjs.com/package/boolean
>converts lots of things to boolean.
>3 million weekly downloads
This is insane.
    ['yes', 'y', '1'].indexOf(input.toLowerCase()) !== -1
People adding a dependency to avoid writing one line of code...

My Celery/RabbitMQ-based web crawler failed because of the Cloudflare CAPTCHAs, so I figured it was best to empty out the queue and archive it. I asked Copilot what to do and it told me to use a CLI program. "Does that come with RabbitMQ?" "No, you download it from GitHub." It offered to write me a Python script instead, but the CLI program did exactly what I needed. It got an option wrong, but I'd expect the same if I asked a friend for help.
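For reference, the script route really is only a few lines. A minimal sketch using the pika client; the broker URL and queue name here are placeholders:

    import pika  # RabbitMQ client library

    # Connect to the broker and drop every pending message from the queue.
    connection = pika.BlockingConnection(pika.URLParameters("amqp://localhost"))
    channel = connection.channel()
    frame = channel.queue_purge(queue="crawler_tasks")
    print(f"Purged {frame.method.message_count} messages")
    connection.close()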
This is analogous to folks who claim nobody is going to be able to learn software engineering any more. I think it is just the opposite. LLMs can be an awesome tool for learning.
On the article: some use cases, e.g. handling dates or fault-tolerant queues, have so many edge cases and are so mission-critical that relying on a battle-tested tool makes a lot of sense.
However, in my career I've seen a lot of examples of a package being installed to avoid 40-50 lines of well-thought-out code, and now a dependency is forever embedded in the system.
I think there is a catch with replacing libraries with LLM-generated code. Part of the benefit of skipping third-party libraries is the domain knowledge that gets built up: this is potentially lost with LLM-generated code.
In my experience the big problem is that the documentation is always terrible; you can't ask open-ended questions on Stack Overflow, the library's subreddit (if any) has zero users, and anything asked on their Discord is not searchable.
It's incredible that we still don't have a stack overflow that is just a forum.
I want people in the company to use it, but it's big and complicated (lots of chipsets and Bluetooth to boot).
I'm trying to design the library so the MCP can tell the LLM to pull it from our repo, read the prompt file for instructions and automatically integrate with the code.
I can't get it to do it consistently. There is a big gap in the current LLM tech where there is no standard/consistent way to tell an LLM how to interface with a library (C/Python/Java/etc.)
The LLM more often than not will read the library and then start writing duplicate code.
Maddening.
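The kind of prompt file I mean might look something like this; purely a hypothetical sketch, with the file name and the btlib API made up for illustration:

    # llm-instructions.md (hypothetical, shipped at the repo root)
    #
    # Before writing any Bluetooth code in this project:
    # 1. Read docs/api-overview.md for the public API surface.
    # 2. Use btlib.scan() and btlib.connect() from this library;
    #    do NOT reimplement scanning, pairing, or chipset handling.
    # 3. If a needed function is missing, report it instead of
    #    writing a duplicate.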
I'm still not clear on what the best patterns for this are myself. I've been experimenting with dumping my entire documentation into the model as a single file - see https://github.com/simonw/docs-for-llms and https://github.com/simonw/llm-docs - but I'd like to produce shorter, optimized documentation (probably with a whole bunch of illustrative examples) that use fewer tokens and get better results.
I'm a library author myself, so publishing LLM-enhanced versions of the docs to help other people use my library more effectively feels like a sensible use of my time.
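For the single-file approach, the mechanical part is small. A minimal Python sketch, assuming the docs live under docs/ as Markdown and with an arbitrary output name:

    from pathlib import Path

    # Concatenate every Markdown file under docs/ into one file an LLM can
    # ingest in a single pass, with a comment marking where each file starts.
    parts = []
    for path in sorted(Path("docs").rglob("*.md")):
        parts.append(f"<!-- {path} -->\n{path.read_text()}")
    Path("docs-for-llms.txt").write_text("\n\n".join(parts))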
(The quote comes from a different context, but it works quite well here as well.)
(or "naïve")