We need them so that we can find all the functions that are core to a given purpose, written with a unified purpose and with performance in mind, rather than also finding a grab bag of everybody's crappy utilities that weren't designed to scale for my use case.
We need them so that people don't have to have 80-character function names prefixed with Hungarian notation for every distinct domain that shares the same words with different meanings.
Having a hierarchical naming system that spans everything makes it largely irrelevant how the functions themselves are physically organized. This also provides a pattern for disambiguating similar products by prefixing the real-world FQDNs of each enterprise.
While a function may have local variables that are protected from external access, a module can export not only multiple functions but any other kind of symbol, e.g. data types or templates, while also being able to keep any kind of symbol private.
In languages like C, which have separate compilation but no modules, you can partition code into files and choose for each symbol whether it is public. With modules you can handle groups of related symbols together, in a simpler way that also documents the structure of the program.
Moreover, with a well-implemented module system, compilation can be much faster than when using inefficient tricks for specifying the interfaces, like header file textual inclusion.
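To make that concrete, here's a minimal TypeScript sketch (the module and all names are hypothetical): a module exporting both a data type and a function while keeping a constant and a helper private:

// temperature.ts -- exports a type and a function, keeps helpers private
export type Celsius = number;                // exported data type

const ABSOLUTE_ZERO: Celsius = -273.15;      // private: not exported, invisible to importers

function clamp(t: Celsius): Celsius {        // private helper
  return Math.max(t, ABSOLUTE_ZERO);
}

export function warmBy(t: Celsius, delta: number): Celsius {
  return clamp(t + delta);                   // exported function built on the private parts
}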
Quite often coders optimise for searchability, so there will be a constants file, a dataclasses file, a "readers" file, a "writers" file, etc. This is great if you are trying to hunt down a single module or line of code quickly. But it can become absolute misery to actually read the 'flow' of the codebase, because every file has a million dependencies, and the logic jumps in and out of each file for a few lines at a time. I'm a big fan of the "proximity principle" [1] for this reason: don't divide code to optimise 'searchability'; put things together that actually depend on each other, as they will also need to be read and modified together.
[1] https://en.wikipedia.org/wiki/Cohesion_(computer_science)
It's difficult because it is a core part of software engineering; part of the fundamental value that software developers are being paid for. Just like a major part of a journalist's job is to first understand a story and then lay it out clearly in text for their readers, a major part of a software developer's job is to first understand their domain and then organize it clearly in code for other software developers (including themselves). So the act of deciding which modules different functions go in is the act of software development. Therefore, these people:
> Quite often coders optimise for searchability, so there will be a constants file, a dataclasses file, a "readers" file, a "writers" file, etc.
Those people are shirking their duty. I disdain those people. Some of us software developers actually take our jobs seriously.
In practice, it wound up not quite being worth it. The concept requires the same file to "exist" in multiple locations for the idea to work with all your other tools in a way that actually exploits tags. But then any reference to a given file (e.g., to import it) needs some sort of canonical name in the TFS, so that on `cd`-esque operations you can reference the "right" one. Doable, but not agnostic of the file format, which is the point where I saw this causing more problems than it was solving.
I still think there's something there though, especially if the editing environment, programming language, and/or representation of the programming language could be brought on board (e.g., for any concrete language with a good LSP, you can rewrite import statements dynamically).
A directory hierarchy feels more pleasant when it maps to features, instead. Less clutter.
Most programmers do not care about OO design, but "connascence" has some persuasive arguments.
https://randycoulman.com/blog/2013/08/27/connascence/
https://practicingruby.com/articles/connascence
> Knowing the various kinds of connascence gives us a metric for determining the characteristics and severity of the coupling in our systems. The idea is simple: The more remote the connection between two clusters of code, the weaker the connascence between them should be.
> Good design principles encourage us to move from tight coupling to looser coupling where possible. But connascence allows us to be much more specific about what kinds of problems we’re dealing with, which makes it easier to reason about the types of refactorings that can be used to weaken the connascence between components.
Makes me wonder what it would look like if you gave "topics" to code as you wrote it. Where would you put certain pieces of code? And how many would belong to several topics at once?
Instead of posting a topic in a subforum, what if subforums were turned into tags and you just posted your topic globally with those tags? Now you can have a unified UI that shows all topics, and people can filter by tag.
I experimented with this with a /topics page that implemented such a UI. What I found was that it becomes one big soup that lacks the visceral structure that I quickly found to be valuable once it was missing.
There is some value to "Okay, I clicked into the WebDesign subforum and I know the norms here and the people who regularly post here. If I post a topic, I know who is likely to reply. I've learned the kind of topics that people like to discuss here which is a little different than this other microclimate in the RubyOnRails subforum. I know the topics that already exist in this subforum and I have a feel for it because it's separate from the top-level firehose of discussion."
I think something similar happens with modules and grouping like-things into the same file. Microclimates and micronorms emerge that are often useful for wrapping your brain around a subsystem, contributing to it, and extending it. Even if the norms and character change between files and modules, it's useful that there are norms and character when it comes to understanding what the local objective is and how it's trying to solve it.
Like a subforum, you also get to break down the project management side of things into manageable chunks without everything always existing at a top organizational level.
Most things have multiple kinds of interesting properties. And in general, the more complex the thing, the more interesting properties it has. Ofc "interesting" is relative to the user/observer.
The problem with hierarchical taxonomies, and with taxonomies in general, is that they try to categorize things by a single property. Not only that, the property chosen for classification is relevant to the person who selected it, but it might not be relevant, or at least not the most relevant, to others who need to categorize the same set of things.
Sometimes people discover "new" properties of things, such as when a new tool or technique for examining them comes into existence. And new reasons for classifying come into existence all the time. So a hierarchical taxonomy begins to become less relevant as soon as it is invented.
Sometimes one wants to invent a new thing and needs to integrate it into an existing taxonomy. But they have a new value for the property that the taxonomy uses for classification. Think back to SNMP and MIBs and OIDs. Now the original classifier is a gatekeeper and you're at their mercy to make space for your thing in the taxonomy.
In my experience, the best way to classify things, ESPECIALLY man-made things, is to allow them to be freely tagged with zero or more tags (or if you're a stickler, one or more tags). And don't exert control over the tags, or exert as little control as you can get away with. This allows multiple organic taxonomies to be applied to the same set of things, and adapts well to supporting new use cases or not-previously-considered use cases.
It's a lot like genres for music and such. In broad strokes, they work really well. If taken as requirements, though, they start to be too restrictive.
Modules being collections of types and functions obviously increases coarseness. I'm not a fan of most import mechanisms because they leave versioning and namespace versioning (if the language has namespaces at all...) out, to be picked up poorly by build systems and dependency-graph resolvers and that crap.
Consider a function name: log()
Is it a function to log an event for audit history?
Or is it a function to get the mathematical natural logarithm of a number?
The global namespace forces the functions to be named differently (maybe with an underscore '_'): "audit_log()" and "math_log()". With modules, the names would be isolated by double colons "::" or a period '.': Audit.log() and Math.log(). Audit and Math are isolated namespaces. You still have potential global namespace collisions, but they happen at the higher level of module names instead of the leaf function names. Coordinating naming at the level of modules to avoid conflicts is much less frequent and more manageable.
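As a minimal TypeScript sketch of the module version (the file names are hypothetical):

// audit.ts
export function log(event: string): void {
  console.log(`[audit] ${event}`);
}

// main.ts
import * as audit from "./audit.ts";

audit.log("user signed in"); // the audit-trail log, qualified by its module
Math.log(Math.E);            // the natural logarithm, from the built-in Math namespace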
Same issue in OS file systems with proposals for no folders/directories, only a flat global namespace with metadata tags. The filenames themselves would have embedded substrings with underscores to recreate fake folder names. People would reinvent hierarchy in tag names with concatenated substrings like "tag:docs_taxes_archive" to recreate pseudo-folders of "/docs/taxes/archive". Yes, some users could deliberately avoid hierarchies and only use 1-level tags such as "docs", "taxes", "archive" ... but that creates new organizational problems, because some have "work docs" vs "personal docs" ... which gravitates toward hierarchical organization again.
If you're lucky all functions will have a common prefix str_* or fct_*. If you're unlucky then you have to figure out which package has clobbered a standard library function, or the exact ordering of your package import statements you need for your code to run.
There are no directories in S3, just object names.
Object names are hierarchical with "/" delimiters only out of habit, and because that's easier for the average user to reason about.
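For instance, a sketch using the AWS SDK for JavaScript v3 (the bucket and prefix are hypothetical); the "/" only becomes folder-like when a listing call is asked to treat it that way:

import { S3Client, ListObjectsV2Command } from "@aws-sdk/client-s3";

const s3 = new S3Client({});
const out = await s3.send(new ListObjectsV2Command({
  Bucket: "my-bucket",     // hypothetical bucket
  Prefix: "docs/taxes/",   // a "folder" that exists only by naming convention
  Delimiter: "/",          // asks the listing API to group keys as if directories existed
}));
console.log(out.CommonPrefixes); // the synthesized "subdirectories"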
universe.mega_corp.finance_dept.team_alpha.foo
But to use `universe.mega_corp.finance_dept.team_alpha.foo` in your application, you don't import a module, just the function `foo`. Who controls what goes into the namespace `universe.mega_corp.finance_dept.team_alpha`? That would be Team Alpha in the Finance Department of Mega Corp.
I guess this is like tree-shaking by default.
open universe.mega_corp.finance_dept.team_alpha
Then when you use `foo`, the compiler would know you mean `universe.mega_corp.finance_dept.team_alpha.foo`. There will probably need to be some kind of lock-file or hash stored with the source code so that we know precisely which version of `universe.mega_corp.finance_dept.team_alpha.foo` was resolved.
using universe.mega_corp.finance_dept.team_alpha;
Every argument made quickly becomes invalid, because in any sufficiently complex project the function naming scheme will end up replicating a module/namespace system.

This is something that would be enabled with hash identifiers and no modules:
let foo_1 = universe.mega_corp.finance_dept.team_alpha@v1.0.0.foo
let foo_2 = universe.mega_corp.finance_dept.team_alpha@v2.0.0.foo
let compare_old_new_foo(x) =
foo_2(x) - foo_1(x)
There would be a corresponding lock-file to make this reproducible:
{
"universe.mega_corp.finance_dept.team_alpha": {
"v1.0.0": "aabbcc",
"v2.0.0": "xxyyzz"
}
}
I think this is pretty neat!

`import github.com/blah/baz`, `megacorp.com/finance/baz`, ...
It all resolves to `baz.Something()`
They are nodes in a graph, where the other nodes are the input types, output types and other functions.
It makes sense to cluster closely associated nodes, hence modules.
Variables aren't named, they are beta reduced and referred to by abstraction level.
This is exactly what Unison (https://www.unison-lang.org/) does. It’s kinda neat. Renaming identifiers is free. Uh… probably something else is neat (I haven’t used Unison irl)
Furthermore, modules are the unit of versioning. While one could version each individual function separately, that would make managing the dependency graph with version compatibility considerably more complex.
There is the adage “version together what changes together”. That carries over to modules: “group together in a module what changes together”. And typically things change together that are designed together.
Namespaces are an orthogonal issue. You can have modules without namespaces, and namespaces without modules.
1. Functions live in a single global KV store.
2. A key of a function is something like `keccak256(function's signature + docstring)` (sketched below, after the example).
3. A value is a list of the function's implementations (the index being the implementation's version) plus some other useful metadata, such as the contributor's signature and preferred function name. (The compiler emits a warning, which must be explicitly silenced, if the preferred name is not used.)
4. The IDE offers a hint, and the developer confirms, to auto-import the function from the global KV store.
5. The import hash can be prefixed with a signer's name defined in some config file. This makes it obvious in git diffs if a function changes its author. Additionally, the compiler only accepts a short hash in import statements when it is used with a signer.
package.toml
[signers]
mojmir = "mojmir's pubkey"
radislava = "radislava's pubkey"
source.file // use publisher and short hash
import "mojmir@51973ec9d4c1929b@1" as log_v1;
// or full hash
import "51973ec9d4c1929bdd5b149c064d46aee47e92a7e2bb5f7a20c7b9cfb0d13b39" as log_latest;
import "radislava@c81915ad12f36c33" as ln;
log_v1("Hello");
log_latest(ln(0));
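A sketch of step 2's key derivation in TypeScript, assuming Node's built-in crypto with SHA-3 support, and substituting sha3-256 for keccak256 (the two differ only in padding); the signature and docstring are hypothetical:

import { createHash } from "node:crypto";

// Step 2 above: key a function by hashing its signature plus docstring.
function functionKey(signature: string, docstring: string): string {
  return createHash("sha3-256").update(signature + docstring).digest("hex");
}

const key = functionKey(
  "log(message: string): void",          // hypothetical signature
  "Append an event to the audit trail."  // hypothetical docstring
);
console.log(key); // 64 hex chars, like the full-hash import above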
Our minds can (allegedly) only handle 7+/-2 concepts in working memory at once. Your whole codebase has way more than that, right? But one module could easily fit in that range.
Now, imagine your environment of choice supported dynamic runtime loading of code where the code is just dropped into the global namespace. This screams "insecure" and "how do I know if I call the code I want to call?".
Now imagine the only mitigating mechanism was `include_once`. It would make sense that software written in this environment requires its own CVE namespace, as new security vulns are discovered every second.
If you expand some of the comments below, he and other members of the community at the time have a nice discussion about hierarchical namespace.
I particularly like his "flat beer and chips" comment:
https://groups.google.com/g/erlang-programming/c/LKLesmrss2k
---
> I'd like to know if there will be hierarchical modules in Erlang, because a tree of packages is a rather good idea:
No it's not - this has been the subject of long and heated discussion and is why packages are NOT in Erlang - many people - myself included - dislike the idea of hierarchical namespaces. The dot in the name has no semantics it's just a separator. The name could equally well be encoders.mpg.erlyvideo or mpg.applications.erlvideo.encoder - there is no logical way to organise the package name and it does not scale -
erlyvideo.mpegts.encoder
erlyvideo.rtp.encoder
But plain module namespace is also ok. It would be impossible for me to work with 30K LOC with plain function namespace.
The English language has a flat namespace.
I'd like a drink.alcoholic.beer with my food.unhealthy.hamburger and my food.unhealthy.national.french.fries
I have no problem with flat beer and chips.
/Joe
---
English absolutely has namespaces. Every in-group has shibboleths and/or jargon, words that mark membership in the group that have connotations beyond the many dictionary definitions of that word (in fact I wonder how many words with more than three definitions started out as jargon/slang words that achieved general acceptance).
You cannot correctly parse a sentence without the context in which it was written. That ambiguity is a literary device some authors use: by letting the reader assume one interpretation of a prophetic sentence early on, the author intensifies the surprise when the reader discovers a different interpretation at the end.
I think Joe's point is about the perennial discussion whether hierarchy is better than tags. It's as old as software or as old as people started categorizing things. Some early databases were hierarchical KV stores. Email clients and services go through that too, is it better to group messages by tags or have a single hierarchy of folders?
> English absolutely has namespaces
Sure, we can pick apart the analogy; after all, we're not programming in English unless we write LLM prompts (or COBOL /s). But if English has namespaces, what would you pick: lager.flat.alcoholic, alcoholic.lager.flat, lager.alcoholic.flat, etc.? Is there a top-level "lager" vs "ale" package, with flat vs carbonated as the next level?
Hierarchy seems more rigid and less general than tags, but when it works, it works.
Hierarchy is easy in the physical world.
But what is crazy is that since the dawn of computing we can store data however we want and project it however we want… and yet we still use hierarchy for file storage… like we still just have a filing cabinet of manila folders.
Do you also want me to tell you the best way to foresee whether a given program will halt?
The point is, facile statements like "just make the right choice" add nothing to the conversation. Some problems are hard, and trying to short-circuit the conversation with "that's easy, just pick the right tool" doesn't apply here.
In many contemporary programming languages you can express this, too, by exporting an imported name.
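For example, in TypeScript (the module paths are hypothetical):

// public_api.ts -- re-export an imported name, so callers depend on
// this module's interface rather than the internal file layout
export { parseConfig } from "./internal/config_parser.ts";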
> As of 2013, this 3-letter verb common in sports, theater & politics has the largest entry in the online OED.
The correct response? What is "run"?
That's not true of all module systems. It's true in Java, but not in Rust, where it establishes a parent-child relationship, and in which context [1]:
> If an item is private, it may be accessed by the current module and its descendants.
[1] https://doc.rust-lang.org/reference/visibility-and-privacy.h...
Btw, the more experienced I've gotten the more I've found that organizing code is mostly pointless. A 5000-line source file (e.g., module) isn't necessarily worse than five 1000-line files.
Smalltalk has the same live experience, but does have modules, because they make editing easier and encapsulation is nice for readability and clarity.
(Locality of reference.)
The important piece here, which is mentioned but not very emphasized in TFA, is that Hoogle lets you search by metadata instead of just by name. If a function takes the type I have and transforms it to the type I want, and the docs say it does what I want, I don't really care what module or package it's from. In fact, that's often how I use Hoogle: finding the function I need across all Stack packages.
That said, while I think it could work, I'm not convinced it'd have any benefit over the status quo in practice.
I like Deno for a similar reason. It's a coarser level of granularity, and not explicitly content-addressed, but you can import specific versions of modules that are ostensibly immutable, and if you want, you could do single-function modules.
I like the idea so much that I'm now kind of put off by any language/runtime that requires users of my app/library to do a separate 'package install' step. Python being the most egregious, but even languages that I am otherwise interested in, like Racket, I avoid because "I want imports to be unambiguous and automatically downloaded."
Having a one-step way to run a program where all dependencies are completely unambiguous might be my #1 requirement for programming languages. I am weird.
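For reference, a version-pinned Deno URL import looks like this (the std version shown is just an example):

// The URL pins an exact, ostensibly immutable version;
// Deno fetches it on first run, with no separate install step.
import { assertEquals } from "https://deno.land/std@0.224.0/assert/mod.ts";

assertEquals(1 + 1, 2);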
One reason not to do things this way is if you want to be able to upgrade some library independently of other components that depend on it, but "that's what dependency injection is for", i.e. have your library take the other library as an argument, with the types/APIs being in a separate one (see the sketch below). TypeScript's type system in particular makes this work very easily. I have done this in Deno projects to great effect. From what I've heard from Rich Hickey [1], the pattern should also work well in Clojure.
[1] something something union types being superior to what you might call 'sum types'; can't find the link right now. I think this causes some trouble in functional languages where instead of something being A|B it has to be a C, where C = C A | C B. In the former case an A is a valid A|B, but a C A is not an A, so you can't expand the set of values a function takes without breaking the API. Basically what union types require is that every value in the language extends some universal tagged type; if you need to add a tag to your union then it won't work.
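A minimal TypeScript sketch of that dependency-injection pattern (all names hypothetical): the types live in their own module, and the library receives its dependency as an argument rather than importing a concrete version:

// logger_api.ts -- the shared types/APIs live in their own module
export interface Logger {
  log(message: string): void;
}

// report.ts -- the library takes its dependency as an argument
import type { Logger } from "./logger_api.ts";

export function makeReporter(logger: Logger) {
  return (total: number) => logger.log(`total: ${total}`);
}

// main.ts -- the application decides which concrete logger to wire in,
// so the logger can be upgraded without touching report.ts
import { makeReporter } from "./report.ts";

const report = makeReporter({ log: (m) => console.log(m) });
report(42);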
Very much related: https://scrapscript.org/
A use-case could be optimising compilers. These need to search for alternative (faster) series of statements that are provably equivalent to the original, given some axioms about the behaviour of the underlying machine code, basic boolean algebra, and integer mathematics.
This could be monetised: Theorems along the shortest path from a desired proof to the axioms are rewarded. New theorems can be added by anyone at any time, but would generate zero income unless they improve the state-of-the-art. Shortest-path searches through the data structure would remain efficient because of this incentive.
Client tools such as compilers could come with monthly subscriptions and/or some other mechanism for payments, possibly reusing some existing crypto coin. These tools advertise desired proofs -- just like how blockchain clients advertise transactions they'd like to complete along with a fee -- and then the community can find new theorems to reach those proofs, hoping not just for the one-time payment, but the ongoing reward if the theorems are general and reusable for other purposes.
Imagine you're a FAANG and there's some core algorithm that uses 1% of your global compute. You could advertise a desire to improve the algorithm or the assembly code to be twice as efficient for $1M. Almost certainly, this is worth it. If no proof turns up, there's no payment. If a proof does turn up, a smart contract debits the FAANG's crypto account and they receive the chain of theorems proving that there's a more efficient algorithm, which will save them millions of USD in infrastructure costs. Maths geeks, AI bots, and whomever else contributed to the proof get their share of the $1M prize.
It's like... Uber for Fields medals, used for industrial computing.
Fully automated gig work for computer scientists and mathematicians.
Though it wouldn't make sense to build something like that on top of such a fast-moving, complex, and bug-prone target as Lean.
https://github.com/joearms/elib1/blob/master/src/elib1_misc....
If a function is built on several helper functions, it may be that those same helper functions can also be used to make other, related things which round out the functionality area. Perhaps they provide an API that's easier to use for different scenarios or whatever.