There's no default right answer to this, as the answer depends entirely on what you're sorting and how you want it sorted. Even for a given character set the "correct" alphabetical sorting is still locale dependent.
And even knowing all that, "correct" programmatic sorting might still be essentially impossible. Some digraphs may be sorted differently depending on the specific word. For example A vs Aa in Danish, where Aa can mean Å. But Aa won't always mean Å, so good luck figuring that out.
Doing a dumb sort by character or byte values is obviously the wrong call for any diacritics, but the right call may also depend on the language.
It would have been reasonable to conclude the article a third of the way through, and say "sorting is locale-dependent; if what you value is consistent behaviour between different OSes (instead of sorting based on the user's preferences), you need to implement the sorting yourself."
The article does mention it but in passing.
* Albertslund
* Odense
* Aarhus
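The list above is in Danish order, where "Aa" collates like "Å" and so comes last. A rough sketch of the difference, assuming a POSIX shell with `sort` and `locale` available; the `da_DK.UTF-8` locale name is an assumption and may not be installed on your system, so the second sort is guarded:

```shell
# The three city names from the list above, one per line.
cities='Albertslund
Odense
Aarhus'

# Byte-order sort: the C locale compares raw byte values,
# so "Aa" sorts before "Al" and Aarhus comes first.
c_sorted=$(printf '%s\n' "$cities" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# Locale-aware sort: only attempted if a Danish UTF-8 locale is installed
# (the exact locale name is an assumption and varies by system).
if locale -a 2>/dev/null | grep -Eqi '^da_DK\.utf-?8$'; then
  printf '%s\n' "$cities" | LC_ALL=da_DK.UTF-8 sort
fi
```

Under the C locale this prints Aarhus, Albertslund, Odense. What the Danish locale prints depends on your libc's collation tables, which is rather the point.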
This feels like material for another Tom Scott video.
Pike matchbox.
My worry is that it would perform badly on really large directories... That said, in the cases where it's a pain, it would be helpful, to say the least.
And then a lot of languages are used in different countries with different rules.
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1...
The `locales-all` package works more like macOS. It's only a ~10MB download but unpacks to take ~250MB of disk space (these numbers will vary based on your libc version and packaging format).
There are a lot of sparse arrays and UTF32 character data in compiled locales.
Incidentally, the command to dump a locale's data is:
    LC_ALL=whatever locale -ck `locale | sed 's/=.*//; /LANG\|LC_ALL/d'`

    # source all shell config
    export LC_COLLATE=C # ensure consistent sort, ~ at end
    for file in ~/bin/shell/**/*.(z|)sh; do
      source "$file"
    done
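The reason for pinning `LC_COLLATE` before the loop is that glob expansion order follows the collation locale, so the order in which the files get sourced can differ between machines. A small sketch of that effect, using throwaway files in a temp directory (the file names are made up for illustration, and it assumes `mktemp -d` is available):

```shell
# Create a scratch directory with three hypothetical script names.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.sh" "$demo_dir/B.sh" "$demo_dir/~z.sh"

glob_order=$(
  cd "$demo_dir" || exit 1
  unset LC_ALL      # LC_ALL, if set, would override LC_COLLATE
  LC_COLLATE=C      # byte order: uppercase, then lowercase, then "~"
  export LC_COLLATE
  echo *.sh         # the shell sorts glob matches using the collation order
)
echo "$glob_order"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

With `LC_COLLATE=C` this prints `B.sh a.sh ~z.sh`, which matches the "~ at end" comment in the snippet above. Under a typical UTF-8 locale the order may well differ.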