> Really any finite set can be a "dimension".
This is absolutely true and yet typical programming languages don't do a good job of unleashing this. When is the last time you wish you could define an array where the indices are from a custom finite set, perhaps a range 100 to 200, perhaps an enum? I only know of two languages, Pascal and Haskell, that are sufficiently flexible. All other languages insist that indices for arrays must be integers either starting from 0 or 1. Haskell has a mechanism to let you control what is an array index: https://hackage.haskell.org/package/base-4.18.1.0/docs/Data-... The lack of this ability makes programmers use more general hash maps by default and therefore leave performance on the table.
> We can also transform Row -> PowerSet(Col) into Row -> Col -> Bool, aka a boolean matrix.
I mean sure. Does the author know type signatures and their operations like this are isomorphic to algebraic operations on size of the possibly infinite sets? The function A -> B has exactly |B|^|A| implementations so that's why Col -> Bool is in this sense "same" as PowerSet(Col). And the function operator associates to the right so adding Row -> doesn't change anything. Wait till you learn why sum types are called sum types and why product types are called product types. Here's another thing that might be mind-blowing: arrays of an unknown length can be thought of as the infinite union of arrays of lengths of all natural numbers: Array<T, 0> | Array<T, 1> | … so using the above notation the cardinality is 1 + |T| + |T|^2 + …; but they can also be represented as a standard linked list List<T>=Nil|(T, List<T>) and using the above notation we have |List<T>|=1+|T|*|List<T>| which simplifies to |List<T>|=1/(1-|T|). And guess what, the Taylor expansion of that is just the earlier 1 + |T| + |T|^2 + … precisely.
Julia, Fortran, some BASICs allow custom integer ranges.
Ada allows you to use any discrete range (integer, character, enums).
https://en.wikipedia.org/wiki/Comparison_of_programming_lang... - A more comprehensive list
The two columns of interest are "Specifiable index type" and "Specifiable base index".
Interestingly, though, Perl’s native arrays are decidedly not an array of primitives. They are arrays of scalar values, and a scalar value can not only contain a character, a string, an integer, a float, or a reference to another item (scalar, hash, array, blessed object, filehandle, etc) but it can return different values when called in a numeric context vs a string context. There are certainly ways to get at primitive values, but it’s not in Perl’s native array semantics.
> ...
> The lack of this ability makes programmers use more general hash maps by default and therefore leave performance on the table.
You can simulate this in any language by hash-mapping the custom set elements to indices. (And in the general case, you won't be able to determine the indices faster than a hash table lookup.)
In fact, I imagine there are languages and implementations where hash table lookup is the fastest way to map 'a':'z' to 0:25.
In fact, let me try a (surely totally bogus) micro-benchmark right now:
$ python -m timeit --setup 'lookup = dict(zip("abcdefghijklmnopqrstuvwxyz", range(26)))' '[lookup[x] for x in "abcdefghijklmnopqrstuvwxyz"]'
500000 loops, best of 5: 884 nsec per loop
$ python -m timeit '[ord(x) - 97 for x in "abcdefghijklmnopqrstuvwxyz"]'
200000 loops, best of 5: 1.21 usec per loop
Huh....And surely if you use the hash map directly, instead of using it to grab an index into another container, overall performance only gets better (unless maybe you need to save per-instance memory, and you have multiple containers using the same custom index scheme).
Hash tables are a pretty neat idea.
You can also define integer-valued named constants, and use those if you prefer.
In C, an array is just contiguous bytes, and you reach an index through math (starting location + (index * size)).
Pretty sure lots of languages still do this to some degree, as your lookup times are effectively zero, though altering the number of elements is way more complex/expensive than in a linked list.
By the way, that's the only reason why C-derived languages use unintuitive zero-based indexing. There's really no other reason to call the first element a[0] instead of a[1].
It removes a level of indirection, wasted space, and mental overhead. Suppose you have some table of 'a'..'z' -> whatever. What are your options?
- Switch/case. This may be optimized and fast, but that's not guaranteed across all languages and compilers.
- Hash map. This introduces memory overhead even if it is technically O(1) for access times.
- If in C, you could do:
T* table = (T*)malloc(sizeof(T) * 26) - 'a'
So that the characters can be used naturally without later modification (not a feature in every language, though).- Manually calculate each offset every time:
table[c - 'a'] = ...
With language appropriate type conversions, if needed.Or you can allow any custom range to work, and do the math in that last option automatically because it's trivial for a compiler to figure it out and it removes unnecessary mental overhead from the program reader and writer.
table['a']
Just works.> You can also define integer-valued named constants, and use those if you prefer.
Because everyone wants to see `lower_case_a = 0` and the 25 other cases in their code along with all other possible constants. At least use an enum, sane languages even make them type-safe (or at least safer).
Where are you getting that from? That's not the characteristic feature of Arrays, that's the characteristic feature of Doubly Linked Lists.
I'm not saying that if you have value of any element you could derive the value of the previous or next element from it. I'm saying "it has a previous element" (unless it is the 1st element) meaning there is such a previous element and it is also possible to access that previous element if you know the index of the current element under investigation.
Structuring Arrays with Algebraic Shapes
The IBM 704 had 3 index registers, which could be combined together with a bitwise OR, then was subtracted from a base address, that's where FORTRAN for its 7D arrays from.
When you mix that concept of decrementing index registers with hypercubes, you will naturally get to the extract format of BI tools like classic Tableau.
I sincerely believe this aspect of the filter is a misfeature. Submissions that have bad reasons to put a number in the title are generally submissions that are just bad (and should be flagged) anyway.
Multidimensional arrays are a blind spot for modern language designers. Not arrays of arrays, as this author points out. FORTRAN had multidimensional arrays, but C didn't. Go and Rust don't have them. Part of the trouble is slices. Once you have slices, people want slicing in multidimensional arrays along any axis, which complicates the basic array representation. Discussions in this area dissolved into bikeshedding and nothing got done for Go and Rust.
I think it's more accurate to just call them hash tables ("dicts") that happen to support some very limited array-like functionality. "Appending" still looks like key insertion if you don't use a named method (https://stackoverflow.com/questions/27434142); `ipairs` misses those non-integer keys but also misses non-contiguous keys, or even a zero key (https://stackoverflow.com/questions/55108794); the `#` operator is not well defined once you add those non-contiguous keys (https://stackoverflow.com/questions/2705793) (so you can't actually cleanly override the choice to "start from 1"); slicing etc. aren't provided as operators (https://stackoverflow.com/questions/24821045); etc.
Being able to express slices across arbitrary dimensions natively is another matter.
[0] https://en.cppreference.com/w/cpp/language/array.html#Multid...
The trade-offs are way too severe for people to agree to any implementation in a low level language like Rust or one that wants to be low level like Go.
We could get them in something like Python.
Merely 2000 words, we have a full complete book for that [1].
Joking aside, D4M has seamlessly combined spreadsheet, table, database and graph concepts based on associative array mathematics [2].
On one extreme people are going to bolt on everything on Postgresql database, and another extreme of integrating clunky disparate systems, D4M is a breath of fresh air that is based on mathematics not unlike the venerable SQL relational database concepts [3].
[1] Mathematics of Big Data Spreadsheets, Databases, Matrices, and Graphs:
https://mitpress.mit.edu/9780262038393/mathematics-of-big-da...
[2] D4M: Dynamic Distributed Dimensional Data Model:
[3] Just Use Postgres for Everything: How to reduce complexity and move faster:
https://www.amazingcto.com/postgres-for-everything/
[4] Just Use Postgres for Everything (430 comments):
mvc•23h ago
https://developer.mozilla.org/en-US/docs/Web/XML/XPath/Refer...