Are Python Dictionaries Ordered Data Structures?

https://www.thepythoncodingstack.com/p/are-python-dictionaries-ordered-data

11•rbanffy•18h ago

Comments

Spivak•12h ago

I mean this gets to the fundamental question of what being ordered actually means. The author decides that regardless of whether the data structure has an order, even if that order is guaranteed, it's a necessary property that order must be taken into account to determine equality. A very sensible and pure definition that says that equivalence classes should be single elements.

But this is a weird thing to actually do in Python; it's not common to test dictionaries for equality like this. If you parsed some file into a dictionary and you want to preserve the order it was originally in when you write it back out, then the standard dict works just fine. There's a tension in the article: if you rely on ordering you ought to consider using OrderedDict instead, but that leads you to never rely on dict() ordering, which is fine, but it's guaranteed to you. You're supposed to use it! The standard dict() even has a __reversed__ method, so ordering is a meaningful property.
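To make the distinction concrete, here is a minimal sketch (the variable names are made up for illustration) of how a plain dict keeps insertion order yet ignores it for equality, while OrderedDict compares order-sensitively:

```python
from collections import OrderedDict

a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}

# Plain dicts compare equal regardless of insertion order...
dicts_equal = a == b
# ...even though each one iterates in its own insertion order.
orders = (list(a), list(b))

# Two OrderedDicts only compare equal if their order matches too.
ordered_equal = OrderedDict(a) == OrderedDict(b)

# reversed() is supported on plain dicts (Python 3.8 and later).
reversed_keys = list(reversed(a))

print(dicts_equal, orders, ordered_equal, reversed_keys)
```

So the order is there to be used for iteration and round-tripping; it just isn't part of what `==` means for a plain dict.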

cpburns2009•11h ago

I think it was a mistake to standardize that dict will maintain insertion order. It's a nice side effect of the internal implementation, but I fear it will hinder future improvements.

jamesdutc•10h ago

This is a genuine concern, since it hinders our ability to port over high-quality, high-performance hash table implementations from other languages (since these often do not preserve any human ordering.)

However, the ship has already sailed here. I think that once insertion-ordering became the standard, it created a guarantee that we can't easily back down from.

Spivak•10h ago

The python dict() is already high-quality and high-performance. You're probably not going to be able to do much better than the current state. If you want faster, you end up moving your data into an environment that lets you make more assumptions and be more restrictive with your data and operate on it there. It's how numpy, Polars, and pandas work.

Everything in Python is a dict, there's no data structure that's been given more attention.

https://m.youtube.com/watch?v=p33CVV29OG8

jamesdutc•9h ago

> The python dict() is already high-quality and high-performance

Yes, the CPython `PyDictObject` has been subject to a lot of optimisation work, and it is both high-quality and high-performance. I should not have implied that this is not the case.

However, there's a lot of ongoing research into even further improving the performance of hash tables, and there are regular posts discussing the nature of these kinds of improvements: e.g., https://news.ycombinator.com/item?id=17176713

I have colleagues who have wanted to improve the performance of their use of `dict` (within the parts of their code that are firmly within the structural/Python domain,) who have wanted to integrate these alternate implementations. For the most part, these implementations do not guarantee “human ordering” so this means that they can provide these tools only as supplements (and not replacements) of the Python built-in `dict`.

> moving your data into an environment that lets you make more assumptions and be more restrictive with your data and operate on it there. It's how numpy, Polars, and pandas work.

Yes, the idea of a heterogeneous topology of Python code, wherein “programme structuring” is done in pure Python, and “computation” is done in aggregate types that lower to C/C++/Rust/Zig (thus eliminating dynamic dispatch, ensuring contiguity, &c.) is common in Python. As you note, this is the pattern that we see with NumPy, Polars, pandas, and other tools. We might put a name to this pattern: the idea of a “restricted computation domain.” (I believe I introduced this terminology into wide use within the Python community.)

However, not all code shows such a stark division between work that can be done at high-generality (and, correspondingly, low-performance) in pure Python and work that can be done at low-generality (but very high-performance) within a “restricted computation domain.” There are many types of problems wherein the line between these is blurred, and it is in this region where improvements to the performance of pure Python code may be desired.

rbanffy•6h ago

> they can provide these tools only as supplements (and not replacements) of the Python built-in `dict`.

Maybe they’ll walk back on that if there is a compelling new implementation that isn’t ordered. Or they keep dict as is and use the better implementation for internal dict-like structures.

gizmo686•10h ago

The problem is that once this side effect is in the main version for a while, it becomes a feature whether you admit it or not. Since that is going to happen anyway, you might as well make it official.

The alternative approach is to do what Go did and explicitly randomize iteration order to prevent people from relying on a fixed order in the first place.

jamesdutc•10h ago

> explicitly randomize iteration

In fact, we see this with CPython `set`, controlled by `_Py_HashSecret`:

https://github.com/python/cpython/blob/6eb6c5dbfb528bd07d77b...

jamesdutc•10h ago

This post contains a key misconception about the Python builtin data structures, that may seem like sophistry but is key to understanding the semantics (and, thus, most fluent use) of these tools.

All of the Python builtin data structures are ordered.

The distinction we should make is not between ordered and unordered data structures. Instead, we should distinguish between human ordered and machine ordered data structures.

In the former, the data structure maintains an ordering that a human being can use as part of their understanding of the programme. A `list` is human-ordered (and its order typically connotes “processing” order,) a `tuple` is human-ordered (and its order typically connotes “semantic” ordering, which is why `sorted(…)` and `reversed(…)` are rarely meaningful operations,) a `str` is human-ordered, and `int` is ordered (if we consider `int` in Python to be a container type, despite our inability to easily iterate over its contents. Whether `complex` is a container is pushing this idea a bit too far, in part because I don't think anyone really uses `complex`, since NumPy dtype='complex128' is likely to be far more useful in circumstances where we're working within .)

In the latter, the data structure maintains an ordering that a human being cannot use as part of their understanding of a programme (usually as a consequence of a mechanism that the machine uses as part of its execution of the programme.) A `set` is machine-ordered, not unordered. If we iterate over a `set` multiple times in a row, we see the same ordering (even though we cannot predict this ordering.) In fact, the ordering of a `set` is intentionally made difficult for a human being to predict or use, by means of hash “salting”/seeding (that can only be controlled externally via, e.g., the https://docs.python.org/3.3/using/cmdline.html#envvar-PYTHON... `PYTHONHASHSEED` environment variable.)
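A rough way to see the seeding at work is to ask fresh interpreters for the hash of the same string under different `PYTHONHASHSEED` values. This is only a sketch: the helper `hash_under_seed` and the sample strings are made up for illustration, and it assumes the current interpreter can be re-launched via `sys.executable`.

```python
import os
import subprocess
import sys


def hash_under_seed(seed: str, text: str = "spam") -> str:
    """Start a fresh interpreter with PYTHONHASHSEED=seed and return hash(text)."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The same seed reproduces the same string hash across processes...
same_seed = hash_under_seed("1") == hash_under_seed("1")
# ...while a different seed is expected to give a different hash.
different_seed = hash_under_seed("1") != hash_under_seed("2")

# Within one process, repeated iteration over an unchanged set is stable.
s = {"spam", "eggs", "ham"}
stable_within_process = list(s) == list(s)

print(same_seed, different_seed, stable_within_process)
```

Since string hashes feed the slot positions in a `set`, a changed seed can change the iteration order of a set of strings between runs, even though the order is stable inside any single run.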

Historically, the Python `dict` was machine ordered. If we looped over a `dict` multiple times in a row (without changes made in between,) we were guaranteed a consistent ordering. In fact, for `dict`, the guarantee of consistency in this ordering was actually useful: we were guaranteed that `[*d]` and `[*d.values()]` on a `dict` (with no intervening changes) would maintain the same correspondence order (thus `[*zip(d, d.values())]` would match exactly to `[*d.items()]`!)
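A small sketch of that correspondence guarantee, unpacking the keys, values, and items of an unchanged `dict` into lists (the sample data is made up):

```python
d = {"b": 2, "a": 1, "c": 3}

keys = [*d]                    # iterating a dict yields its keys
values = [*d.values()]
pairs = [*zip(d, d.values())]  # pair each key with the value at the same position
items = [*d.items()]

# With no intervening changes, keys and values line up position by position.
print(pairs == items)
```

The check holds for any `dict` that is not modified between the calls; it says nothing about which order the keys come out in, only that keys and values agree with each other.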

When the compact-dict optimisation was added to CPython (in 3.6), the Python `dict` became a very interesting structure. Note that, from a semantic perspective, there are actually two distinct uses of `dict` that we see: as a “structural” entity or as a “data” entity. (Ordering is largely meaningless for the former, so we'll ignore it for this discussion.) With this optimisation, the underlying storage for the `dict` became two separate C-level blocks of contiguous memory, one of which was machine-ordered (in hash-subject-to-seeding-and-probing/perturbation order) and one of which was human-ordered (in insertion order.) (From this perspective, we could argue that a `dict` is both human and machine-ordered, though it stands to reason that the only useful artefact we see of the latter is with `__eq__` behaviour, which this article discusses. Since “human ordering” is a guarantee, it supersedes “machine ordering.”)
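The two-block layout can be modelled in a few lines of pure Python. This is a toy sketch, not CPython's implementation: the class name `TwoBlockDict` is hypothetical, it uses linear probing rather than CPython's perturbation-based probing, and it supports neither deletion nor resizing.

```python
class TwoBlockDict:
    """Toy model: a sparse index table (hash order) plus a dense entry list
    (insertion order). Illustrative only; no deletion, no resizing."""

    def __init__(self, size=8):
        # size is assumed to be a power of two so that masking works.
        self.indices = [None] * size  # machine-ordered: slot chosen by hash
        self.entries = []             # human-ordered: (key, value) as inserted

    def _slot(self, key):
        # Linear probing until we find the key or an empty slot.
        mask = len(self.indices) - 1
        i = hash(key) & mask
        while True:
            idx = self.indices[i]
            if idx is None or self.entries[idx][0] == key:
                return i
            i = (i + 1) & mask

    def __setitem__(self, key, value):
        i = self._slot(key)
        idx = self.indices[i]
        if idx is None:
            # Always keep one slot empty so that probing terminates.
            if len(self.entries) + 1 >= len(self.indices):
                raise OverflowError("toy table is full (no resizing in this sketch)")
            self.indices[i] = len(self.entries)
            self.entries.append((key, value))
        else:
            # Updating an existing key keeps its original position.
            self.entries[idx] = (key, value)

    def __getitem__(self, key):
        idx = self.indices[self._slot(key)]
        if idx is None:
            raise KeyError(key)
        return self.entries[idx][1]

    def __iter__(self):
        # Iteration walks the dense block, so it follows insertion order.
        return (key for key, _ in self.entries)


td = TwoBlockDict()
for name in ("pear", "apple", "fig"):
    td[name] = len(name)
td["pear"] = 0  # update in place; position is unchanged

print(list(td), td["apple"], td["pear"])
```

Lookups go through the hash-ordered index table, while iteration never touches it, which is why the machine ordering stays invisible to the user of the structure.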
