The Future of Flatpak

https://lwn.net/Articles/1020571/
56•dxs•1h ago•8 comments

Show HN: Defuddle, an HTML-to-Markdown alternative to Readability

https://github.com/kepano/defuddle
110•kepano•3h ago•21 comments

Claude 4

https://www.anthropic.com/news/claude-4
1444•meetpateltech•8h ago•770 comments

That fractal that's been up on my wall for years

https://chriskw.xyz/2025/05/21/Fractal/
297•chriskw•9h ago•19 comments

32 Bits That Changed Microprocessor Design

https://spectrum.ieee.org/bellmac-32-ieee-milestone
18•mdp2021•2h ago•2 comments

Airport for DuckDB

https://airport.query.farm/
33•jonbaer•3d ago•2 comments

Does Earth have two high-tide bulges on opposite sides? (2014)

http://physics.stackexchange.com/questions/121830/does-earth-really-have-two-high-tide-bulges-on-opposite-sides
121•imurray•6h ago•40 comments

The Copilot Delusion

https://deplet.ing/the-copilot-delusion/
103•isaiahwp•1h ago•68 comments

“Secret Mall Apartment,” a Protest for Place

https://modernagejournal.com/secret-mall-apartment-a-protest-for-place/251023/
39•rufus_foreman•2h ago•14 comments

Mozilla to shut down Pocket on July 8

https://support.mozilla.org/en-US/kb/future-of-pocket
761•phantomathkg•8h ago•487 comments

How to cheat at settlers by loading the dice (2017)

https://izbicki.me/blog/how-to-cheat-at-settlers-of-catan-by-loading-the-dice-and-prove-it-with-p-values.html
81•jxmorris12•6h ago•61 comments

Loading Pydantic models from JSON without running out of memory

https://pythonspeed.com/articles/pydantic-json-memory/
75•itamarst•7h ago•27 comments

Improving performance of rav1d video decoder

https://ohadravid.github.io/posts/2025-05-rav1d-faster/
247•todsacerdoti•13h ago•84 comments

A Scientist Fighting Nuclear Armageddon Hid a 50-Year Secret

https://www.nytimes.com/2025/05/19/science/richard-garwin-hydrogen-bomb.html
20•LAsteNERD•3d ago•2 comments

Fast Allocations in Ruby 3.5

https://railsatscale.com/2025-05-21-fast-allocations-in-ruby-3-5/
174•tekknolagi•11h ago•42 comments

Stargate and the AI Industrial Revolution

https://davefriedman.substack.com/p/stargate-and-the-ai-industrial-revolution
13•mhb•2h ago•9 comments

Trade Secrecy in Willy Wonka's Chocolate Factory (2009)

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1430463
26•NaOH•4h ago•4 comments

Launch HN: WorkDone (YC X25) – AI Audit of Medical Charts

55•digitaltzar•9h ago•51 comments

I Built My Own Audio Player

https://nexo.sh/posts/why-i-built-a-native-mp3-player-in-swiftui/
168•nexo-v1•11h ago•87 comments

A South Korean grand master on the art of the perfect soy sauce

https://www.theguardian.com/world/2025/may/21/without-time-there-is-no-flavour-a-south-korean-grand-master-on-the-art-of-the-perfect-soy-sauce
120•n1b0m•1d ago•82 comments

Planetfall

https://somethingaboutmaps.wordpress.com/2025/05/20/planetfall/
310•milliams•16h ago•85 comments

Problems in AI alignment: A scale model

https://muldoon.cloud/2025/05/22/alignment.html
26•hamburga•5h ago•3 comments

Sketchy Calendar

https://www.inkandswitch.com/ink/notes/sketchy-calendar/
8•surprisetalk•1h ago•0 comments

When good pseudorandom numbers go bad

https://blog.djnavarro.net/posts/2025-05-18_multivariate-normal-sampling-floating-point/
34•chewxy•3d ago•2 comments

Show HN: SQLite JavaScript - extend your database with JavaScript

https://github.com/sqliteai/sqlite-js
139•marcobambini•11h ago•43 comments

The Annotated Kolmogorov-Arnold Network (Kan)

https://alexzhang13.github.io/blog/2024/annotated-kan/
23•jxmorris12•4h ago•1 comments

Show HN: Hsdlib – A C Library for Vector Similarity with SIMD Acceleration

18•habedi0•3d ago•0 comments

Adventures in Symbolic Algebra with Model Context Protocol

https://www.stephendiehl.com/posts/computer_algebra_mcp/
85•freediver•11h ago•20 comments

Practicing graphical debugging using visualizations of the Hilbert curve

https://akkartik.name/debugUIs.html
21•akkartik•6h ago•1 comments

Four years of sight reading practice

https://sandrock.co.za/carl/2025/05/four-years-of-sight-reading-pracice/
139•chthonicdaemon•3d ago•66 comments

Loading Pydantic models from JSON without running out of memory

https://pythonspeed.com/articles/pydantic-json-memory/
74•itamarst•7h ago

Comments

thisguy47•6h ago
I'd like to see a comparison of ijson vs just `json.load(f)`. `ujson` would also be interesting to see.
itamarst•6h ago
For my PyCon 2025 talk I did this. Video isn't up yet, but slides are here: https://pythonspeed.com/pycon2025/slides/

The linked-from-original-article ijson article was the inspiration for the talk: https://pythonspeed.com/articles/json-memory-streaming/
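A minimal sketch of how the `json.load` side of such a comparison could be measured with only the standard library. The helper name `peak_memory_of_json_load` and the synthetic document are assumptions made for illustration, not taken from the article or the slides, and `tracemalloc` only sees allocations made through Python's allocators, so treat the number as a rough lower bound.

```python
import io
import json
import tracemalloc


def peak_memory_of_json_load(document: str) -> int:
    """Return the peak traced allocation, in bytes, while json.load parses `document`."""
    tracemalloc.start()
    try:
        # json.load reads the whole stream into memory and then builds
        # the full tree of Python objects before returning.
        data = json.load(io.StringIO(document))
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Hypothetical synthetic input: a list of small records.
document = json.dumps([{"id": i, "name": f"user{i}"} for i in range(10_000)])
peak = peak_memory_of_json_load(document)
print(f"document: {len(document)} chars, peak traced memory: {peak} bytes")
```

The same wrapper could be pointed at a streaming parser to compare peaks on identical input.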

fjasdfas•6h ago
So are there downsides to just always setting slots=True on all of my python data types?
itamarst•6h ago
You can't add extra attributes that weren't part of the original dataclass definition:

  >>> from dataclasses import dataclass
  >>> @dataclass
  ... class C: pass
  ... 
  >>> C().x = 1
  >>> @dataclass(slots=True)
  ... class D: pass
  ... 
  >>> D().x = 1
  Traceback (most recent call last):
    File "<python-input-4>", line 1, in <module>
      D().x = 1
      ^^^^^
  AttributeError: 'D' object has no attribute 'x' and no __dict__ for setting new attributes
Most of the time this is not a thing you actually need to do.
masklinn•5h ago
Also some of the introspection stops working e.g. vars().

If you're using dataclasses it's less of an issue, because dataclasses.asdict still works.

monomial•4h ago
I rarely need to dynamically add attributes myself on dataclasses like this but unfortunately this also means things like `@cached_property` won't work because it can't internally cache the method result anywhere.
jmugan•5h ago
My problem isn't running out of memory; it's loading in a complex model where the fields are BaseModels and unions of BaseModels multiple levels deep. It doesn't load it all the way and leaves some of the deeper parts as dictionaries. I need like almost a parser to search the space of different loads. Anyone have any ideas for software that does that?
causasui•5h ago
You probably want to use Discriminated Unions https://docs.pydantic.dev/latest/concepts/unions/#discrimina...
jmugan•2h ago
Yeah, I'm doing that
enragedcacti•5h ago
The only reason I can think of for the behavior you are describing is if one of the unioned types at some level of the hierarchy is equivalent to Dict[str, Any]. My understanding is that Pydantic will explore every option provided recursively and raise a ValidationError if none match but will never just give up and hand you a partially validated object.

Are you able to share a snippet that reproduces what you're seeing?

jmugan•2h ago
That's an interesting idea. It's possible there's a Dict[str,Any] in there. And yeah, my assumption was that it tried everything recursively, but I just wasn't seeing that, and my LLM council said that it did not. But I'll check for a Dict[str,Any]. Unfortunately, I don't have a minimal example, but making one should be my next step.
enragedcacti•1h ago
One thing to watch out for while you debug is that the default 'smart' mode for union discrimination can be very unintuitive. As you can see in this example, an int vs a string can cause a different model to be chosen two layers up even though both are valid. You may have perfectly valid uses of Dict within your model that are being chosen in error because they result in less type coercion. left_to_right mode (or ideally discriminated unions if your data has easy discriminators) will be much more consistent.

    >>> from typing import Any, Dict
    >>> from pydantic import BaseModel, Field
    >>> class A(BaseModel):
    ...     a: int
    ...
    >>> class B(BaseModel):
    ...     b: A
    ...
    >>> class C(BaseModel):
    ...     c: B | Dict[str, Any]
    ...
    >>> C.model_validate({'c':{'b':{'a':1}}})
    C(c=B(b=A(a=1)))
    >>> # Default 'smart' mode: the string "1" makes the Dict branch win
    >>> C.model_validate({'c':{'b':{'a':"1"}}})
    C(c={'b': {'a': '1'}})
    >>> class C(BaseModel):
    ...     c: B | Dict[str, Any] = Field(union_mode='left_to_right')
    ...
    >>> C.model_validate({'c':{'b':{'a':"1"}}})
    C(c=B(b=A(a=1)))
cbcoutinho•5h ago
At some point, we have to admit we're asking too much from our tools.

I know nothing about your context, but in what context would a single model need to support so many permutations of a data structure? Just because software can, doesn't mean it should.

shakna•3h ago
Anything multi-tenant? There's a reason Salesforce is used by so many large organisations. The multi-nesting lets you account for all the discrepancies that come with scale.

Just tracking payments through multiple tax regions will explode the places where things need to be tweaked.

not_skynet•2h ago
going to shamelessly plug my own library here: https://github.com/mivanit/ZANJ

You can have nested dataclasses, as well as specify custom serializers/loaders for things which aren't natively supported by json.

jmugan•2h ago
Ah, but I need something JSON-based.
not_skynet•48m ago
It does allow dumping to/recovering from json, apologies if that isn't well documented.

Calling `x: str = json.dumps(MyClass(...).serialize())` will get you json you can recover to the original object, nested classes and custom types and all, with `MyClass.load(json.loads(x))`

m_ke•5h ago
Or just dump pydantic and use msgspec instead: https://jcristharif.com/msgspec/
itamarst•5h ago
msgspec is much more memory efficient out of the box, yes. Also quite fast.
mbb70•5h ago
A great feature of pydantic is its validation hooks, which let you intercept serialization/deserialization of specific fields and augment behavior.

For example, if you are querying a DB that returns a column as a JSON string, it's trivial with Pydantic to JSON-parse the column as part of deserialization with an annotation.

Pydantic is definitely slower and not a 'zero cost abstraction', but you do get a lot for it.

jtmcivor•3h ago
One approach to do that in msgspec is described here https://github.com/jcrist/msgspec/issues/375#issuecomment-15...
zxilly•4h ago
Maybe using mmap would also save some memory, though I'm not sure whether this can be implemented in Python.
itamarst•4h ago
Once you switch to ijson it will not save any memory, no, because ijson essentially uses zero memory for the parsing. You're just left with the in-memory representation.
dgan•4h ago
I gave up on Python dataclasses & JSON. I use protobuf objects within the application itself. I also have a "...Mixin" class for almost every wire model, with extra methods.

Automatic, statically typed deserialization is worth the trouble in my opinion

fidotron•1h ago
Having only recently encountered this, does anyone have any insight as to why it takes 2GB to handle a 100MB file?

This looks highly reminiscent (though not exactly the same, pedants) of why people used to get excited about using SAX instead of DOM for xml parsing.

itamarst•47m ago
I talk about this more explicitly in the PyCon talk (https://pythonspeed.com/pycon2025/slides/ - video soon) though that's not specifically about Pydantic, but basically:

1. Inefficient parser implementation. It's just... very easy to allocate way too much memory if you don't think about large-scale documents, and very difficult to measure. Common problem with many (but not all) JSON parsers.

2. CPython in-memory representation is large compared to compiled languages. So e.g. 4-digit integer is 5-6 bytes in JSON, 8 in Rust if you do i64, 25ish in CPython. An empty dictionary is 64 bytes.
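The second point can be checked with a quick sketch using `sys.getsizeof`. Exact sizes depend on the CPython version and platform, so none are hard-coded here, and `sys.getsizeof` reports only the object itself, not anything it references.

```python
import json
import sys

value = 1234

# Characters the integer occupies in JSON text (digits only, no separator).
json_bytes = len(json.dumps(value))

# Size of the corresponding CPython int object, including its object header.
int_bytes = sys.getsizeof(value)

# Size of an empty dict object.
empty_dict_bytes = sys.getsizeof({})

print(f"JSON text: {json_bytes} bytes")
print(f"CPython int: {int_bytes} bytes")
print(f"empty dict: {empty_dict_bytes} bytes")
```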

deepsquirrelnet•17m ago
Alternatively, if you had to go with json, you could consider using jsonl. I think I’d start by evaluating whether this is a good application for json. I tend to only want to use it for small files. Binary formats are usually much better in this scenario.
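A minimal sketch of the JSONL idea using only the standard library. The helper name `iter_jsonl` and the sample records are assumptions for illustration. Because each line is a complete JSON document, only one record needs to be parsed and held in memory at a time.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Hypothetical sample data; a real file opened with open(path) works the same way.
sample = io.StringIO('{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n')

# Records are consumed one at a time; the full list is never materialized.
total = sum(record["id"] for record in iter_jsonl(sample))
print(total)
```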