What's inside:
GPU-accelerated rendering with WebGPU shaders, shadows, and goodies like silhouette edges and a special soft-light mode
Core operations run up to 1000x faster than the original PyMOL. Surface generation that used to send you on a coffee run now finishes the moment you hit the button
Full PyMOL selection algebra support — 95+ keywords, boolean logic, distance/expansion operators, slash-macros
Distance, angle, and dihedral measurements, atom labels — everything you need for structural analysis
Python API — from pymol_rs import cmd and you're right at home
PDB, mmCIF, BinaryCIF, SDF/MOL, MOL2, XYZ, GRO — read and write, automatic format detection, transparent gzip decompression
Scenes, movies, ray tracing — all on the GPU
Kabsch superposition, CE alignment, RMSD, DSS, symmetry across all 230 space groups
Out of PyMOL's 798 original settings, some number of them actually work. Nobody knows exactly how many, but it's definitely in the hundreds. Plus we've added new settings that the original never had — like per-chain surface generation
13 independent crates — if you're writing Rust, you can use just the selection parser, the file readers, or the full GUI. No monolith
dalke
I looked at the SDF reader, since that's what I know best. I see a few things which look like they need revisiting.
Line 75 has 'if name == "$$$$" {return self.parse_molecule();}'. This isn't correct. A "$$$$" title line means either that the record name is "$$$$" (if you are RDKit) or that the record is in the wrong format (if you are the CTFile specification, which explicitly prohibits that).
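A minimal sketch of those two readings, assuming a reader that wants to support both. `TitlePolicy` and `read_title` are hypothetical names, not the project's API. In both cases the line is consumed as the title; neither treats it as a terminator that silently moves on to the next molecule.

```rust
/// How to treat a MOLfile title (header) line that reads exactly "$$$$".
/// Hypothetical type for illustration only.
#[derive(Clone, Copy)]
enum TitlePolicy {
    /// Keep "$$$$" as the record name (the RDKit reading described above).
    Lenient,
    /// Reject it, since the CTfile specification prohibits it.
    Strict,
}

/// Hypothetical helper: interpret the first line of a MOLfile record.
fn read_title(line: &str, policy: TitlePolicy) -> Result<String, String> {
    match policy {
        TitlePolicy::Strict if line == "$$$$" => {
            Err("title line is \"$$$$\", which the CTfile spec prohibits".to_string())
        }
        // Any other title, or any title under the lenient policy, is kept verbatim.
        _ => Ok(line.to_string()),
    }
}
```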
Also, does Rust have tail recursion? If not, the recursive nature of the code makes me think parsing a file containing 1 million lines of the form "$$$$\n" would likely blow the stack.
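Rust does not guarantee tail-call elimination, so if empty records are skipped at all, a loop is the safer shape: stack use stays constant however many "$$$$" lines come in a row. A minimal sketch; `next_non_terminator_line` is a hypothetical helper, not the project's code.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: skip lines that are exactly "$$$$" and return the
/// first line that is not (without its line ending), or None at end of input.
/// Iterating instead of recursing means a million "$$$$" lines cannot
/// overflow the stack.
fn next_non_terminator_line<R: BufRead>(reader: &mut R) -> io::Result<Option<String>> {
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            return Ok(None); // end of input
        }
        // Strip "\n" or "\r\n" before comparing.
        let trimmed = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
        if trimmed != "$$$$" {
            return Ok(Some(trimmed.to_string()));
        }
    }
}
```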
In principle, the V2000/V3000 version test should look at the specific columns, not just check whether the string appears somewhere in the line. Someone like me might put a "V3000" in the obsolete fields, with a "V2000" in the correct vvvvvv field. ;)
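A minimal sketch of a column-based check, assuming the V2000 counts-line layout aaabbblllfffcccsssxxxrrrpppiiimmmvvvvvv, which puts vvvvvv in columns 34 to 39 (1-based). `CtabVersion` and `counts_line_version` are hypothetical names. Returning None for a short or unrecognized field leaves the fallback decision to the caller.

```rust
/// Connection-table version declared in a MOLfile counts line (hypothetical type).
#[derive(Debug, PartialEq)]
enum CtabVersion {
    V2000,
    V3000,
}

/// Read the version from the fixed vvvvvv field (columns 34-39, 1-based),
/// ignoring anything that happens to appear in the other fields.
fn counts_line_version(counts: &str) -> Option<CtabVersion> {
    // str::get returns None if the line is too short for the field.
    match counts.get(33..39)?.trim() {
        "V2000" => Some(CtabVersion::V2000),
        "V3000" => Some(CtabVersion::V3000),
        _ => None,
    }
}
```

With this approach, a "V3000" placed in the obsolete xxx/rrr/ppp/iii columns is simply never looked at.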
The "Skip to end of molecule" code will break on real-world datasets. One classic problem is a company which used "$", "$$", "$$$" and "$$$$" to indicate cost, stored as tag data like:
where the first "$$$$" is part of the data item, and the second "$$$$" is the end of the SD record. This ended up causing a problem when an SDF reader somewhere in their system didn't parse data items correctly. (Another common failure in data item parsing is to ignore the requirement for a newline after the data item.) I talk about "$$$$" more at http://www.dalkescientific.com/writings/diary/archive/2020/0... .
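A minimal sketch of data-item-aware parsing, assuming each data item is a header line starting with ">", then value lines, then a required blank line. A "$$$$" line ends the record only between items; inside an item it is just data. `parse_data_block` and `DataItem` are hypothetical names, and the `<cost>` tag in the usage below is made up for illustration. The sketch is deliberately strict: stray lines, or an item missing its closing blank line, are errors rather than guesses.

```rust
/// One SD data item: the header line (e.g. "> <name>") and its value lines.
type DataItem = (String, Vec<String>);

/// Hypothetical helper: parse the SD data block that follows "M  END".
/// Returns the data items and the index of the line after the "$$$$" terminator.
fn parse_data_block(lines: &[&str]) -> Result<(Vec<DataItem>, usize), String> {
    let mut items = Vec::new();
    let mut i = 0;
    while i < lines.len() {
        let line = lines[i];
        if line == "$$$$" {
            return Ok((items, i + 1)); // between items, so this ends the record
        }
        if !line.starts_with('>') {
            return Err(format!("expected a data header or \"$$$$\", got {:?}", line));
        }
        // Value lines run until the blank line that must close every data item,
        // so a "$$$$" here is part of the value.
        let mut value = Vec::new();
        i += 1;
        loop {
            match lines.get(i) {
                None => return Err(format!("data item {:?} is missing its closing blank line", line)),
                Some(l) if l.is_empty() => break,
                Some(l) => value.push(l.to_string()),
            }
            i += 1;
        }
        items.push((line.to_string(), value));
        i += 1; // step past the blank line
    }
    Err("record ended without a \"$$$$\" line".to_string())
}
```

For the lines `> <cost>`, `$$$$`, (blank), `$$$$`, this yields one item whose value is "$$$$" and then stops at the second "$$$$".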
Then there's the "S SKP" field, which you'll almost certainly never see in real life! I've only seen it used in a published example of a JICST extended MOLfile. See http://www.dalkescientific.com/writings/diary/archive/2020/0...
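For completeness, a minimal sketch of honoring the skip while scanning the property block, assuming the CTfile layout `S  SKPnnn` (two spaces after "S", with nnn a three-character count of following lines to skip). `skp_count` and `property_lines` are hypothetical helpers, not the project's code.

```rust
/// If `line` is an "S  SKPnnn" property line, return nnn (lines to skip).
fn skp_count(line: &str) -> Option<usize> {
    let rest = line.strip_prefix("S  SKP")?;
    rest.get(..3)?.trim().parse().ok()
}

/// Hypothetical helper: collect property lines up to "M  END",
/// dropping each SKP line together with the lines it says to skip.
fn property_lines<'a>(lines: &[&'a str]) -> Vec<&'a str> {
    let mut kept = Vec::new();
    let mut i = 0;
    while i < lines.len() {
        let line = lines[i];
        if line == "M  END" {
            break;
        }
        if let Some(n) = skp_count(line) {
            i += 1 + n; // the SKP line itself plus the n lines it covers
            continue;
        }
        kept.push(line);
        i += 1;
    }
    kept
}
```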
Please don't let these comments get you down! These details are hard to get right, and not obvious. It took me years to learn the rare corner cases.
I also haven't done molviz since the 1990s, or used PyMOL (I was a VMD person), so I can't say anything about the overall project. We started with GL, and had to port to OpenGL. :)