Parsing Chemistry

https://re.factorcode.org/2025/10/parsing-chemistry.html
54•kencausey•3mo ago

Comments

whitten•3mo ago
Does the SMILE (or Simplified Molecular Input Line Entry System) code have an EBNF definition? https://en.wikipedia.org/wiki/Simplified_Molecular_Input_Lin... claims there is a context-free grammar.
dalke•3mo ago
That's "SMILES".

Yes. Here is the yacc grammar for the SMILES parser in the RDKit. https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Smi...

There's also one from OpenSMILES at http://opensmiles.org/opensmiles.html#_grammar . It has a shift/reduce error (as I recall) that I was not competent enough to fix.

I prefer to parse almost completely in the lexer, with a small amount of lexer state to handle balanced parens, bracket atoms, and matching ring closures. See https://hg.sr.ht/~dalke/opensmiles-ragel and more specifically https://hg.sr.ht/~dalke/opensmiles-ragel/browse/opensmiles.r... .
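
The lexer-centric approach described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in the linked repositories: the token pattern covers a simplified subset of SMILES, and the names TOKEN_RE, ALLOWED_PREV and check_smiles are hypothetical. A regex does the lexing, and a small amount of state (previous token kind, branch depth, open ring labels) does the rest.

```python
import re

# Simplified token set (assumption): a subset of SMILES, not the full OpenSMILES grammar.
TOKEN_RE = re.compile(
    r"(?P<atom>\[[^\[\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops])"
    r"|(?P<bond>[-=#$:/\\])"
    r"|(?P<ring>%\d\d|\d)"
    r"|(?P<open>\()"
    r"|(?P<close>\))"
    r"|(?P<dot>\.)"
)

# Lexer state transitions: which token kinds may precede each token kind.
ALLOWED_PREV = {
    "atom": {"start", "atom", "bond", "ring", "open", "close", "dot"},
    "bond": {"atom", "ring", "open", "close"},
    "ring": {"atom", "bond", "ring"},
    "open": {"atom", "ring", "close"},
    "close": {"atom", "ring", "close"},
    "dot": {"atom", "ring", "close"},
}


def check_smiles(smiles):
    """Return None if the string passes these simplified checks, else an error message."""
    depth = 0           # current branch nesting level
    open_rings = set()  # ring-closure labels still waiting for their partner
    prev = "start"      # kind of the previous token
    pos = 0
    while pos < len(smiles):
        m = TOKEN_RE.match(smiles, pos)
        if m is None:
            return f"unexpected character {smiles[pos]!r} at position {pos}"
        kind = m.lastgroup
        if prev not in ALLOWED_PREV[kind]:
            return f"{kind} {m.group()!r} cannot follow {prev} at position {pos}"
        if kind == "open":
            depth += 1
        elif kind == "close":
            if depth == 0:
                return f"unbalanced ')' at position {pos}"
            depth -= 1
        elif kind == "ring":
            # First sighting opens the ring, second sighting closes it.
            open_rings.symmetric_difference_update({m.group()})
        prev = kind
        pos = m.end()
    if prev not in ("atom", "ring", "close"):
        return f"string ends with a dangling {prev}"
    if depth:
        return "unbalanced '('"
    if open_rings:
        return f"unclosed ring labels: {sorted(open_rings)}"
    return None
```

With these rules, check_smiles("C1=CC=CC=C1") and check_smiles("[Na+].[Cl-]") return None, while "C=.N", "C1CC" and "C(C" each return an error message. The sketch checks syntax only; it builds no molecular graph and ignores valence and aromaticity.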

dalke•3mo ago
Oh, I should have pointed out my Python lexer-driven parser at https://hg.sr.ht/~dalke/smiview/browse/smiview.py

The lexer: https://hg.sr.ht/~dalke/smiview/browse/smiview.py?rev=tip#L3...

The lexer state transitions: https://hg.sr.ht/~dalke/smiview/browse/smiview.py?rev=tip#L3...

dekhn•3mo ago
I wrote a very simple SMILES parser using pyparsing (https://github.com/dakoner/smilesparser/tree/master). I wouldn't say it's intended for production work, but it has been useful in situations where I didn't want to pull in rdkit.
dalke•3mo ago
I see you include the dot disconnect "." as part of the Bond definition.

You also define Chain as:

  Chain <<= pp.Group(pp.Optional(Bond) + pp.Or([Atom, RingClosure]))

I believe this means your grammar allows the invalid SMILES C=.N
the__alchemist•3mo ago
Note: There are two standardized formats for this called SMILES and SELFIES. SMILES is much better supported, but SELFIES is more robust. I'm integrating them into some bio and chem software I'm working on.

You can, for example, use PubChem's API to look up molecules similar to a given SMILES string.

I believe most molecule editors can load and save SMILES.

dachrillz•3mo ago
What about InChI? Isn’t that a common way of describing molecules as well?
the__alchemist•3mo ago
Good point!
fred_tandemai•3mo ago
InChI isn't really meant to be used as a format for storing 2D molecules (say, for rendering); rather, it serves as a unique descriptive chemical identifier. InChI has many flavors, but the Standard InChI yields one unique identifier for multiple forms (tautomers) of the same molecule.
jugoetz•3mo ago
SMILES and SELFIES are molecular graph representations and aren't meant to solve the "parse this sum formula" problem.

SELFIES are for genAI. If you ask a VAE to generate SMILES, it will spit out some strings that are invalid. That can't happen with SELFIES; this is the one application where they are robust.

dekhn•3mo ago
It's still being argued whether you really need SELFIES, or if SMILES autoencoders can be trained to only generate valid molecules, or if generating invalid molecules is useful (I'm in camp SELFIES, but I also want better ways to represent and learn on graphical chemical structures, rather than serialized strings).
chermi•3mo ago
can you guys explain what makes SELFIES robust? I'd only heard of SMILES until this thread, but I have been out of this space for 10 years.
dekhn•3mo ago
Let me start with an example: some time ago I worked on a VAE that encoded and decoded SMILES strings. The idea is that you should be able to encode a SMILES into an embedding space, do all the normal things you would do in that space, and then convert the resulting embedding vector back to a valid molecule.

The VAE is trained with a very large number of valid SMILES strings, typically tokenized at the character level (so "C" is a token, and "Br" is "B" then "r"). I and others have observed that VAEs trained like this produce a large number of embedding vectors that do not decode to valid SMILES strings: they have syntax errors, or perform chemical alchemy (personally, I saw the training set had Br (bromine) and Ca (calcium), and the output molecules sometimes were Ba (barium) even though that's not in the original dataset at all).

There are other reasons why the tokenizer produces bad results- only about 1-10% of vectors decode to valid molecules. Invalid SMILES are mostly useless- they don't correspond to actual structures.

To respond to this, the SELFIES format makes a few changes so that it is effectively impossible to produce invalid SELFIES strings when decoding a VAE. Among other things, tokenization matches the actual elements and so the model will only ever output valid elements.

I believe this is the SMILES paper that my own experiments were based on: https://arxiv.org/pdf/1610.02415 (see https://github.com/maxhodak/keras-molecules for an open source attempt at implementation)

And this is the paper introducing SELFIES: https://arxiv.org/abs/1905.13741 (open source packages for working with SELFIES, and some example training scripts: https://github.com/aspuru-guzik-group/selfies ; see "Validity of Latent Space in VAE SMILES vs. SELFIES" for more detail on the robustness).

BTW, as a side note: even though we put a bunch of effort into duplicating the original SMILES VAE, it was extremely slow to train and not very useful. Now you can just ask Gemini to write a full SELFIES VAE and train it in less than a day on a conventional GPU (thanks pytorch transformers!) to get a decent basic set of embeddings useful for exploring chemical space.

chermi•3mo ago
Thanks, that's very interesting! Naive question, but why couldn't you force a specific tokenization scheme on SMILES? Specifically, just one token per element? I understand SELFIES does more, but your example of Ba/Br made me wonder.
dekhn•3mo ago
I asked the authors of the original SMILES paper and they didn't have a good answer. I wrote a parser for SMILES so I could tokenize that way but never followed up, and eventually SELFIES was announced.
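
A minimal sketch of the one-token-per-element idea raised in this exchange, contrasted with character-level tokenization. The pattern below is an assumption made for illustration (bracket atoms kept whole, Br and Cl kept as single tokens, two-digit ring labels kept whole); it is not the tokenizer from either paper, and SMILES_TOKEN, char_tokens and element_tokens are hypothetical names.

```python
import re

# Hypothetical, simplified token pattern: bracket atoms, two-letter halogens,
# two-digit ring-closure labels, then any other single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d\d|.")


def char_tokens(smiles):
    """Character-level tokenization: 'Br' becomes 'B' followed by 'r'."""
    return list(smiles)


def element_tokens(smiles):
    """Element-aware tokenization: 'Br' and bracket atoms stay whole."""
    return SMILES_TOKEN.findall(smiles)


print(char_tokens("CC(Br)C"))     # ['C', 'C', '(', 'B', 'r', ')', 'C']
print(element_tokens("CC(Br)C"))  # ['C', 'C', '(', 'Br', ')', 'C']
```

With element-aware tokens, a decoder choosing from this vocabulary cannot recombine the "B" of Br with the "a" of Ca into Ba, because those letters never appear as separate tokens. It can still emit unbalanced parentheses or unmatched ring closures, which is the part SELFIES addresses by design.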
chermi•3mo ago
Thanks!
logifail•3mo ago
Does this do structural formulae too?

Was thinking of InChI[0] but on Googling SMILES and SELFIES I found this[1] talk, this[2] paper and my goodness I've been down a few rabbit holes since...

[0] https://en.wikipedia.org/wiki/International_Chemical_Identif...
[1] https://www.inchi-trust.org/wp/wp-content/uploads/2019/12/18...
[2] https://pubs.rsc.org/en/content/articlehtml/2022/dd/d1dd0001...

jugoetz•3mo ago
No, in Python you can use rdkit (https://github.com/rdkit/rdkit) for that
toast_x•3mo ago
this is insanely cool
Jaxan•3mo ago
… It is just a parser? Sure the parser is written very succinctly and that’s neat. But parser generators for other languages can do it similarly.
brilee•3mo ago
Does this handle, e.g., water of hydration (CaSO4 . 2H2O)? States of matter, as in H2O(g)? Does it preserve subunit information, as in (C6H5)CH2COOH? Writing a parser for basic formulae is such a tiny, tiny part of the actual problem. Deciding the scope of what you want to handle, and how, is the real problem.
mwt•3mo ago
This code is gibberish to me, but it appears the target is just parsing how many atoms are in a molecule string of some representation. That's cool, but to do just about anything useful in chemistry we need the bond graph (and often more: bond orders, stereochemistry, plus much more for biopolymers).
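
For readers who, like the commenter above, do not read the article's language, here is a rough Python sketch of the atom-counting task being discussed. It is not a port of the article's code: count_atoms and count_hydrate are hypothetical names, and the scope (element symbols, counts, nested parentheses, plus a "." hydrate separator with a leading coefficient) is an assumption chosen to cover two of the examples raised in this thread.

```python
import re
from collections import Counter

# One token is an element symbol, "(", ")", or a count.
TOKEN = re.compile(r"([A-Z][a-z]?)|(\()|(\))|(\d+)")


def count_atoms(formula):
    """Count atoms in a formula such as '(C6H5)CH2COOH'."""
    stack = [Counter()]  # one Counter per open parenthesis level
    pending = Counter()  # last element or group, awaiting an optional count
    pos = 0
    while pos < len(formula):
        m = TOKEN.match(formula, pos)
        if m is None:
            raise ValueError(f"unexpected character {formula[pos]!r} at position {pos}")
        element, open_paren, close_paren, number = m.groups()
        if number:
            if not pending:
                raise ValueError("count without a preceding element or group")
            for symbol in pending:
                pending[symbol] *= int(number)
            stack[-1].update(pending)
            pending = Counter()
        else:
            stack[-1].update(pending)  # previous item had an implicit count of 1
            if element:
                pending = Counter({element: 1})
            elif open_paren:
                pending = Counter()
                stack.append(Counter())
            else:  # closing parenthesis: the finished group becomes pending
                if len(stack) == 1:
                    raise ValueError("unbalanced ')'")
                pending = stack.pop()
        pos = m.end()
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    stack[-1].update(pending)
    return dict(stack[0])


def count_hydrate(formula):
    """Handle 'CaSO4 . 2H2O' style input: '.'-separated parts with a leading coefficient."""
    total = Counter()
    for part in formula.replace(" ", "").split("."):
        m = re.match(r"(\d*)(.*)", part)
        multiplier = int(m.group(1) or 1)
        for symbol, n in count_atoms(m.group(2)).items():
            total[symbol] += multiplier * n
    return dict(total)
```

Here count_atoms("(C6H5)CH2COOH") gives 8 C, 8 H and 2 O, and count_hydrate("CaSO4 . 2H2O") gives 1 Ca, 1 S, 6 O and 4 H. A state suffix such as H2O(g) is rejected with a ValueError, and subunit grouping is flattened away, so the sketch returns only totals and says nothing about bonds.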
the__alchemist•3mo ago
That was my initial reaction too, but I suspect this has utility in applications other than what you and I are looking for. From context, I gather this may be for thermodynamic arithmetic, or reaction product arithmetic.
mwt•3mo ago
I'd be really interested to know of anybody who is making money with those topics (and who doesn't already have their own domain-specific practice for the problem).
fred_tandemai•3mo ago
Cheminformatics is such an example. Heavily used in computational drug discovery.
chermi•3mo ago
Computational biology/cheminformatics has probably been one of the most frustrating investments pharma companies have made in the past 20 years. There have been waves of optimism with many hires, then a slump after reality doesn't match optimistic expectations, and so on. This time it may actually be different, and I myself am in that camp. I'm particularly excited by the discoveries in sampling methods that aren't just molecular dynamics. And the cellular foundation models for pre-screening drug interactions: they aren't quite there yet, but give it time.
mwt•2mo ago
The cheminformatics I do (mostly drug discovery/biophysics) definitely requires bonds!