The Impossible Optimization, and the Metaprogramming to Achieve It

https://verdagon.dev/blog/impossible-optimization

73•melodyogonna•3mo ago

Comments

Xcelerate•3mo ago

> Eliminate redundant matrix operations (like two transposes next to each other)

In 2016, I was trying to construct orthogonal irreducible matrix representations of various groups (“irreps”). The problem was that most of the papers describing how to construct these matrices used a recursive approach that depended on having already constructed the matrix elements of a lower dimensional irrep. Thus the irrep dimension n became quite an annoying parameter, and function calls were very slow because you had to construct the irrep for each new group element from the ground up on every single call.

I ended up using Julia’s @generated functions to dynamically create new versions of the matrix construction code for each distinct value of n for each type of group. So essentially it would generate “unrolled” code on the fly and then use LLVM to compile that a single time, after which all successive calls for a specific group and irrep dimension were extremely fast. Was really quite cool. The only downside was that you couldn’t generate very high dimensional irreps because LLVM would begin to struggle with the sheer volume of code it needed to compile, but for my project at the time that wasn’t much of a concern.

fragmede•3mo ago

Depending on the userbase of the site, simply checking for @gmail.com at the end, I'd bet, would result in a quick win, as well as restricting the username's alphabet to allowed Gmail characters.

The other optimization I'd guess at would be to async/thread/process the checking before and after the @ symbol, so they can run in parallel (ish). Extra cpu time, but speed > CPU cycle count for this benchmark.

embedding-shape•3mo ago

Tell us you used to work at Google, without telling us.

"simply do X" is such a programmer fallacy at this point I'm surprised we don't have a catchy name for it yet, together with a XKCD for making the point extra clear.

fragmede•3mo ago

Tell us you don't actually work with any Google engineers... blah blah blah

The trope is "At Google we..." and then casually mention "violating" the CAP theorum with Spanner or something.

It is simple, and I really do hope any first year CS student could extract a substring from a string. Have LLMs so atrophied our programming ability that extraction of a substring is considered evidence of a superior programmer?

gus_massa•3mo ago

[Rehashing an old comment]

In the math department, we had a Moodle the students in the first year of my university in Argentina.

When we started like 15 years ago, the emails of the students and TA were evenly split in 30% Gmail, 30% Yahoo!, 30% Hotmail and 10% others (very aproxímate numbers).

Now the students have like 80% Gmail, 10% Live/Outlook/Hotmail and 10% others/Yahoo. Some of the TA are much older, so perhaps "only" 50% use Gmail.

The difference is huge. I blame the mandatory gmail account for the cell phone.

So, checking only @gmail.com is too strict, but a first fast check for @gmail.com and later the complete regex may improve the speed a lot in the real word.

bee_rider•3mo ago

Maybe I am old, but I like to keep as much communication as possible going through the university email. It just feels more official somehow.

omnicognate•3mo ago

The language here is Mojo, which the article seems to assume you know and doesn't say enough for you to deduce until half way through and after multiple code examples. I don't know how you're supposed to know this as even the blog it's on is mostly about Vale. From the intro I was expecting it to be about C++.

totalperspectiv•3mo ago

The author works for Modular. He shared the write up on the Mojo Discord. I think Mojo users were the intended audience.

SuperV1234•3mo ago

https://github.com/hanickadot/compile-time-regular-expressio...

canucker2016•3mo ago

FYI: This is a C++ template version of compile time regex class.

A 54:47 presentation at CppCon 2018 is worth more than a thousand words...

see https://www.youtube.com/watch?v=QM3W36COnE4

followup CppCon 2019 video at https://www.youtube.com/watch?v=8dKWdJzPwHw

As the above github repo mentions, more info at https://www.compile-time.re/

taeric•3mo ago

Gave me a smile to see the shout out to LISP in there.

Reading this take on it, it feels like a JIT compiler could also accomplish a fair bit of this? I'm also reminded of the way a lot of older programs would generate tables during build time. I'm assuming that is still fairly common?

pfdietz•3mo ago

Yes, this is all straightforward with Lisp macros. Beyond that, you can call the compile function in Common Lisp and do all this at run time too.

BoingBoomTschak•3mo ago

In fact, there's https://github.com/telekons/one-more-re-nightmare for CL.

Archit3ch•3mo ago

> Mojo, D, Nim, and Zig can do it, and C++ as of C++20. There are likely some other languages that can do it, but these are the only ones that can truly run normal run-time code at compile time

Pretty sure Julia can do it.

jburgy•3mo ago

https://bur.gy/2022/05/27/what-makes-julia-delightful.html confirms your hunch

spectraldrift•3mo ago

Having never heard of mojo before, I found this article fascinating. It provides a great example of how a toy regex parser works and an excellent explanation of why vanilla regex tends to be slow. It also presents a novel solution: compiling the regex into regular code, which can then be optimized by the compiler.

convolvatron•3mo ago

this is literally how 'lex' works. the one written in 1987 by Vern Paxson.

jlokier•3mo ago

The original is 'lex', written in 1975 by Mike Lesk and Eric Schmidt.

Yes, that Eric Schmidt, CEO of Google.

1987 was the clone, 'flex' :-)

It did "compiling the regex into regular code, which can then be optimized by the compiler" before the C programming language as we know it was created. I think 'lex' was compiling regex to C before the C language even had 'struct' types, 'printf' or 'malloc'.

spectraldrift•3mo ago

So I'm only 40 years behind! It's amazing how early innovations like this seamlessly fade into the background and can be taken for granted by folks like myself.

abeppu•3mo ago

To tie this specific example to a larger framework: In scala land, Tiark Rompf's Lightweight Modular Staging system handled this class of metaprogramming elegantly, and the 'modular' part included support of multiple compilation targets. The idea was that one could incrementally define/extend DSLs that produce an IR, optimizations in that IR, and code generation for chunks of DSLs. Distinctions about stage are straight-forward type-signature changes. The worked example in this post is very similar to one of the tutorials for that system: https://scala-lms.github.io/tutorials/regex.html

Unfortunately, so far as I can tell:

- LMS has not been updated for years and never moved to scala 3. https://github.com/TiarkRompf/virtualization-lms-core

- LMS was written to also use "scala-virtualized" which is in a similar situation

There's a small project to attempt to support it with virtualization implemented in scala 3 macros, but it's missing some components: https://github.com/metareflection/scala3-lms?tab=readme-ov-f...

I'd love to see this fully working again.

aappleby•3mo ago

I did something similar to this with C++ templates - it's Parsing Expression Grammar based, so not full regex, but enough for a lot of tasks:

  using sign      = Atoms<'+', '-'>;
  using digit     = Range<'0', '9'>;
  using onenine   = Range<'1', '9'>;
  using digits    = Some<digit>;
  using integer   = Seq<Opt<Atom<'-'>>, Oneof<Seq<onenine, digits>, digit>>;
  using fraction  = Seq<Atom<'.'>, digits>;
  using exponent  = Seq<Atoms<'e', 'E'>, Opt<sign>, digits>;
  using number    = Seq<integer, Opt<fraction>, Opt<exponent>>;

and I've confirmed that it does all get inlined and optimized on -O3.

JSON parser example here - https://github.com/aappleby/matcheroni/blob/main/examples/js...

lisper•3mo ago

> can compilers really execute general code at compile-time?

Cue the smug Lisp weenies laughing quietly in the background.

cadamsdotcom•3mo ago

Seems like an optimization that could be applied quite generally - as the author mentions at the end there’s lots of places this could be used.

The problem with applying this technique generally is the amount of code generated. But what if you can optimize that too.. perhaps share the common parts of the AST between the copies of the code that are generated, and overlay the changes with some datastructure.

Panzerschrek•3mo ago

For several years I have written a dedicated compiler for regular expressions. You basically pass a regular expression and get an object file containing optimized matching code. It uses LLVM library internally to perform optimizations and machine code generation. It should be generally faster to compile compared to solutions involving constexpr-based metaprogramming.

I am surprised, that there is no programming language doing similar stuff - having regular expressions which are compiled as native code instead of just using a runtime library like PCRE2. Implementing this in C++ or Rust should be relatively easy.

lostmsu•3mo ago

.NET had this since 2.0 if not 1.0

Tiny C Compiler

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

SectorC: A C Compiler in 512 bytes

Speed up responses with fast mode

Software factories and the agentic moment

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

Brookhaven Lab's RHIC concludes 25-year run with final collisions

Stories from 25 Years of Software Development

Hoot: Scheme on WebAssembly

Show HN: Craftplan – Elixir-based micro-ERP for small-scale manufacturers

FDA intends to take action against non-FDA-approved GLP-1 drugs

First Proof

Vocal Guide – belt sing without killing yourself

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

Al Lowe on model trains, funny deaths and working with Disney

The F Word

Show HN: A luma dependent chroma compression algorithm (image compression)

Start all of your commands with a comma (2009)

IBM Beam Spring: The Ultimate Retro Keyboard

Eigen: Building a Workspace

Microsoft account bugs locked me out of Notepad – Are thin clients ruining PCs?

Selection rather than prediction

The AI boom is causing shortages everywhere else

I write games in C (yes, C) (2016)

Reinforcement Learning from Human Feedback

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Learning from context is harder than we thought

Where did all the starships go?

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Hackers (1995) Animated Experience