frontpage.

Show HN: One-click AI employee with its own cloud desktop

https://cloudbot-ai.com
1•fainir•47s ago•0 comments

Show HN: Poddley – Search podcasts by who's speaking

https://poddley.com
1•onesandofgrain•1m ago•0 comments

Same Surface, Different Weight

https://www.robpanico.com/articles/display/?entry_short=same-surface-different-weight
1•retrocog•3m ago•0 comments

The Rise of Spec Driven Development

https://www.dbreunig.com/2026/02/06/the-rise-of-spec-driven-development.html
2•Brajeshwar•8m ago•0 comments

The first good Raspberry Pi Laptop

https://www.jeffgeerling.com/blog/2026/the-first-good-raspberry-pi-laptop/
2•Brajeshwar•8m ago•0 comments

Seas to Rise Around the World – But Not in Greenland

https://e360.yale.edu/digest/greenland-sea-levels-fall
1•Brajeshwar•8m ago•0 comments

Will Future Generations Think We're Gross?

https://chillphysicsenjoyer.substack.com/p/will-future-generations-think-were
1•crescit_eundo•11m ago•0 comments

State Department will delete Xitter posts from before Trump returned to office

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
2•righthand•14m ago•0 comments

Show HN: Verifiable server roundtrip demo for a decision interruption system

https://github.com/veeduzyl-hue/decision-assistant-roundtrip-demo
1•veeduzyl•15m ago•0 comments

Impl Rust – Avro IDL Tool in Rust via Antlr

https://www.youtube.com/watch?v=vmKvw73V394
1•todsacerdoti•15m ago•0 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
2•vinhnx•16m ago•0 comments

minikeyvalue

https://github.com/commaai/minikeyvalue/tree/prod
3•tosh•21m ago•0 comments

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

https://github.com/eval-exec/neomacs
1•evalexec•26m ago•0 comments

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

https://moli-green.is/
2•ShinyaKoyano•30m ago•1 comments

How I grow my X presence?

https://www.reddit.com/r/GrowthHacking/s/UEc8pAl61b
2•m00dy•31m ago•0 comments

What's the cost of the most expensive Super Bowl ad slot?

https://ballparkguess.com/?id=5b98b1d3-5887-47b9-8a92-43be2ced674b
1•bkls•32m ago•0 comments

What if you just did a startup instead?

https://alexaraki.substack.com/p/what-if-you-just-did-a-startup
5•okaywriting•39m ago•0 comments

Hacking up your own shell completion (2020)

https://www.feltrac.co/environment/2020/01/18/build-your-own-shell-completion.html
2•todsacerdoti•41m ago•0 comments

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

https://github.com/gorse-io/gorse
1•zhenghaoz•42m ago•0 comments

GLM-OCR: Accurate × Fast × Comprehensive

https://github.com/zai-org/GLM-OCR
1•ms7892•43m ago•0 comments

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

https://github.com/MikeVeerman/tool-calling-benchmark
1•MikeVeerman•44m ago•0 comments

Show HN: AboutMyProject – A public log for developer proof-of-work

https://aboutmyproject.com/
1•Raiplus•44m ago•0 comments

Expertise, AI and Work of Future [video]

https://www.youtube.com/watch?v=wsxWl9iT1XU
1•indiantinker•45m ago•0 comments

So Long to Cheap Books You Could Fit in Your Pocket

https://www.nytimes.com/2026/02/06/books/mass-market-paperback-books.html
3•pseudolus•45m ago•1 comments

PID Controller

https://en.wikipedia.org/wiki/Proportional%E2%80%93integral%E2%80%93derivative_controller
1•tosh•49m ago•0 comments

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

https://twitter.com/AlecStapp/status/2019932764515234159
2•bkls•49m ago•0 comments

Kubernetes MCP Server

https://github.com/yindia/rootcause
1•yindia•50m ago•0 comments

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

https://rokn.io/posts/building-movie-recommendation-agent
4•roknovosel•50m ago•0 comments

What were the first animals? The fierce sponge–jelly battle that just won't end

https://www.nature.com/articles/d41586-026-00238-z
2•beardyw•59m ago•0 comments

Sidestepping Evaluation Awareness and Anticipating Misalignment

https://alignment.openai.com/prod-evals/
1•taubek•59m ago•0 comments

Strategies for Fast Lexers

https://xnacly.me/posts/2025/fast-lexer-strategies/
180•xnacly•6mo ago

Comments

skeptrune•6mo ago
I really appreciate the commitment to benchmarking in this one. The memoization speedup for number processing was particularly surprising.
felineflock•6mo ago
Are you referring to the part where he said "crazy 15ms/64% faster"?
duped•6mo ago
Do you have benchmarks that show the hand rolled jump table has a significant impact?

The only reason this raises an eyebrow is that I've seen conflicting anec/data on this, depending pretty hard on target microarchitecture and the program itself.

xnacly•6mo ago
Sadly that was one of the things I did benchmark but didn't write down the results for. I read a lot that a naive switch is faster because the compiler knows how to optimise it better, but for my architecture and benchmarks the computed gotos were faster.
cratermoon•6mo ago
... written in C.

Not sure how many of these translate to other languages.

xnacly•6mo ago
Most of them do: jump tables work in Rust, mmapping too. Deferred numeric parsing, keeping allocations to a minimum, string slices, interning, and inline hashing all work in Rust, Go, C, C++; you name it.
thechao•6mo ago
I like to have my lexers operate on `FILE*`, rather than string-views. This has some real-world performance implications (not good ones), but it does mean I can operate on streams. If the user has a c-string, the string can be easily wrapped by `funopen()` or `fopencookie()` to provide a `FILE*` adapter layer. (Most of my lexers include one of these, out-of-the-box.)
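
For reference, a minimal read-only adapter of that kind could look like this with glibc's `fopencookie()` (on a BSD/macOS you'd register similar callbacks with `funopen()`); the struct and function names are made up for this sketch, not taken from the parent's lexers:

    #define _GNU_SOURCE           /* for fopencookie() on glibc */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>

    struct strsrc { const char *s; size_t pos, len; };

    static ssize_t strsrc_read(void *cookie, char *buf, size_t size) {
        struct strsrc *src = cookie;
        size_t left = src->len - src->pos;
        size_t n = size < left ? size : left;
        memcpy(buf, src->s + src->pos, n);
        src->pos += n;
        return (ssize_t)n;
    }

    static int strsrc_close(void *cookie) { free(cookie); return 0; }

    /* wrap a C string in a read-only FILE* so the lexer can stay stream-based */
    FILE *fopen_cstring(const char *s) {
        struct strsrc *src = malloc(sizeof *src);
        src->s = s;
        src->pos = 0;
        src->len = strlen(s);
        cookie_io_functions_t io = { .read = strsrc_read, .close = strsrc_close };
        return fopencookie(src, "r", io);
    }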

Everything else, I stole from Bob Nystrom: I keep a local copy of the token's string in the token, aka, `char word[64]`. I try to minimize "decision making" during lexing. Really, at the consumption point we're only interested in an extremely small number of things: (1) does the lexeme start with a letter or a number?; (2) is it whitespace, and is that whitespace a new line?; or, (3) does it look like an operator?

The only place where I've ever considered goto-threading was in keyword identification. However, if your language keeps keywords to ≤ 8 bytes, you can just bake the keywords into `uint64_t`'s and compare against those values. You can do a crapload of 64b compares/ns.
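
A rough sketch of that keyword trick, assuming keywords of at most 8 bytes (the keyword set below is made up for the example, not taken from the parent's lexer):

    #include <stdint.h>
    #include <string.h>

    /* pack an identifier of up to 8 bytes into a zero-padded uint64_t */
    static uint64_t pack8(const char *s, size_t len) {
        uint64_t v = 0;
        memcpy(&v, s, len <= 8 ? len : 8);
        return v;
    }

    /* compare against pre-packed keywords; just a handful of 64-bit compares */
    int is_keyword(const char *s, size_t len) {
        if (len > 8) return 0;      /* longer identifiers can't be keywords here */
        uint64_t v = pack8(s, len);
        return v == pack8("if", 2)
            || v == pack8("else", 4)
            || v == pack8("while", 5)
            || v == pack8("return", 6);
    }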

The next level up (parsing) is slow enough to eat & memoize the decision making of the lexer; and, materially, it doesn't complicate the parser. (In fact: there's a lot of decision making that happens in the parser that'd have to be replicated in the lexer, otherwise.)

The result, overall, is you can have a pretty general-purpose lexer that you can reuse for any old C-ish language, and tune to your heart's content, without needing a custom rewrite each time.

tempodox•6mo ago
The tragic thing is that you can't do `fgetwc()` on a `FILE *` produced by `fopencookie()` on Linux. glibc will crash your program deliberately as soon as there is a non-ASCII char in that stream (because, reasons?). But it does work with `funopen()` on a BSD, like macOS. I'm using that to read wide characters from UTF-8 streams.
o11c•6mo ago
Wide characters are best avoided even on platforms where they don't mean UTF-16. It's better to stay in UTF-8 mode and only verify that it's well-formed.
tempodox•6mo ago
But at some point you'll want to know whether that code point you read is `iswalpha()` or whatever, so you'll have to decode UTF-8 anyway.
thechao•6mo ago
At the parser-level, though; not down in the lexer. I intern unique user-defined strings (just with a hashcons, or whatever the cool kids call it these days). That defers the determination of correctness of UTF-kness to "someone else".
tempodox•6mo ago
Figuring out whether a character should become part of a number or a name, for instance, is typical lexer stuff though. For that you have to classify it.
o11c•6mo ago
Have you considered making your lexer operate in push mode instead?

This does mean you have to worry about partial tokens ... but if you limit yourself to feeding full lines that mostly goes away.

Besides, for reasonable-size workloads, "read the whole file ahead of time" is usually a win. The only time it's tempting not to do so is for REPLs.

thechao•6mo ago
I agree. But, I also like the discipline of lexing from `FILE*`. I've ended up with cleaner separation of concerns throughout the front-end stack, because I can't dip back into the well, unless I'm thinking very clearly about that operation. For instance, I keep around coordinates of things, rather than pointers, etc.
codr7•6mo ago
I'd do this in almost any other language than C :)

In C, I like just passing a const char * around as input; this also gives me the ability to return progress and unget chars as an added bonus.

https://github.com/codr7/shi-c/blob/b1d5cb718b7eb166a0a93c77...

teo_zero•6mo ago
> I like to have my lexers operate on `FILE*`, rather than string-views. [...] it does mean I can operate on streams.

While I understand the desire to support one input interface for composability, reuse, etc., I can't help wondering why 'FILE*'. Isn't reading from a string more "universal"?

> If the user has a c-string, the string can be easily wrapped by `funopen()` or `fopencookie()` to provide a `FILE*` adapter layer.

And if the user has a file, it's easy to read it into memory in advance.

What's the benefit of FILE* over a string?

trealira•6mo ago
Perhaps it's that you never have to read the whole file into memory at once if it's with a `FILE *` rather than a string. I'm not that person, this is just my assumption.
tlb•6mo ago
There was a time when a file of source code might not fit in memory, or would take up a significant fraction of it. But it hasn't been the case on any developer machine in 20+ years. And the overhead of FILE * accessors like fgetc is substantial. Strings in memory are always going to be faster.
viega•6mo ago
Well, the overhead of the stream API is in the noise. If the lexer / parser do not support incremental parsing, it doesn't really matter. But incremental parsing can be important in some situations. For instance, if you're parsing a 1GB json blob, keeping the whole thing in memory at once can easily be an issue. Plus, if you stall waiting for the entire input string, you end up adding to latency, if that matters.
cyber_kinetist•6mo ago
You can just use virtual memory (mmap / VirtualAlloc) to map an address region with a file and get the same effect while just using char* pointers.
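
A minimal sketch of that approach on POSIX systems (error handling elided; the function name is made up for the example):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* map a file read-only and hand the lexer a plain char* view of it */
    const char *map_source(const char *path, size_t *len_out) {
        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);                     /* the mapping stays valid after close */
        *len_out = (size_t)st.st_size;
        return (const char *)p;
    }
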
ummonk•6mo ago
Wait do modern compilers not use jump tables for large switch statements?
packetlost•6mo ago
Some definitely do.
skybrian•6mo ago
This is fun and all, but I wonder what’s the largest program that’s ever been written in this new language (purple garden)? Seems like it will be a while before the optimizations pay off.
xnacly•6mo ago
I haven't written a lot, but it needs to be fast so I can be motivated to program more by the fast iteration.
sparkie•6mo ago
As an alternative to the computed gotos, you can use regular functions with the `[[musttail]]` attribute in Clang or GCC to achieve basically the same thing - the call in the tail position is replaced with a `jmp` instruction to the next function rather than to the label, and stack usage remains constant because the current frame is reutilized for the called function. `musttail` requires that the calling function and callee have the same signature, and a prototype.

You'd replace the JUMP_TARGET macro:

    #define JUMP_TARGET goto *jump_table[(int32_t)l->input.p[l->pos]]
With:

    #ifdef __clang__
    #define musttail [[clang::musttail]]
    #elif __GNUC__
    #define musttail [[gnu::musttail]]
    #else
    #define musttail
    #endif
    #define JUMP_TARGET musttail return jump_table[(int32_t)l->input.p[l->pos]](l, a, out)
Then move the jump table out to the top level and replace each `&&` with `&`.

See diff (untested): https://www.diffchecker.com/V4yH3EyF/

This approach has the advantage that it will work everywhere and not only on compilers that support the computed gotos - it just won't optimize it on compilers that don't support `musttail`. (Though it has been proposed to standardize it in a future version of C).

It might also work better with code navigation tools that show functions, but not labels, and enables modularity as we can split rules over multiple translation units.

Performance-wise it should basically be the same - though it's been argued that it may do better in some cases because the compiler's register allocator doesn't do a great job in large functions with computed gotos - whereas in the musttail approach each function is a smaller unit and optimized separately.

bestouff•6mo ago
Can't wait for mandatory TCO coming to Rust. But it's not there yet. https://github.com/phi-go/rfcs/blob/guaranteed-tco/text/0000...
sparkie•6mo ago
Not sure I like the `become` keyword. Seems bizarre - someone encountering this word in code for the first time would have no idea what it's doing.

Why don't they just use `tailcall`? That would make it obvious what it's doing, because we've been using the term for nearly half a century, and the entire literature on the subject uses the term "tail call".

Even better would be to just automatically insert a tail call - like every other language that has supported tail calls for decades - provided the callee has the same signature as the caller. If it's undesirable because we want a stack trace, then instead have some keyword or attribute to suppress the tail call - such as `no_tail`, `nontail` or `donttail`.

Requiring tail calls to be marked will basically mean the optimization will be underutilized. Other than having a stack trace for debugging, there's basically no reason not to have the optimization on by default.

kibwen•6mo ago
Rust does allow tail call optimization, but that's LLVM's decision to optimize tail calls on a case-by-case basis. An explicit syntax to denote tail calls would be the difference between tail call optimization and guaranteed tail call elimination, which is important because if you're writing a tail-recursive function then it's pretty trivial to blow the stack at any moderate recursion depth unless you can guarantee the elimination.

As for why it's not trivial for Rust to do this by default, consider the question of what should happen in the case of local destructors, which in an ordinary function would be called after `return myfunc()` returns, but in a tail-recursive function would need to be called beforehand. The proposals for `become` tend to handle this by making it a compiler error to have any locals with destructors in scope at the point of the tail-call, further motivating the explicit syntax.

burnt-resistor•6mo ago
I looked into it. There's a crate for it: https://docs.rs/tailcall
celeritascelery•6mo ago
I am surprised this works reliably. I am interested to see what they are doing under the hood to guarantee tail call elimination.
nixpulvis•6mo ago
I'm generally pretty conservative about keywords. But it changes the semantics of the return, so it makes sense to change the word used in that position.
andrewflnr•6mo ago
As far as the name of the keyword: anyone who knows what "tail call" means will figure out "become" pretty quickly. If they don't get it from context clues, someone will just have to tell them, "oh, it's a tail call", and the confusion will dissolve, because "become" is really not a bad word for what happens in a tail call. (This is obviously less important than the implementation issues kibwen handled.)
lmm•6mo ago
> Even better would be to just automatically insert a tail call - like every other language that has supported tail calls for decades - provided the callee has the same signature as the caller.

I've used Scala for many years and concluded this was a mistake. It makes it too easy to accidentally rely on an optimisation and then break your code much later with a seemingly innocuous change. Better to make it explicit.

Quekid5•6mo ago
Just add the @tailrec annotation -- then the compiler will complain loudly if you break tail calls.
lmm•6mo ago
Yes, the problem is if you didn't add it but the compiler was still silently TCOing your function, until one day it doesn't.
Quekid5•6mo ago
Sure, but if you're relying on the TCO, you kind of have to actually state that, regardless? (At least if you want to avoid accidental regressions.)

I don't see any way around having to state your intent (as a programmer) for this. It's just a semantic difference in behavior unless your Abstract Machine allows for just ignoring the possibility of stack overflow entirely.

lmm•6mo ago
> I don't see any way around having to state your intent (as a programmer) for this.

I want a way to state that my intent is to not rely on the TCO. Even a second explicit annotation for "do not TCO this function" would be better than what Scala currently does (but frankly TCO is surprising enough to debug that I think it should be opt-in rather than opt-out).

Quekid5•6mo ago
> I want a way to state that my intent is to not rely on the TCO.

How would that work? You want a nontail keyword to indicate that intent explicitly?

I guess I could imagine a scenario where you want to de-optimize a TCO into a non-TC... but I mean... that's got to be rare enough to just not bother with?

EDIT: Also, this is the exact opposite of your comment which I replied to. Make up your mind on what you want and we can maybe find a way to achieve it

lmm•6mo ago
> How would that work? You want a nontail keyword to indicate that intent explicitly?

Either that, or to not have TCO applied if I don't set a keyword (which I'd prefer).

> I guess I could imagine a scenario where you want to de-optimize a TCO into a non-TC... but I mean... that's got to be rare enough to just not bother with?

Sometimes I know that a given function will eventually need to be non-TC, in which case I want a way to make it non-TCO now. More commonly a junior team member just hasn't thought about TC-ness at all, in which case I'd rather the function not be TCOed and fail-fast than be TCOed until it isn't.

> EDIT: Also, this is the exact opposite of your comment which I replied to.

No it isn't. It's the detail of how it happens. If you use Scala at scale with a team that includes non-experts it will happen to you.

dfawcus•6mo ago
FWIW, Alef also used a 'become' keyword for a tail-call version of 'return'.

That in a quite C-like language.

teo_zero•6mo ago
On compilers that don't support musttail, won't this make the stack explode?
gsliepen•6mo ago
AFAIK compilers will perform tail call optimization without [[musttail]], it's just not guaranteed (and probably it won't if you don't enable optimizations at all).
Sesse__•6mo ago
As an alternative to the computed gotos, you can use switch/case in Clang or GCC to achieve basically the same thing. :-) It becomes a jump table in most cases. (The article claims that a jump table gives smaller code and fewer branch misses, but it doesn't actually give any numbers, and enough things in there are dubious enough that I'm not convinced they ever measured.)

https://blog.nelhage.com/post/cpython-tail-call/ has made the rounds a lot recently, and explores this for Python's bytecode interpreter.

sparkie•6mo ago
The switch misses the point. The compiler isn't smart enough to convert it to direct-threading, to the best of my knowledge.

A switch only selects on one character. To continue lexing you need the switch inside a loop. The compiler might optimize the switch itself to a jump table - but what does each case do? It jumps back to the start of the loop, after which it enters the jump table again. There are two branches involved.

The point of direct threading is that there is no loop - you simply jump directly to the handler for the next character at the end of each handler.
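
To make the two shapes concrete, here's a toy sketch in GNU C (counting digits just to keep it self-contained; a real lexer would dispatch to token handlers instead):

    /* loop + switch: every case branches back to the loop head, which then
       re-enters the switch's jump table -- two branches per character */
    static void count_with_loop(const char *p, int *digits, int *others) {
        for (; *p; p++) {
            switch (*p) {
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
                (*digits)++; break;
            default:
                (*others)++; break;
            }
        }
    }

    /* direct threading (GNU computed goto): every handler jumps straight to the
       handler for the next character -- one indirect branch, no loop head */
    static void count_threaded(const char *p, int *digits, int *others) {
        const void *table[256];
        for (int c = 0; c < 256; c++)
            table[c] = (c >= '0' && c <= '9') ? &&digit : &&other;
        table[0] = &&done;              /* the NUL terminator ends the scan */

        goto *table[(unsigned char)*p];
    digit: (*digits)++; p++; goto *table[(unsigned char)*p];
    other: (*others)++; p++; goto *table[(unsigned char)*p];
    done:  return;
    }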

Sesse__•6mo ago
> The compiler isn't smart enough to convert it to direct-threading, to the best of my knowledge.

If you read the URL I linked to, you will see that it is.

> The point of direct threading is that there is no loop - you simply jump directly to the handler for the next character at the end of each handler.

No, the point of direct threading is that you give the branch predictor more context to work with (effectively, the previous opcode), which was relevant with the branch predictors in typical CPUs 10+ years ago. (Modern ones, ITTAGE-style, have per-branch history also for indirect jumps.)

bostick•6mo ago
FYI, in my opinion, clang `[[musttail]]` is not quite ready for prime time. (cannot speak to GCC)

I was excited when it was introduced but quickly ran into issues.

Here is a silent miscompilation involving `[[musttail]]` that I reported in 2022 and is still open: https://github.com/llvm/llvm-project/issues/56435

norir•6mo ago
Lexing being the major performance bottleneck in a compiler is a great problem to have.
norskeld•6mo ago
Is lexing ever a bottleneck though? Even if you push for lexing and parsing 10M lines/second [1], I'd argue that semantic analysis and codegen (for AOT-compiled languages) will dominate the timings.

That said, there's no reason not to squeeze every bit of performance out of it!

[1]: In this talk about the Carbon language, Chandler Carruth shows and explains some goals/challenges regarding performance: https://youtu.be/ZI198eFghJk?t=1462

munificent•6mo ago
It depends a lot on the language.

For a statically typed language, it's very unlikely that the lexer shows up as a bottleneck. Compilation time will likely be dominated by semantic analysis, type checking, and code generation.

For a dynamically typed language where there isn't as much for the compiler to do, then the lexer might be a more noticeable chunk of compile times. As one of the V8 folks pointed out to me years ago, the lexer is the only part of the compiler that has to operate on every single individual byte of input. Everything else gets the luxury of greater granularity, so the lexer can be worth optimizing.

norskeld•6mo ago
Ah, yes, that's totally fair. In the case of JS (in browsers) it's sort of a big deal, I suppose, even if the scripts being loaded are not render-blocking: the faster you lex and parse source files, the faster the page becomes interactive.

P.S. I absolutely loved "Crafting Interpreters" — thank you so much for writing it!

SnowflakeOnIce•6mo ago
A simple hello world in C++ can pull in dozens of megabytes of header files.

Years back I worked at a C++ shop with a big codebase (hundreds of millions of LOC when you included vendored dependencies). Compile times there were sometimes dominated by parsing speed! Now, I don't remember the exact breakdown of lexing vs parsing, but I did look at it under a profiler.

It's very easy in C++ projects to structure your code such that you inadvertently cause hundreds of megabytes of sources to be parsed by each single #include. In such a case, lexing and parsing costs can dominate build times. Precompiled headers help, but not enough...

adev_•6mo ago
> Now, I don't remember the exact breakdown of lexing vs parsing, but I did look at it under a profiler.

Lexing, parsing and even type checking are interleaved in most C++ compilers due to the ambiguous nature of many constructs in the language.

It is very hard to profile only one of these in isolation. And even with compiler built-in instrumentation, the results are not very representative of the work done behind the scenes.

C++ compilers are amazing machines. They are blazing fast at parsing a language which is a nightmare of ambiguities. And they are like that mainly because of how stupidly verbose and inefficient the C++ include system is.

SnowflakeOnIce•6mo ago
> Lexing, parsing and even type checking are interleaved in most C++ compilers due to the ambiguous nature of many constructs in the language.
>
> It is very hard to profile only one of these in isolation. And even with compiler built-in instrumentation, the results are not very representative of the work done behind the scenes.

Indeed, precise cost attribution is difficult or impossible due to how the nature of the language imposes structure on industrial compilers. But that aside, you still end up easily with hundreds of megabytes of source to deal with in each translation unit. I have so many scars from dealing with that...

viega•6mo ago
The computational complexity for lexing C++ is linear, but for parsing C++ it's super-linear, as are many analyses. In practice, the lexing is in the noise for almost all compilers.
adev_•6mo ago
Even the lexing is not strictly linear.

Good luck naively differentiating whether '>' is effectively a closing chevron or part of a shift operator '>>' in a random nested template expression like 'set<vector<int>>'.

burnt-resistor•6mo ago
Extending beyond the capabilities of PCHs, but there used to be incremental (C?) compilers/IDEs (maybe there still are?) that cached ASTs and were smart enough to invalidate and reparse just those portions of the local AST that changed based on editor updates. This was back when storage was very, very slow.
Phil_Latio•6mo ago
You have a point, but getting the easy part (the lexer/token) "right" is something to strive for, because it will also influence the memory and performance characteristics of later stages in the pipeline. So thinking (and writing) about this stuff makes sense.

Example: In the blog post a single token uses 32 bytes + 8 bytes for the pointer indirection in the AST node. That's a lot of memory, cache line misses and indirections.

burnt-resistor•6mo ago
It might be worth throwing away the lexer entirely and using something like parsing with derivatives. Ultimately though, I suspect hand-rolled SIMD parsing is how to make a compiler even faster than either go or v. Maintenance would suck because each implementation would be completely architecture- and platform-specific.

PS: We really need GPSIMD compiler phases for GCC and Clang.

zX41ZdbW•6mo ago
I recommend taking a look at the ClickHouse SQL Lexer:

https://github.com/ClickHouse/ClickHouse/blob/master/src/Par...

https://github.com/ClickHouse/ClickHouse/blob/master/src/Par...

It supports SIMD for accelerated character matching, it does not do any allocations, and it is very small (compiles to a few KB of WASM code).

tuveson•6mo ago
How much of an improvement does SIMD offer for something like this? It looks like it's only being used for strings and comments, but I would kind of assume that for most programming languages, the proportion of code that is long strings / comments is not large. Also curious if there's any performance penalty for trying to do SIMD if most of the comments and strings are short.
camel-cdr•6mo ago
Usually lexing isn't part of the performance equation compared to all the other parts of the compiler, but SIMD can be used to speed up the number parsing.
Sesse__•6mo ago
Random data point: Implementing SIMD for tokenizing identifiers sped up the Chromium CSS parser (as a whole, not just the tokenizer) by ~2–3%.
o11c•6mo ago
Unfortunately, operating a byte at a time means there's a hard limit on performance.

A truly performant lexer needs to jump ahead as far as possible. This likely involves SIMD (or SWAR) since unfortunately the C library fails to provide most of the important interfaces.

As an example that the C library can handle tolerably, while lexing a string, you should repeatedly call `strcspn(input, "\"\\\n")` to skip over chunks of ordinary characters, then only special-case the quote, backslash, newline and (implicit!) NUL after each jump. Be sure to correctly distinguish between an embedded NUL and the one you probably append to represent EOF (or, if streaming [which requires quite a bit more logic], end of current chunk).

Unfortunately, there's a decent chance your implementation of `strcspn` doesn't optimize for the possibility of small sets, and instead constructs a full 256-bit bitset. And even if it does, this strategy won't work for larger sets such as "all characters in an identifier" (you'd actually use `strspn` since this is positive), for which you'll want to take advantage of the characters being adjacent.
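
A sketch of that inner loop for string literals, assuming a NUL-terminated buffer (the function name is illustrative):

    #include <string.h>

    /* scan the body of a string literal, starting just after the opening quote;
       returns a pointer to the closing quote, or NULL if the string is unterminated */
    const char *scan_string_body(const char *p) {
        for (;;) {
            p += strcspn(p, "\"\\\n");      /* skip a chunk of ordinary characters */
            switch (*p) {
            case '"':
                return p;                   /* closing quote */
            case '\\':                      /* escape: skip backslash + escaped char */
                if (p[1] == '\0') return NULL;
                p += 2;
                break;
            default:                        /* '\n' or the NUL used as EOF sentinel */
                return NULL;
            }
        }
    }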

Edit: yikes, is this using a hash without checking for collisions?!?

dist1ll•6mo ago
You can get pretty far with a branch per byte, as long as the bulk of the work is done w/ SIMD (like character classification). But yeah, LUT lookup per byte is not recommended.
xnacly•6mo ago
You are somewhat right, I used tagging masks to differentiate between different types of atoms [1]. But yes, interning will be backed by a correct implementation of a hashmap with some collision handling in the future.

[1]: https://github.com/xNaCly/purple-garden/blob/master/cc.c#L76...

kingstnap•6mo ago
You can go pretty far processing one byte at a time in hardware. You just keep making the pipeline deeper and pushing the frequency. And then to combat dependent parsing you add speculative execution to avoid bubbles.

Eventually you land on recreating the modern CPU.

burnt-resistor•6mo ago
No, SIMD-optimized string ops are way, way faster.

https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-...

adev_•6mo ago
Cool exercise and thank you for the blog post.

I did a similar thing (for fun) for the tokenizer of a Swift-derivative language, written in C++.

My approach was, however, very different from yours:

- No macros, no ASM, just explicit vectorization using std::simd

- No hand rolled allocator. Just std::vector and SOA.

- No hashing for keyword. They are short. A single SIMD load / compare is often enough for a comparison

- All the lookup tables are compile time generated from the token list using constexpr to keep the code small and maintainable.

I was able to reach around 8 Mloc/s on server grade hardware, single core.

JonChesterfield•6mo ago
Byte at a time means not-fast but I suppose it's all relative. The benchmarks would benefit from a re2c version, I'd expect that to beat the computed goto one. Easier for the compiler to deal with, mostly.
zahlman•6mo ago
Is lexing really ever the bottleneck? Why focus effort here?
s3graham•6mo ago
simdjson is another project to look at for ideas.

I found it quite tricky to apply its ideas to the more general syntax for a programming language, but with a bunch of hacking and few subtle changes to the language itself, the performance difference over one-character-at-a-time was quite substantial (about 6-10x).

psanchez•6mo ago
The jump table is interesting, although I guess the performance of a switch would be similar if properly optimized by the compiler; I would not be able to tell without trying. Also, different compilers might take different approaches.

A few months ago I built a toy boolean expression parser as a weekend project. The main goal was simple: evaluate an expression and return true or false. It supported basic types like int, float, string, arrays, variables, and even custom operators.

The syntax and grammar were intentionally kept simple. I wanted the whole implementation to be self-contained and compact, something that could live in just a .h and .cc file. Single pass for lexing, parsing, and evaluation.

Once the first version was functional, I challenged myself to optimize it for speed and tried many things. Here are some of the performance-related tricks I remember using:

  - No string allocations: used the input *str directly, relying on pointer manipulation instead of allocating memory for substrings.
  - Stateful parsing: maintained a parsing state structure passed by reference to avoid unnecessary copies or allocations.
  - Minimized allocations: tried to avoid heap allocations wherever possible. Some were unavoidable during evaluation, but I kept them to a minimum.
  - Branch prediction-friendly design: used lookup tables to assist with token identification (mapping the first character to token type and validating identifier characters).
  - Inline literal parsing: converted integer and float literals to their native values directly during lexing instead of deferring conversion to a later phase.
I think all the tricks are mentioned in the article already.
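
For the lookup-table point in the list above, this is the general shape of table meant, a generic sketch rather than the actual project code:

    /* map the first byte of a lexeme to a coarse token class: one load, no branches */
    enum { C_OTHER, C_DIGIT, C_IDENT, C_QUOTE, C_SPACE };

    static unsigned char char_class[256];

    static void init_char_class(void) {
        for (int c = '0'; c <= '9'; c++) char_class[c] = C_DIGIT;
        for (int c = 'a'; c <= 'z'; c++) char_class[c] = C_IDENT;
        for (int c = 'A'; c <= 'Z'; c++) char_class[c] = C_IDENT;
        char_class['_']  = C_IDENT;
        char_class['"']  = C_QUOTE;
        char_class[' ']  = C_SPACE;
        char_class['\t'] = C_SPACE;
        char_class['\n'] = C_SPACE;
    }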

For what it's worth, here is the project:

  https://github.com/pausan/tinyrulechecker
I used this expression to assess the performance on an Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz (launched Q3 2018):

  myfloat.eq(1.9999999) || myint.eq(32)

I know it is a simple expression and a larger expression would likely perform worse due to variable lookups, ... I could get a speed of 287MB/s or 142ns per evaluation (7M evaluations per second). I was pleasantly surprised to reach those speeds given that 1 evaluation is a full cycle of lexing, parsing and evaluating the expression itself.

The next step I considered was to use SIMD for tokenizing, but I'm not sure it would have helped much with the overall expression evaluation times; I seem to recall most of the time was spent in the parsing or evaluation phases anyway, not the lexer.

It was a fun project.

aappleby•6mo ago
Lexing is almost never a bottleneck. I'd much rather see a "Strategies for Readable Lexers".
kklisura•6mo ago
> As introduced in the previous chapters, all identifers are hashed, thus we can also hash the known keywords at startup and make comparing them very fast.

One trick that postgres uses [1][2] is perfect hashing [3]. Since you know in advance what your keywords are, you can design a hash function such that for each w(i) in the list of i keywords W, h(w(i)) = i. It essentially means no collisions, and it's O(i) for the memory requirement.

[1] https://github.com/postgres/postgres/blob/master/src/tools/P...

[2] https://github.com/postgres/postgres/blob/master/src/tools/g...

[3] https://en.wikipedia.org/wiki/Perfect_hash_function
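
As a toy illustration of the idea (postgres generates its hash function with a script; the tiny keyword set and hash below are made up and only happen to be collision-free for this exact set):

    #include <string.h>

    /* hypothetical mini keyword set; the hash (first byte + last byte, mod 16)
       maps these five keywords to distinct slots */
    static const char *kw_table[16] = {
        [0]  = "return",
        [8]  = "for",
        [10] = "else",
        [12] = "while",
        [15] = "if",
    };

    /* returns the keyword string if s (of length len) is a keyword, else NULL */
    static const char *lookup_keyword(const char *s, size_t len) {
        unsigned h = (unsigned char)s[0] + (unsigned char)s[len - 1];
        const char *kw = kw_table[h & 15];
        return (kw && strlen(kw) == len && memcmp(kw, s, len) == 0) ? kw : NULL;
    }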

socalgal2•6mo ago
I'm sorry, I only skimmed, but how do you report line/col numbers for errors?
pkaye•6mo ago
What I've done in my own implementation is include the line and column number in the token.
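
For instance, a minimal token shape along those lines (the field names are illustrative):

    #include <stdint.h>

    typedef enum { TOK_IDENT, TOK_NUMBER, TOK_STRING /* ... */ } TokenKind;

    typedef struct {
        TokenKind kind;     /* what was lexed */
        const char *start;  /* slice into the source buffer */
        uint32_t len;
        uint32_t line;      /* 1-based, for diagnostics */
        uint32_t col;
    } Token;
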
socalgal2•6mo ago
Yea, that's what I assumed too, but the article didn't include them and I assumed it was for speed reasons. I've run into this often, where some idealized version doesn't take into account usage ergonomics. A compiler that can't tell me where an error is is not a useful compiler to me, so if this lexer doesn't support that, then it won't actually be a net positive in development speed for me. I fail compilation more often than I succeed.
anonymoushn•6mo ago
Well, when it's this fast already, there may not be much point in vectorizing it.

For integer parsing, this is a pretty cheap operation, so I wonder if it is worthwhile to try to avoid doing it many times. For double parsing, it is expensive if you require the bottom couple bits to be correct, so the approach in the blog post should create savings.

userbinator•6mo ago
Replacing the switch with an array index and a jump.

Compilers will compile switches to a branch tree, two-level jump table, or single-level jump table depending on density and optimisation options. If manually using a jump table is faster, you either have a terrible compiler or just haven't explored its optimisation settings enough.

teo_zero•6mo ago
But the compiler will always add the jump back to the top of the loop, so it's two jumps per iteration. Manual dispatch is one jump per iteration.
ptspts•6mo ago
The Str_to_double code in the article produces inaccurate results in the last few bits. (What is the use of parsing a double inaccurately?) Accurate parsing of a double is really tricky (and memory-hungry and slow). The strtod(3) function provided by a decent libc (such as glibc and musl, and also the FreeBSD libc) can do it correctly.
sixthDot•6mo ago
A problem I see with talking exclusively about lexing is that when you separate lexing from parsing, you miss the point that a lexer is an iterator consumed by the parser.