frontpage.

Are AI agents ready for the workplace? A new benchmark raises doubts

https://techcrunch.com/2026/01/22/are-ai-agents-ready-for-the-workplace-a-new-benchmark-raises-do...
1•PaulHoule•3m ago•0 comments

Show HN: AI Watermark and Stego Scanner

https://ulrischa.github.io/AIWatermarkDetector/
1•ulrischa•4m ago•0 comments

Clarity vs. complexity: the invisible work of subtraction

https://www.alexscamp.com/p/clarity-vs-complexity-the-invisible
1•dovhyi•5m ago•0 comments

Solid-State Freezer Needs No Refrigerants

https://spectrum.ieee.org/subzero-elastocaloric-cooling
1•Brajeshwar•5m ago•0 comments

Ask HN: Will LLMs/AI Decrease Human Intelligence and Make Expertise a Commodity?

1•mc-0•6m ago•1 comment

From Zero to Hero: A Brief Introduction to Spring Boot

https://jcob-sikorski.github.io/me/writing/from-zero-to-hello-world-spring-boot
1•jcob_sikorski•6m ago•0 comments

NSA detected phone call between foreign intelligence and person close to Trump

https://www.theguardian.com/us-news/2026/feb/07/nsa-foreign-intelligence-trump-whistleblower
4•c420•7m ago•0 comments

How to Fake a Robotics Result

https://itcanthink.substack.com/p/how-to-fake-a-robotics-result
1•ai_critic•7m ago•0 comments

It's time for the world to boycott the US

https://www.aljazeera.com/opinions/2026/2/5/its-time-for-the-world-to-boycott-the-us
1•HotGarbage•8m ago•0 comments

Show HN: Semantic Search for terminal commands in the Browser (No Back end)

https://jslambda.github.io/tldr-vsearch/
1•jslambda•8m ago•1 comment

The AI CEO Experiment

https://yukicapital.com/blog/the-ai-ceo-experiment/
2•romainsimon•9m ago•0 comments

Speed up responses with fast mode

https://code.claude.com/docs/en/fast-mode
3•surprisetalk•13m ago•0 comments

MS-DOS game copy protection and cracks

https://www.dosdays.co.uk/topics/game_cracks.php
3•TheCraiggers•14m ago•0 comments

Updates on GNU/Hurd progress [video]

https://fosdem.org/2026/schedule/event/7FZXHF-updates_on_gnuhurd_progress_rump_drivers_64bit_smp_...
2•birdculture•15m ago•0 comments

Epstein took a photo of his 2015 dinner with Zuckerberg and Musk

https://xcancel.com/search?f=tweets&q=davenewworld_2%2Fstatus%2F2020128223850316274
7•doener•15m ago•2 comments

MyFlames: Visualize MySQL query execution plans as interactive FlameGraphs

https://github.com/vgrippa/myflames
1•tanelpoder•16m ago•0 comments

Show HN: LLM of Babel

https://clairefro.github.io/llm-of-babel/
1•marjipan200•17m ago•0 comments

A modern iperf3 alternative with a live TUI, multi-client server, QUIC support

https://github.com/lance0/xfr
3•tanelpoder•18m ago•0 comments

Famfamfam Silk icons – also with CSS spritesheet

https://github.com/legacy-icons/famfamfam-silk
1•thunderbong•18m ago•0 comments

Apple is the only Big Tech company whose capex declined last quarter

https://sherwood.news/tech/apple-is-the-only-big-tech-company-whose-capex-declined-last-quarter/
2•elsewhen•22m ago•0 comments

Reverse-Engineering Raiders of the Lost Ark for the Atari 2600

https://github.com/joshuanwalker/Raiders2600
2•todsacerdoti•23m ago•0 comments

Show HN: Deterministic NDJSON audit logs – v1.2 update (structural gaps)

https://github.com/yupme-bot/kernel-ndjson-proofs
1•Slaine•26m ago•0 comments

The Greater Copenhagen Region could be your friend's next career move

https://www.greatercphregion.com/friend-recruiter-program
2•mooreds•27m ago•0 comments

Do Not Confirm – Fiction by OpenClaw

https://thedailymolt.substack.com/p/do-not-confirm
1•jamesjyu•27m ago•0 comments

The Analytical Profile of Peas

https://www.fossanalytics.com/en/news-articles/more-industries/the-analytical-profile-of-peas
1•mooreds•27m ago•0 comments

Hallucinations in GPT5 – Can models say "I don't know" (June 2025)

https://jobswithgpt.com/blog/llm-eval-hallucinations-t20-cricket/
1•sp1982•28m ago•0 comments

What AI is good for, according to developers

https://github.blog/ai-and-ml/generative-ai/what-ai-is-actually-good-for-according-to-developers/
1•mooreds•28m ago•0 comments

OpenAI might pivot to the "most addictive digital friend" or face extinction

https://twitter.com/lebed2045/status/2020184853271167186
1•lebed2045•29m ago•2 comments

Show HN: Know how your SaaS is doing in 30 seconds

https://anypanel.io
1•dasfelix•29m ago•0 comments

ClawdBot Ordered Me Lunch

https://nickalexander.org/drafts/auto-sandwich.html
3•nick007•30m ago•0 comments

Performance Debugging with LLVM-mca: Simulating the CPU

https://johnnysswlab.com/performance-debugging-with-llvm-mca-simulating-the-cpu/
33•signa11•7mo ago

Comments

camel-cdr•7mo ago
One thing to keep in mind with llvm-mca is that not all processors use their own scheduling model, and the models themselves vary in accuracy.

E.g., the Cortex-A72 uses the Cortex-A57 model, as do the Cortex-A76 and even the Cortex-A78.

The Neoverse V1 model has an issue width of 15, while the Neoverse V2 model (also used for the V3) has an issue width of 6.
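
A minimal way to see this for yourself (a sketch; CPU names, flags, and exact numbers depend on your LLVM version):

    /* saxpy.c -- a trivial kernel, just to get a loop body to analyze.
       Compile to assembly, then run llvm-mca under two CPU models:

         clang -O2 -S --target=aarch64-linux-gnu saxpy.c -o saxpy.s
         llvm-mca -mtriple=aarch64-linux-gnu -mcpu=neoverse-v1 saxpy.s
         llvm-mca -mtriple=aarch64-linux-gnu -mcpu=neoverse-v2 saxpy.s

       The assembly is identical in both runs, but the summary's
       "Dispatch Width" (taken from the model's IssueWidth) and the
       cycle estimates that follow from it change with the model. */
    void saxpy(float *restrict y, const float *restrict x, float a, int n) {
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }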

MobiusHorizons•7mo ago
Are you saying the model used to simulate many different CPUs is the same, which makes comparing them harder? Or are you saying the models are not accurate?

It's an interesting point that the newer Neoverse cores use a model with a smaller issue width. Are you saying this doesn't match reality? If so, do you have any idea why they model it that way?

camel-cdr•7mo ago
> Are you saying the model used to simulate many different CPUs is the same, which makes comparing them harder? Or are you saying the models are not accurate?

Both, but mostly the former. You can view the scheduling models used for a given CPU here: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Targ...

    * CortexA53Model used for: A34, A35, A320, A53, A65, A65AE
    * CortexA55Model used for: A55, R82, R82AE
    * CortexA510Model used for: A510, A520, A520AE
    * CortexA57Model used for: A57, A72, A73, A75, A76, A76AE, A77, A78, A78AE, A78C
    * NeoverseN2Model used for: A710, A715, A720, A720AE, Neoverse-N2
    * NeoverseV1Model used for: X1, X1C, Neoverse-V1/512TVB
    * NeoverseV2Model used for: X2, X3, X4, X925, Grace, Neoverse-V2/V3/V3AE
    * NeoverseN3Model used for: Neoverse-N3
It's even worse for Apple CPUs: all of them, from apple-a7 to apple-m4, use the same "CycloneModel" of a 6-issue out-of-order core from 2013.

There are finer-grained target-specific feature flags on top (e.g. for fusion), but the base scheduling model often isn't remotely close to the actual processor.

> It’s an interesting point that the newer neoverse cores use a model with smaller issue width. Are you saying this doesn’t match reality? If so do you have any idea why they model it that way?

Yes. I opened an issue about the Neoverse cores; since then, an independent PR has adjusted the V2 down from 16-wide to a more realistic 8-wide: https://github.com/llvm/llvm-project/issues/136374

Part of the problem is that LLVM's scheduling model can't represent all properties of the CPU.

The issue width for those cores seems to be set to the maximum number of uops the core can execute at once. If you look at the Neoverse V1 microarchitecture, it indeed has 15 independent issue ports: https://en.wikichip.org/w/images/2/28/neoverse_v1_block_diag...

But notice how it can only decode 8 instructions per cycle (5 if you exclude the MOP cache). Having more execution ports than that is still a gain in practice, partly because some operations occupy a port for multiple cycles before it can accept new instructions. The other reason is uop cracking: complex addressing modes and things like load/store pairs are cracked into multiple uops, which execute on separate ports.

The problem is that LLVM's IssueWidth parameter is used to model both decode and issue width. The execution port count is derived from the ports specified in the scheduling model itself, which are basically correct.

---

If I had to guess, the reason for all of this is that modeling instruction scheduling doesn't matter all that much for codegen on OoO cores. The other reason is that just putting in the "real"/theoretical numbers doesn't automatically result in the best codegen.

It does matter, however, if you use it to visualize how a core would execute instructions.

The main point I want to make is that you shouldn't run llvm-mca with -mcpu=apple-m4, compare the results against -mcpu=znver5, and expect any reasonable answers. Be sure to check the source, so you realize you are actually comparing a scheduling model based on the Apple Cyclone core (2013) against one based on the Zen 4 core (2022).

mshockwave•7mo ago
> that modeling instruction scheduling doesn't matter all that much for codegen on OoO cores.

Yeah, scheduling quality usually has a weaker connection to the performance of OoO cores. Though I would also like to point out:

  1. In-order cores still rely heavily on scheduling quality.
  2. Issue width is actually a big thing in MachineScheduler regardless of in-order or out-of-order cores, so the problem you outlined above w.r.t. different implementations of uop cracking is indeed quite relevant.
  3. MachineScheduler does not use BufferSize -- which more or less mirrors the issue queue size of each pipe -- at all for out-of-order cores. MicroOpBufferSize, which models the unified reservation station / ROB size, is only used in one really specific place. However, these parameters matter (much) more for llvm-mca.
camel-cdr•7mo ago
@dang The website shows this comment as written 50 minutes ago, but I wrote it over a day ago.
dzaima•7mo ago
The timestamps just get moved around sometimes: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
tomhow•7mo ago
Yes, as dzaima wrote, the displayed timestamps on comments are adjusted if a submission has grace time added, which happens when we put a story in the second-chance pool [1]. This is because time-since-posting is a significant factor in the gravity calculation that pulls a submission's ranking down over time.

I know it's confusing, but if we left all the comment timestamps seeming much older than the submission, it would be even more confusing to other readers. (That said, I generally try to avoid doing this, given the confusion it causes.)

[1] https://news.ycombinator.com/item?id=26998308

MobiusHorizons•7mo ago
Thanks for elaborating, this was very instructive!
pornel•7mo ago
The tool has great potential, but I've always found it too limited, fiddly, or imprecise when I needed to optimize some code.

It only supports a straight-line sequence of instructions, i.e. an innermost loop body. It can't include, or even just ignore, any setup/teardown cost. This means I can't feed it a function as-is (even a tiny one); I need to manually cut out the loop body.
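
(The region markers help a bit with the cutting, at least: llvm-mca recognizes LLVM-MCA-BEGIN/END comments in the assembly, and you can inject them from C as inline asm, as in the sketch below. But the asm statements can perturb optimization, so the generated assembly still needs checking, and `#` only starts a comment in x86 AT&T output; other targets use their own comment marker.)

    /* A sketch of llvm-mca's analysis-region markers, injected from C.
       The inline-asm comments survive into the .s output, and llvm-mca
       then restricts its analysis to the marked region ("dot-loop" is
       an arbitrary region name). */
    float dot(const float *a, const float *b, int n) {
        float acc = 0.0f;
        __asm volatile("# LLVM-MCA-BEGIN dot-loop");
        for (int i = 0; i < n; i++)
            acc += a[i] * b[i];
        __asm volatile("# LLVM-MCA-END");
        return acc;
    }
    /* clang -O2 -S dot.c -o dot.s && llvm-mca -mcpu=skylake dot.s */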

It doesn't support branches at all. I know it's a very hard problem, but that's the problem I have: quite often I'd like to compare branchless vs. branchy versions of an algorithm, and I have to manually remove branches I think are predictable and hope that doesn't alter the analysis.

It's not designed to compare different versions of code, so I need to manually rescale the metrics to compare them (different versions of the loop can be unrolled a different number of times, or process a different number of elements per iteration, etc.).
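
(The rescaling itself is trivial arithmetic, but easy to get backwards; with made-up numbers:)

    /* Hypothetical numbers: version A processes 4 elements per iteration
       in an estimated 6 cycles; version B is unrolled to 8 elements in
       10 cycles. Raw block cost favors A; per element, B wins. */
    #include <stdio.h>

    int main(void) {
        printf("A: %.2f cycles/elem\n", 6.0 / 4.0);   /* 1.50 */
        printf("B: %.2f cycles/elem\n", 10.0 / 8.0);  /* 1.25 */
        return 0;
    }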

Overall that's laborious, and doesn't work well when I want to tweak the high-level C or Rust code to get the best-optimizing version.

mshockwave•7mo ago
> This means I can't feed any function as-is (even a tiny one). I need to manually cut out the loop body.

> It doesn't support branches at all. I know it's a very hard problem, but that's the problem I have

Shameless self-plug: https://github.com/securesystemslab/LLVM-MCA-Daemon

fschutze•7mo ago
Can you provide a bit more context on why MCA-Daemon is preferable? It looks interesting, but I don't fully get it.
fossa1•7mo ago
This is a textbook case of "microarchitectural reality beats theoretical elegance." It's fascinating how replacing 5 loads with 2 loads + 3 vextq_f32 intrinsics, which should reduce memory pressure, ends up being slower due to execution-port contention and dependency chains.
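
For readers without the article at hand, the transformation presumably looks something like this sliding-window pattern (a sketch under that assumption, not the article's exact code):

    #include <arm_neon.h>

    /* Version 1: five overlapping 4-lane loads from p[0..7]. */
    void windows_loads(const float *p, float32x4_t w[5]) {
        for (int i = 0; i < 5; i++)
            w[i] = vld1q_f32(p + i);        /* 5 loads, no shuffles */
    }

    /* Version 2: two loads plus three vextq_f32 shuffles building the
       same five windows. Fewer loads, but every extract depends on both
       loads, and all of them compete for the same vector ports. */
    void windows_ext(const float *p, float32x4_t w[5]) {
        float32x4_t lo = vld1q_f32(p);      /* p[0..3] */
        float32x4_t hi = vld1q_f32(p + 4);  /* p[4..7] */
        w[0] = lo;
        w[1] = vextq_f32(lo, hi, 1);        /* p[1..4] */
        w[2] = vextq_f32(lo, hi, 2);        /* p[2..5] */
        w[3] = vextq_f32(lo, hi, 3);        /* p[3..6] */
        w[4] = hi;
    }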
almostgotcaught•7mo ago
> uses information available in LLVM (e.g. scheduling models) to statically measure the performance of machine code in a specific CPU

Do people not realize that the scheduling models in LLVM are approximate? Like, really approximate sometimes. In fact, half the job of working on instruction scheduling in LLVM is cajoling the scheduler into doing the right thing given the approximate models.

Sesse__•7mo ago
My favorite was when the uiCA people found that a toy model (counting instructions and loads, then multiplying them by some simple constants) significantly outperformed llvm-mca on x86 :-)