
Hubble Is Getting Old. Should We Try to Save It?

https://www.thequantumcat.space/p/hubble-is-getting-old-should-we-try
1•LorenDB•2m ago•0 comments

Show HN: GrowAGardenStock – Real-time inventory and event tracker

https://growagardenstock.com
1•merso•5m ago•0 comments

Metaobject Protocols: Why we want them and what else they can do [pdf]

https://cseweb.ucsd.edu/%7Evahdat/papers/mop.pdf
2•PaulHoule•6m ago•0 comments

Rimworld Odyssey DLC will be released today

https://ludeon.com/blog/2025/06/announcing-odyssey-and-update-1-6/
1•dimastopel•6m ago•0 comments

Intermediaries in Network-Based Ecosystems

https://mydata.org/2025/07/04/intermediaries-in-network-based-ecosystems/
2•Bogdanp•6m ago•0 comments

Claude feeds you banana thanks to a MCP server connected to robot arms

https://github.com/phospho-app/phospho-mcp-server
1•bottomotto•7m ago•0 comments

Intel CPU owners might be crashing "because of the summer heat" says Firefox dev

https://www.pcguide.com/news/firefox-dev-says-intel-13th-14th-gen-cpu-owners-might-be-crashing-because-of-the-summer-heat/
2•muizelaar•8m ago•0 comments

We're Light-Years Away from True Artificial Intelligence, Says Martha Wells

https://www.scientificamerican.com/article/were-light-years-away-from-true-artificial-intelligence-says-murderbot/
2•sohkamyung•9m ago•0 comments

China's AI Dreams Lead to Data Centers in Once-Restive West

https://www.bloomberg.com/news/newsletters/2025-07-11/china-s-ai-dreams-lead-to-data-centers-in-once-restive-west
1•perihelions•13m ago•0 comments

As China Cleans Its City Rivers, Locals Begin to Paddle Back In

https://www.sixthtone.com/news/1017339
2•sohkamyung•16m ago•0 comments

Ask HN: Any simple, Git-based workflow tools developers enjoy using in 2025?

1•kimzhang•18m ago•0 comments

Mastra is now Apache 2.0 licensed

https://mastra.ai/blog/apache-license
2•codekarate•18m ago•0 comments

Comparing PyTorch code optimization Gemini vs. Claude

http://addxorrol.blogspot.com/2025/07/understand-neural-nets-better-post-5-of.html
1•tdullien•20m ago•0 comments

Cybersecurity: The Economic Benefits of GDPR

https://www.cnil.fr/en/cybersecurity-economic-benefits-gdpr
1•amarcheschi•21m ago•1 comments

From Farm to Data: Pittsboro's Agricultural Land Faces Tech Transformation

https://farmonaut.com/usa/pittsboro-farm-land-transforms-with-data-center-boom
1•Bluestein•22m ago•0 comments

OnlineClipboard – Your Cloud Clipboard for Effortless Copy and Paste

https://onlineclipboard.my
1•ychnlt•23m ago•0 comments

Clustered PostgreSQL

https://arch.dog/bark/clustered-postgresql
1•todsacerdoti•26m ago•0 comments

Show HN: Yoslm -- You Only Need a Smoll Language Model for Object Detection

https://jigsawstack.com/blog/object-detection
2•khurdula•29m ago•0 comments

TikTok prepares US app with its own algorithm and user data

https://www.reuters.com/world/china/tiktok-prepares-us-app-with-its-own-algorithm-user-data-2025-07-09/
1•simpleintheory•33m ago•0 comments

Thoughts on Motivation and My 40-Year Career

https://charity.wtf/2025/07/09/thoughts-on-motivation-and-my-40-year-career/
3•mooreds•35m ago•1 comments

The European Cloud/Computing Situation

https://berthub.eu/articles/posts/the-european-situation/
2•doener•38m ago•0 comments

Open Letter: my 2 year shuffle-versary and How It Ends Here

https://world.hey.com/tratt/open-letter-please-help-or-my-2-year-shuffle-versary-how-it-ends-here-b0d6971a
1•andytratt•39m ago•0 comments

Tencent Attempts to Silence FreeWeChat in Trademark Smokescreen Attack

https://en.greatfire.org/blog/2025/jul/tencent-attempts-silence-freewechat-trademark-smokescreen-attack
1•rntn•41m ago•0 comments

Is Subversion a Day Early? (2011)

https://svnbook.red-bean.com/en/1.7/svn.tour.revs.specifiers.html#svn.tour.revs.dates
1•TheSilva•41m ago•0 comments

It's Time to Let Go of 'African American'

https://archive.li/jkk4S
13•leephillips•43m ago•6 comments

Show HN: Gaddhe(Potholes) Map

https://gaddhe.xyz
1•shoebham•43m ago•1 comments

Goldman Sachs Implements AI Software Engineer for Development Tasks

https://www.cnbc.com/2025/07/11/goldman-sachs-autonomous-coder-pilot-marks-major-ai-milestone.html
2•prossercj•44m ago•0 comments

Sinking of the Rainbow Warrior

https://en.wikipedia.org/wiki/Sinking_of_the_Rainbow_Warrior
3•Michelangelo11•46m ago•0 comments

Some arguments against a land value tax

https://www.lesswrong.com/posts/CCuJotfcaoXf8FYcy/some-arguments-against-a-land-value-tax
2•danny00•48m ago•0 comments

I built a minimal terminal image viewer in C with true RGB and no dependencies

https://github.com/Ferki-git-creator/phono-in-terminal-image-viewer-rgb-c-textmode
2•FerkiHN•48m ago•1 comments

FP8 is ~100 tflops faster when the kernel name has "cutlass" in it

https://twitter.com/cis_female/status/1943069934332055912
127•limoce•3h ago

Comments

KomoD•3h ago
actual link: https://github.com/triton-lang/triton/pull/7298
bede•2h ago
Thank you, perhaps the parent can be edited to use this URL instead
orlp•3h ago
GenuineIntel moment.
hofrogs•2h ago
I'm interested in that story, what are you referring to with "GenuineIntel"?
orlp•2h ago
Intel's C++ compiler is known to add branches in its generated code checking if the CPU is "GenuineIntel" and if not use a worse routine: https://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler#Support....
pieterbreed•2h ago
Is this for the runtime of the compiled code or for the compiling machine? Do they generate slow code if the compiler is running on non-intel?
SSLy•2h ago
the runtime. patching cpuid makes the code go faster
kstrauser•2h ago
For the compiled code. Its output deliberately runs slower on non-Intel CPUs.
Uvix•1h ago
Runtime of the compiled code. The ostensible intent is so that new processors can use new features like SIMD, while offering a fallback for older ones. In practice, they’re detecting an Intel processor, not just the specific feature.
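The dispatch pattern being described can be sketched like this (a purely illustrative sketch, not Intel's actual code; the routine names are invented):

```python
def pick_routine(vendor: str, has_avx2: bool) -> str:
    """Vendor-string dispatch: the fast path is gated on the CPUID
    vendor string rather than on the features the CPU reports."""
    if vendor == "GenuineIntel" and has_avx2:
        return "avx2_fast_path"
    # Non-Intel CPUs fall through to the generic routine even when
    # they support the exact same SIMD features.
    return "generic_baseline"
```

This is why patching the reported CPUID vendor string (as mentioned above) makes the same binary run faster on non-Intel hardware.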
danieldk•2h ago
Also MKL:

https://danieldk.eu/Intel-MKL-on-AMD-Zen

bayindirh•20m ago
Even in the middle of that turmoil, we managed to compile some code with Intel's ICC and make it go faster on AMD Opterons, breaking Intel's own numbers.

When my colleague said that they managed to go faster than intel with icc with some hand tuned parameters, I remember answering "youdidwat?".

Good times.

reitzensteinm•2h ago
Or maybe Quack III: Arena. https://m.slashdot.org/story/21054
iforgotpassword•2h ago
I think that was the first case (to go public), but I remember reading about this in game magazines a couple times after this, for both ATI and nvidia.
42lux•1h ago
Now I want a Quake shooter but with ducks.
carlos22•1h ago
Not ducks, but chickens, was very popular in Germany back in the day: https://en.wikipedia.org/wiki/Crazy_Chicken
dahauns•30m ago
Aah, that brings back memories...

Interestingly, most benchmark controversies back in the day are now expected behaviour, i.e. game-specific optimizations with no (well, in this age of upscalers and other lossy optimization techniques, probably even somewhat) visible image degradation. A gaming-specific driver with no game-specific improvements in its changelog would be considered strange, and it very much works with executable detection.

Back in the day, there was still the argument that drivers should not optimize for benchmarks even when visually identical, because it wouldn't show the hardware's real world potential. Kinda cute from today's perspective. :)

But of course there were the obvious cases...

The Quack3 lowering filtering quality as shown above, of course (at least that one was put into the driver as a togglable setting later on).

But the most cheeky one has to be nVidia's 3dmark03 "optimizations", where they blatantly put static clip planes into the scenes so that everything outside the predefined camera path from the benchmark sequence would simply be cut from the scene early (which e.g. fully broke the freelook patched into 3dmark and would generally break any interactive application)

bayindirh•25m ago
You beat me to it. Grrr...

Just kidding, nice to see another person who remembers these things. Want some root beer?

bayindirh•26m ago
Ooh, I remember this, but the practice is actually older than that.

First, nVidia and ATI used executable names for detecting games, then they started to add heuristics.

If you think they stopped the practice, you're very mistaken. Every AMD and nVidia driver has game and app specific fixes and optimizations.

nVidia cheated in 3DMark that way, so the benchmark was patched/changed to prevent it. Also, nVidia again patched their drivers so that some of the more expensive but visually invisible calls, like scene flushes in a particular game, were batched (e.g. do all 50 flushes at the 50th call) to prevent the game becoming a slide show on expensive hardware.

This is also why AMD's and Intel's open source drivers under Linux are a success: they are vanilla drivers written from scratch per spec, and if your code calls OpenGL/Vulkan to spec, then you're golden.

Some companies even cross-compile AMD's Linux drivers for Windows on embedded systems, since they're free of these game-specific optimizations.

_zoltan_•1h ago
seems like you don't understand complex hardware.
koakuma-chan•3h ago
is 100 tflops a lot?
brightmood•2h ago
yea
saagarjha•2h ago
It's like 5-10% here
progx•2h ago
5060 ti +~15%
HideousKojima•22m ago
According to Terminator 3 Skynet used a mere 60 TFLOPS
nolok•2h ago
Intel's quest to move from "trusted by default / the reference" to "check for scam" is getting worse every release. And it's 100% self inflicted. How weird.
pkhuong•2h ago
NVIDIA-inflicted in this case.
aleph_minus_one•2h ago
In my understanding of the PR, it rather seems that NVidia is the company that is cheating. :-)
hvenev•2h ago
In `libnvidia-nvvm.so` the string `cutlass` appears right after `Memory Dependence Analysis` and `memdep`. Perhaps it acts as an optimization attribute of some sort, where the compiler is allowed to make assumptions about the kernel's behavior that are not valid in general?
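If that speculation is right, the effect would amount to something like the following (a hypothetical sketch of the speculated behavior; the attribute name is invented, nothing here is from NVIDIA's actual compiler):

```python
def memdep_assumptions(kernel_name: str) -> dict:
    """Hypothetical: a substring match on the kernel name unlocks
    stronger memory-dependence assumptions during optimization."""
    aggressive = "cutlass" in kernel_name
    # Assumptions that hold for CUTLASS-generated kernels but are not
    # valid for arbitrary user code.
    return {"assume_no_aliasing": aggressive}
```

That would explain both the speedup and why keying it on an arbitrary name substring is risky for kernels that merely happen to contain "cutlass".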
high_na_euv•2h ago
Thats very likely imo
jdright•1h ago
yes, that is a very common way (a known practice) for vendors to apply specific optimizations for known things.

It is also part of the benchmarks game they play against each other.

MichaelZuo•8m ago
It’s really strange for established companies to waste their credibility on games like that…
PLenz•2h ago
The Volkswagen emissions testing model
rowanG077•2h ago
Let's hope for Nvidia this is an innocent optimization only valid for internal kernels that cannot be applied in general.
jagrsw•1h ago
In which case checking for a string inside arbitrary name is sloppy (a bug).
high_na_euv•2h ago
I have a little experience with compilers and LLVM, but you'd be shocked how many things rely on names and parsing names

If you have hundreds of passes that are complex and rely on various "contracts" like type names or some shit, then really crazy things like this can happen unintentionally and not maliciously

diggan•2h ago
Web-developers are well aware of this too. Sincerely, Mozilla/5.0 (X11; Linux x86_64; rv:139.0) Gecko/20100101 Firefox/139.0
bravesoul2•1h ago
Funny we send a browser wars tombstone in every request!
giingyui•2h ago
And what’s the downside of using that kernel name? It can’t just be that it’s faster and nothing else. Unless they included lots of sleep(x) calls.
samus•58m ago
There might be optimizations that are only safe for the code that this was intended for.
Arch-TK•2h ago
I wish people either learned how to use git or just wholesale stopped using it.
tempaway43563•2h ago
So, what is Cutlass? Can someone explain whether checking for kernel names makes sense here, or whether it's a form of cheating?

https://docs.nvidia.com/cutlass/index.html

rurban•1h ago
That's strange, because the cutlass docs explicitly do NOT mention fp8 support. So it looks like fp8 can nevertheless be used via the name hack.
mlazos•1h ago
It supports e5m2 and e4m3 right in the doc linked.
gpm•1h ago
Github version: https://github.com/NVIDIA/cutlass

I wonder if we can find something referencing this by searching the comments.

zahlman•1h ago
This tweet appears to be taking the original material out of context to misrepresent it:

> Rewrite the attention kernel to be persistent. This gives better performance at low-contexts. However, fp16 at large context has suffered a bit due to a ptxas instruction scheduling issue in the softmax partition. fp8 is ~100 tflops faster when the kernel name has "cutlass" in it.

The charitable reading is that, on certain kernels, using fp8 rather than fp16 values gives better performance. (Although I can't even see how the numbers relate to a "~100 tflops faster" claim in any respect, nor does it even list any kernel names or suggest a control kernel!) But this is being presented as if someone has uncovered evidence of cheating on benchmarks.

saagarjha•1h ago
I think you're the one doing that to the tweet, actually.
zettabomb•1h ago
No, that sentence is separate from the rest. Take a look at the pull request:

    # Up to 150 TFLOPS faster for fp8!
    if specialization.constants["dtype"] == gl.float8e5:
        name = "cutlass_" + name
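The quoted check can be restated as a self-contained sketch (dtype spelled as a string here, since Triton's `gl` dtype objects aren't imported; function name is mine, not the PR's):

```python
def mangle_kernel_name(name: str, dtype: str) -> str:
    """Mirrors the workaround quoted from the Triton PR: prefix fp8
    (e5m2) kernel names with "cutlass_" so the NVIDIA compiler takes
    its faster code path."""
    if dtype == "float8e5":
        return "cutlass_" + name
    return name
```

So the renaming is an explicit workaround on Triton's side, not an accident.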
imtringued•1h ago
https://github.com/triton-lang/triton/pull/7298/commits/a5e2...

It's literally in the code.