frontpage.

Decorative Cryptography

https://www.dlp.rip/decorative-cryptography
58•todsacerdoti•1h ago•13 comments

Databases in 2025: A Year in Review

https://www.cs.cmu.edu/~pavlo/blog/2026/01/2025-databases-retrospective.html
85•viveknathani_•3h ago•11 comments

A spider web unlike any seen before

https://www.nytimes.com/2025/11/08/science/biggest-spiderweb-sulfur-cave.html
72•juanplusjuan•3h ago•22 comments

Revisiting the original Roomba and its simple architecture

https://robotsinplainenglish.com/e/2025-12-27-roomba.html
21•ripe•2d ago•5 comments

Lessons from 14 years at Google

https://addyosmani.com/blog/21-lessons/
1254•cdrnsf•19h ago•537 comments

During Helene, I just wanted a plain text website

https://sparkbox.com/foundry/helene_and_mobile_web_performance
208•CqtGLRGcukpy•7h ago•113 comments

The unbearable joy of sitting alone in a café

https://candost.blog/the-unbearable-joy-of-sitting-alone-in-a-cafe/
625•mooreds•19h ago•371 comments

Show HN: Terminal UI for AWS

https://github.com/huseyinbabal/taws
317•huseyinbabal•14h ago•156 comments

Logos Language Guide: Compile English to Rust

https://logicaffeine.com/guide
39•tristenharr•3d ago•21 comments

Why does a least squares fit appear to have a bias when applied to simple data?

https://stats.stackexchange.com/questions/674129/why-does-a-linear-least-squares-fit-appear-to-ha...
245•azeemba•14h ago•66 comments

Why Microsoft Store Discontinued Support for Office Apps

https://www.bgr.com/2027774/why-microsoft-store-discontinued-office-support/
28•itronitron•3d ago•26 comments

Street Fighter II, the World Warrier (2021)

https://fabiensanglard.net/sf2_warrier/
381•birdculture•19h ago•68 comments

I charged $18k for a Static HTML Page (2019)

https://idiallo.com/blog/18000-dollars-static-web-page
293•caminanteblanco•2d ago•72 comments

Baffling purple honey found only in North Carolina

https://www.bbc.com/travel/article/20250417-the-baffling-purple-honey-found-only-in-north-carolina
80•rmason•4d ago•20 comments

Building a Rust-style static analyzer for C++ with AI

http://mpaxos.com/blog/rusty-cpp.html
57•shuaimu•5h ago•25 comments

Monads in C# (Part 2): Result

https://alexyorke.github.io/2025/09/13/monads-in-c-sharp-part-2-result/
24•polygot•3d ago•19 comments

Web development is fun again

https://ma.ttias.be/web-development-is-fun-again/
394•Mojah•19h ago•486 comments

Linear Address Spaces: Unsafe at any speed (2022)

https://queue.acm.org/detail.cfm?id=3534854
158•nithssh•5d ago•115 comments

Eurostar AI vulnerability: When a chatbot goes off the rails

https://www.pentestpartners.com/security-blog/eurostar-ai-vulnerability-when-a-chatbot-goes-off-t...
150•speckx•13h ago•37 comments

Show HN: An interactive guide to how browsers work

https://howbrowserswork.com/
231•krasun•19h ago•33 comments

How to translate a ROM: The mysteries of the game cartridge [video]

https://www.youtube.com/watch?v=XDg73E1n5-g
18•zdw•5d ago•0 comments

Claude Code On-the-Go

https://granda.org/en/2026/01/02/claude-code-on-the-go/
323•todsacerdoti•14h ago•208 comments

Six Harmless Bugs Lead to Remote Code Execution

https://mehmetince.net/the-story-of-a-perfect-exploit-chain-six-bugs-that-looked-harmless-until-t...
65•ozirus•3d ago•16 comments

NeXTSTEP on PA-RISC

https://www.openpa.net/nextstep_pa-risc.html
34•andsoitis•9h ago•7 comments

Ripple, a puzzle game about 2nd and 3rd order effects

https://ripplegame.app/
124•mooreds•16h ago•32 comments

Moiré Explorer

https://play.ertdfgcvb.xyz/#/src/demos/moire_explorer
167•Luc•21h ago•19 comments

Agentic Patterns

https://github.com/nibzard/awesome-agentic-patterns
125•PretzelFisch•15h ago•22 comments

Anti-aging injection regrows knee cartilage and prevents arthritis

https://scitechdaily.com/anti-aging-injection-regrows-knee-cartilage-and-prevents-arthritis/
319•nis0s•19h ago•120 comments

Bison return to Illinois' Kane County after 200 years

https://phys.org/news/2025-12-bison-illinois-kane-county-years.html
152•bikenaga•5d ago•46 comments

The Showa Hundred Year Problem

https://www.dampfkraft.com/showa-100.html
45•polm23•5d ago•18 comments

Nightshade: Make images unsuitable for model training

https://nightshade.cs.uchicago.edu/whatis.html
56•homebrewer•21h ago

Comments

cadamsdotcom•20h ago
Seems the same as these submissions from 2 years ago:

- https://news.ycombinator.com/item?id=38013151

- https://news.ycombinator.com/item?id=37990750

andy99•20h ago
A similar thing, Glaze (also from UChicago), was posted a few weeks ago, and apparently two years ago as well:

https://news.ycombinator.com/item?id=46364338

https://news.ycombinator.com/item?id=35224219

We’ve seen this arms race before and know who wins. It’s all snake oil imo

vidarh•20h ago
> We’ve seen this arms race before and know who wins. It’s all snake oil imo

It's kinda funny in a way because effectively they're helping iron out ways in which these models "see" differently to humans. Every escalation will in the end just help make the models more robust...

That they are disclosing the tools rather than e.g. creating a network service makes this even easier.

jappgar•19h ago
And now you know the only reason these labs get any funding.

It's all to benefit industry, whether the academics realize it or not.

tgv•20h ago
Idk. Perhaps this technique doesn't work, but if someone comes up with a working system and LLMs start using techniques to counter it, artists might have a leg to stand on, since the use of the counter-technique makes clear that the scraper never had any intention of respecting terms of use.
vidarh•20h ago
They won't need to use counter techniques beyond fixing incorrect output from their models by making the general training methods more robust to features not seen by humans.
pixl97•17h ago
No, not really.

In fact I would say the opposite is true. LLMs must protect against this as a security measure in unified models or things the LLM 'sees' may be faked.

If, for example, someone could trick you into seeing a $1 bill as a $10 bill, it would be considered a huge failure on your part, and it would be trained out of you if you wanted to remain employed.

YeGoblynQueenne•20h ago
>> We’ve seen this arms race before and know who wins. It’s all snake oil imo

I haven't and I don't know who wins. Who wins?

Adversarial examples aren't snake oil, if that's what you meant. A rich literature on both producing and bypassing them has accumulated over the years, and while I haven't kept abreast of it, my recollection is that the bottom line is like that of online security: there's never a good reason not to keep your system up to date and protected from attacks, even if there exist attacks that can bypass any defense.

Where in this case attack and defense can both describe what artists want to do with their work.

jappgar•19h ago
In an arms race, the party with the most money always wins.
gspr•19h ago
Citation needed.
torginus•19h ago
Don't adversarial examples have to be trained to be effective against a specific recognizer?

I could imagine you could make one that was effective against multiple recognizers, but not in general.

I'd also guess it'd be easy to get rid of this vulnerability on the model side.

pixl97•18h ago
This isn’t security...

Don't confuse attempting to make AI misclassify an image with a security measure.

And yes, this is snake oil and the AI wins every time.

At the end of the day a human has to be able to interpret the image, and I'd add another constraint of not thinking it looks ugly. This puts a very hard floor on what a poisoner can put in an image before the human gets sick. In a rapid-turnaround GAN you hit that noise floor really quickly.

oth001•20h ago
That doesn't mean artists should make it easy for these AI companies to steal artists' IP. It doesn't take long to do and seems effective enough from what I've seen. BTW, this is how cybersecurity works (cat and mouse, etc.).
vidarh•20h ago
The problem is that it's an inherently intractable problem, with the (temporary) solution space shrinking with each mitigation, as the images still need to look good to people.
pixl97•18h ago
Exactly. This isn't like encryption where you can just keep adding more bits. Every iteration that gets closer to simulating how people see sets the floor.
jappgar•19h ago
Real security systems don't publicize how they work.

This is just grandstanding. Half the people from this lab will go on to work for AI companies.

daeken•19h ago
> Real security systems don't publicize how they work.

175 years of history would disagree with you: https://en.wikipedia.org/wiki/Security_through_obscurity

jappgar•16h ago
That old saw. Downvote all you want. Adversarial engineering does indeed rely on obscurity; they just don't tell you that.
daeken•14h ago
I've been working in security for more than 20 years and have seen the deleterious effects of security through obscurity first-hand. Why does "adversarial engineering" rely on obscurity?
danielbln•19h ago
What's with the "stealing" lingo? We were all making fun of the RIAA for conflating copyright infringement with stealing ("you wouldn't steal a car") and now we're doing the same?
ronsor•18h ago
The tides have turned; everyone here loves and respects copyright now.
zelphirkalt•18h ago
Isn't there a huge cost imbalance? As in, it's easy to add some noise but difficult to remove it reliably, so even if it does get removed, it could still be counted as a partial win in defending against unwanted AI scraping.
cmxch•13h ago
AI model makers win, luddites lose.

Never mind that the more people try to corrupt a model, the more likely it is that future models will catch these corruption attempts as security and trust/safety issues to fix and work around.

The next Nightshade will eventually be treated as malware by a model and then worked around, with the model reconstructing around the attempt to break it.

throwfaraway135•20h ago
I'm very skeptical about such systems, although they note that:

> You can crop it, resample it, compress it, smooth out pixels, or add noise, and the effects of the poison will remain. You can take screenshots, or even photos of an image displayed on a monitor, and the shade effects remain

If this becomes prevalent enough, you can create a lightweight classifier to remove "poisonous" images, then use some kind of neural network (probably an autoencoder) to "fix" them. Training such networks won't be too difficult, as you can create as many positive-negative pairs as you want by using this tool.

A4ET8a8uTh0_v2•19h ago
As with most things like this, it is a cat-and-mouse game. On the one hand, I am annoyed, because I am personally rather firmly on the side of 'why are we spending time trying to prevent people from doing this somewhat cool thing?', but at the same time, just like with DRM, copy restrictions, and all that idiocy, it gives a new generation of kids something to rebel against. So I guess it serves a purpose. On a third hand, can you imagine those minds being able to focus on something else?
torginus•19h ago
I dunno about this one, but I remember the previous versions suffered from visible artifacts, to the point that most artists elected not to use them because they made the output look bad.

It's also not obvious to me what happens with cartoon-style art. Something that looks like white noise might be acceptable on an oil painting, but not on something with flat colors and clean lines.

mensetmanusman•20h ago
It would be funny if this type of research ends up adding major insight into what it is about human vision systems and mental encodings that makes us different from pixel arrays with various transformations.
nodja•18h ago
I've run the first of the sample images through 3 captioning models: an old, old ViT-based booru-style tagger, a more recent one, and Qwen 3 Omni. All models successfully identified visual features of the image with no false positives at significant thresholds (>0.3 confidence).

I don't know what Nightshade is supposed to do, but the fact that it doesn't affect synthetic labeling of the data at all leads me to believe image-model trainers will give close to zero consideration to what it does when training new models.

torginus•18h ago
It is kind of unfortunate how people don't actually read the paper but only run with the conclusions, speculating whether this would or would not work.

Here's the paper in question:

https://arxiv.org/abs/2310.13828

My two cents is that in its current implementation the compromised images can be easily detected, and possibly even 'de-poisoned'.

The attack works by targeting an underrepresented concept (let's say 1% of images contain dogs, so 'dog' is a good concept to attack).

They poison the concept of 'dog' with the concept of 'cat' by blending (in latent space) an archetypal image of a 'cat' (always the same one) into every image containing a 'dog'.

This works during training: since every poisoned image of a dog contains the same blended-in image of a cat, the false signal eventually builds up in the model, even if the poisoned sample count is low.

But note: this exploits the lack of data in a domain - this would not prevent the model from generating anime waifus or porn, because the training set of those is huge.

But how to detect poisoned images?

1. You take a non-poisoned labeler (these exist, because clean pre-SD datasets and pre-poison diffusion models exist).

2. You ask your new model and the non-poisoned labeler to check your images. You find that the concept of 'dog' has been poisoned.

3. You convert all your 'dog' images to latent space and take the average. Most likely all the non-poison details will average out, while the poison will accumulate.

4. You now have a 'signature' of the poison. You check each of your images in latent space for correlation with that signature. If the correlation is high, the image is poisoned.

The poison is easily detectable for the same reason it works: it embeds a very strong signal that gets repeated across the training set.

pigpop•18h ago
I think we are well beyond this mattering.

To my knowledge, the era of scraping online sources for training data is over. The focus has been on reinforcement learning and acquiring access to offline data for at least a year or two. Synthetic data is generated, ranked, and curated to produce the new training sets for improving models. There isn't even really any point to collecting human-made images anymore because the rate of production of anything novel is so low. The future of data collection looks like Midjourney's platform, where they integrate tools for providing feedback on generated images as well as tools for editing and composing generated images so that they can be improved manually. This closes the loop, so the platform for generating images is now part of the model-training pipeline.

Tiberium•17h ago
I think the title should clarify the year - (2024), because those tools are not useful in the way artists want them to be.