frontpage.

Flirt: The Native Backend

https://blog.buenzli.dev/flirt-native-backend/
1•senekor•27s ago•0 comments

OpenAI's Latest Platform Targets Enterprise Customers

https://aibusiness.com/agentic-ai/openai-s-latest-platform-targets-enterprise-customers
1•myk-e•3m ago•0 comments

Goldman Sachs taps Anthropic's Claude to automate accounting, compliance roles

https://www.cnbc.com/2026/02/06/anthropic-goldman-sachs-ai-model-accounting.html
2•myk-e•5m ago•2 comments

Ai.com bought by Crypto.com founder for $70M in biggest-ever website name deal

https://www.ft.com/content/83488628-8dfd-4060-a7b0-71b1bb012785
1•1vuio0pswjnm7•6m ago•1 comments

Big Tech's AI Push Is Costing More Than the Moon Landing

https://www.wsj.com/tech/ai/ai-spending-tech-companies-compared-02b90046
1•1vuio0pswjnm7•8m ago•0 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
1•1vuio0pswjnm7•10m ago•0 comments

Suno, AI Music, and the Bad Future [video]

https://www.youtube.com/watch?v=U8dcFhF0Dlk
1•askl•12m ago•1 comments

Ask HN: How are researchers using AlphaFold in 2026?

1•jocho12•14m ago•0 comments

Running the "Reflections on Trusting Trust" Compiler

https://spawn-queue.acm.org/doi/10.1145/3786614
1•devooops•19m ago•0 comments

Watermark API – $0.01/image, 10x cheaper than Cloudinary

https://api-production-caa8.up.railway.app/docs
1•lembergs•21m ago•1 comments

Now send your marketing campaigns directly from ChatGPT

https://www.mail-o-mail.com/
1•avallark•24m ago•1 comments

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

https://github.com/joelparkerhenderson/queueing-theory
1•jph•36m ago•0 comments

Show HN: Hibana – choreography-first protocol safety for Rust

https://hibanaworks.dev/
5•o8vm•38m ago•0 comments

Haniri: A live autonomous world where AI agents survive or collapse

https://www.haniri.com
1•donangrey•39m ago•1 comments

GPT-5.3-Codex System Card [pdf]

https://cdn.openai.com/pdf/23eca107-a9b1-4d2c-b156-7deb4fbc697c/GPT-5-3-Codex-System-Card-02.pdf
1•tosh•52m ago•0 comments

Atlas: Manage your database schema as code

https://github.com/ariga/atlas
1•quectophoton•55m ago•0 comments

Geist Pixel

https://vercel.com/blog/introducing-geist-pixel
2•helloplanets•57m ago•0 comments

Show HN: MCP to get latest dependency package and tool versions

https://github.com/MShekow/package-version-check-mcp
1•mshekow•1h ago•0 comments

The better you get at something, the harder it becomes to do

https://seekingtrust.substack.com/p/improving-at-writing-made-me-almost
2•FinnLobsien•1h ago•0 comments

Show HN: WP Float – Archive WordPress blogs to free static hosting

https://wpfloat.netlify.app/
1•zizoulegrande•1h ago•0 comments

Show HN: I Hacked My Family's Meal Planning with an App

https://mealjar.app
1•melvinzammit•1h ago•0 comments

Sony BMG copy protection rootkit scandal

https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootkit_scandal
2•basilikum•1h ago•0 comments

The Future of Systems

https://novlabs.ai/mission/
2•tekbog•1h ago•1 comments

NASA now allowing astronauts to bring their smartphones on space missions

https://twitter.com/NASAAdmin/status/2019259382962307393
2•gbugniot•1h ago•0 comments

Claude Code Is the Inflection Point

https://newsletter.semianalysis.com/p/claude-code-is-the-inflection-point
4•throwaw12•1h ago•2 comments

Show HN: MicroClaw – Agentic AI Assistant for Telegram, Built in Rust

https://github.com/microclaw/microclaw
1•everettjf•1h ago•2 comments

Show HN: Omni-BLAS – 4x faster matrix multiplication via Monte Carlo sampling

https://github.com/AleatorAI/OMNI-BLAS
1•LowSpecEng•1h ago•1 comments

The AI-Ready Software Developer: Conclusion – Same Game, Different Dice

https://codemanship.wordpress.com/2026/01/05/the-ai-ready-software-developer-conclusion-same-game...
1•lifeisstillgood•1h ago•0 comments

AI Agent Automates Google Stock Analysis from Financial Reports

https://pardusai.org/view/54c6646b9e273bbe103b76256a91a7f30da624062a8a6eeb16febfe403efd078
1•JasonHEIN•1h ago•0 comments

Voxtral Realtime 4B Pure C Implementation

https://github.com/antirez/voxtral.c
2•andreabat•1h ago•1 comments

I have reimplemented Stable Diffusion 3.5 from scratch in pure PyTorch

https://github.com/yousef-rafat/miniDiffusion
481•yousef_g•7mo ago

Comments

squircle•7mo ago
Although I'm leaning heavily away from being passionate about software development, this is a cool project, and it's freaking awesome how anyone can now reinvent the wheel from first principles.
albert_e•7mo ago
Sounds like a great resource for learners.

Just wondering aloud --

Is there a tutorial/explainer by any chance that a beginner could use to follow along and learn how this is done?

an0malous•7mo ago
fast.ai has a course on building Stable Diffusion: https://course.fast.ai/Lessons/part2.html
BinaryMachine•7mo ago
Great resource. Jeremy Howard is awesome. I have been waiting to take this course and follow along, because anything older than a year in deep learning is already outdated. I hope they release a new version.
whiplash451•7mo ago
I don’t think this is true. The fast.ai class covers a lot of fundamentals that are still valid and useful today.
socalgal2•7mo ago
> If you ... are comfortable with building an SGD training loop from scratch in Python, being competitive in Kaggle competitions, using modern NLP and computer vision algorithms for practical problems, and working with PyTorch and fastai, then you will be ready to start the course.

...sigh...

Yes, they tell you to do their first course, but I have no confidence that one course in this stuff will make me "comfortable with building an SGD training loop from scratch in Python, being competitive in Kaggle competitions, using modern NLP and computer vision algorithms for practical problems, and working with PyTorch and fastai".
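
(For reference, the "SGD training loop from scratch" prerequisite amounts to something like this minimal sketch, fitting y = 3x + 1 by hand; a generic illustration, not course material:)

    import torch

    # Toy data: learn w and b in y = w*x + b.
    x = torch.randn(100)
    y = 3 * x + 1

    w = torch.zeros(1, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    lr = 0.1

    for _ in range(200):
        loss = ((w * x + b - y) ** 2).mean()  # mean squared error
        loss.backward()                       # populate w.grad and b.grad
        with torch.no_grad():                 # raw SGD update, no optimizer class
            w -= lr * w.grad
            b -= lr * b.grad
            w.grad.zero_()
            b.grad.zero_()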

reedlaw•7mo ago
I'm not sure what this means. If it means the Stable Diffusion 3.5 model, why is it fetching that here: https://github.com/yousef-rafat/miniDiffusion/blob/main/enco...

The training dataset is very small, only including fashion-related pictures: https://github.com/yousef-rafat/miniDiffusion/tree/main/data...

yousef_g•7mo ago
The dataset is for trying out fine-tuning of the diffusion model. It's a reimplementation of SD3 by writing the code from scratch again, but the weights are taken from HuggingFace due to hardware constraints on my part.
reedlaw•7mo ago
So this implements SD3 inference and fine-tuning?
jatins•7mo ago
> It's a reimplementation of SD3 by writing the code from scratch again, but the weights are taken from HuggingFace due to hardware constraints on my part.

Could you clarify what you mean by this part -- if the weights are taken from HF then what's the implementation for?

MoonGhost•7mo ago
My guess is that the weights from HF are used as the initial state for the model, because full training is too expensive. Then a small dataset is used to train it further for a short time, which is fine-tuning. Together this shows that the model is 1) compatible and 2) trainable. In theory it could be trained from scratch on a big dataset. I haven't looked at the code yet, so my questions are: 1) can it be trained in parallel? 2) what resources are required for training?

Anyway, I may try to train it on a limited specialized dataset...
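
(A minimal sketch of that workflow; MiniDiffusion, training_loss, small_dataloader, and the weights path are hypothetical names for illustration, not the repo's actual API:)

    import torch

    # Hypothetical: from-scratch architecture, initialized from pretrained weights.
    model = MiniDiffusion()
    model.load_state_dict(torch.load("sd35_weights.pt"))  # HF-derived initial state

    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

    # Fine-tuning: brief training on a small, specialized dataset.
    for images, captions in small_dataloader:
        loss = model.training_loss(images, captions)  # hypothetical loss helper
        opt.zero_grad()
        loss.backward()
        opt.step()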

montebicyclelo•7mo ago
> if the weights are taken from HF then what's the implementation for

The weights are essentially a bunch of floating point numbers, grouped into tensors. The code says what operations to do with the weights. E.g. say you load a matrix W from the weights; you could do `y = W @ x`, or `y = W.T @ x`, or `y = W @ W @ x`, etc.

elbear•7mo ago
The model consists of its architecture which is expressed as code, and its knowledge, which is gained through training.
CamperBob2•7mo ago
Add a Hugging Face Token in get_checkpoints.py before running the script.

Can you be a bit more specific here? It's not clear what such a token is, what it takes to get one, or where it would be placed in get_checkpoints.py.

einsteinx2•7mo ago
> what such a token is

An API token from Hugging Face

> what it takes to get one

You generate them in your Hugging Face account

> where it would be placed in get_checkpoints.py.

Line 59 in the empty quotes where it says token = “”

CamperBob2•7mo ago
Ah, I see it now, thanks.

That's the kind of thing that, stylistically speaking, it's good to define at the very top of the module.
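
(A hypothetical sketch of that style; the constant, helper, and arguments are illustrative, not the actual contents of get_checkpoints.py:)

    from huggingface_hub import hf_hub_download

    # Defined once at the top of the module, where it's immediately visible.
    HF_TOKEN = "hf_..."  # paste your Hugging Face access token here

    def fetch(repo_id: str, filename: str) -> str:
        # Download one checkpoint file, authenticating with the token above.
        return hf_hub_download(repo_id, filename, token=HF_TOKEN)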

einsteinx2•7mo ago
Agreed. I'm not part of the project; I just saw your comment and figured I'd try and help.
Dwedit•7mo ago
Leaving off the "API" part from "API Token" causes confusion, since AI models tokenize all text into "tokens" before running the model. It's using the same word to describe two very different things.
einsteinx2•7mo ago
Yep, totally. FWIW, I'm not part of the project; I just saw the comment and figured I'd try and help.
theturtle•7mo ago
Cool. Can it still make images of Anne Hathaway leading a herd of blue giraffes on the Moon?
IncreasePosts•7mo ago
Seems difficult, as there are no known portraits of Anne Hathaway
liuliu•7mo ago
If you are interested in this, the Flux reference implementation is very minimalistic: https://github.com/black-forest-labs/flux/tree/main/src/flux

The minRF project is a very easy way to start training small diffusion models with rectified flow: https://github.com/cloneofsimo/minRF

Also, the reference implementation of SD 3.5 is actually minimalistic too: https://github.com/Stability-AI/sd3-ref

doctorpangloss•7mo ago
Reference implementations are unmaintained and buggy.

For example, OpenAI's tokenizer for CLIP is buggy: https://github.com/huggingface/transformers/issues/27961. It's a reference implementation, it isn't the one they used for training, and the problems with it go unsolved and get copied endlessly by other projects.

What about Flux? They don't say it was used for training (it wasn't), and there are bugs with it that break cudagraphs or similar, which aren't that impactful. On the other hand, it uses the CLIP reference, and the CLIP reference is buggy, so this is buggy...

BoredPositron•7mo ago
You can disable CLIP L on Flux without a loss in quality. You are also making a mountain out of a molehill. CLIP is used everywhere.
doctorpangloss•7mo ago
Consider another interpretation: CLIP L in Flux can be disabled without a loss in quality because the way it is used is buggy!
BoredPositron•7mo ago
oh lord.
doctorpangloss•7mo ago
The truth is that the CLIP conditioning in Flux works well for Dreambooth-style fine-tuning, where tokenization bugs can be acute, but they are not so severe as to explain the low impact of CLIP on their dev model. It is likely more impactful on their pro/max models, but only BFL could say so.
BoredPositron•7mo ago
That's absolute nonsense.
doctorpangloss•7mo ago
Okay, well, there are a few things that are known to be true: (1) CLIP's tokenizer in diffusers, in the reference source in BFL's repo, and in OpenAI's repo is buggy; (2) many CLIP prompts are observed to have a low impact in the Flux dev and schnell models. It is very likely to be true that (1) the tokenizer in the BFL reference source and in OpenAI's repo does not match the tokenizer used in training OpenAI's CLIP or the text conditioning for any of the Flux checkpoints; (2) the guidance and timestep distillation play a role in weakening the role of CLIP; (3) it is practical to fine-tune CLIP on more image-caption pairs. If you care about fine-tuning, the tokenization bugs matter. Everything else is hard to prove.
electroglyph•7mo ago
It shouldn't take a lot of effort to fix a tokenizer...
doctorpangloss•7mo ago
People are a little too blinded by the insight porn of matching buggy behavior to just read and comprehend the issue. They can’t engage with the simpler and more pornographic insight porn that the reference implementations are buggy and do not match the trained artifacts.
liuliu•7mo ago
Congrats on finding a bug!

However, the keyword here is training/inference divergence. Unfortunately, nobody is going to spend multiple millions to retrain a model, so our reimplementation needs to be bug-for-bug correct to use the trained weights properly. That's why the reference implementations are essential: they come from the original model trainers, so you have the best "bet" at matching the training code properly.

To give you some concrete examples of bugs we need to maintain:

1. In SDXL, they use OpenClipG for text encoding, but wrongly use 0 as the padding token (corresponding to the symbol "!"), whereas even in OpenClipG's own training the endoftext token was used as the padding token. However, if you switch SDXL to use the endoftext token as the padding token, you get subpar generated images, due to training/inference divergence.

2. In FLUX, we mainly use T5 as the text encoder. However, T5 is usually used as an encoder with a mask matching the exact input length, to avoid padding tokens having an extended impact. In FLUX, we don't apply a mask for T5 text encoding, so intuitively the padding tokens take more effect than they should. Again, "fixing" this bug without retraining gives you subpar generated images. (The contrast is sketched below.)
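
(A sketch of the two behaviors using the transformers API; the model name is illustrative, FLUX actually pairs a much larger T5:)

    import torch
    from transformers import AutoTokenizer, T5EncoderModel

    tok = AutoTokenizer.from_pretrained("t5-small")
    enc = T5EncoderModel.from_pretrained("t5-small")

    batch = tok(["a photo of a cat"], padding="max_length", max_length=77,
                return_tensors="pt")

    with torch.no_grad():
        # Usual T5 usage: padding positions are masked out of attention.
        masked = enc(input_ids=batch.input_ids,
                     attention_mask=batch.attention_mask).last_hidden_state
        # FLUX-style usage: no mask, so padding tokens influence the encoding.
        unmasked = enc(input_ids=batch.input_ids).last_hidden_state

    print(torch.allclose(masked, unmasked))  # False: the padding tokens matter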

There are many examples like this; some are easier to fix than others (HiDream uses an ODE solver that is different from what we usually do for rectified flow, so you need to negate its prediction to be compatible with existing samplers, but this one is "easier to fix").

TL;DR: Yes, there are bugs in software, but we are better off maintaining bug-for-bug compatibility than trying to "fix" them, which highlights the importance of a "done" reference implementation, rather than the usual "active" implementation of the software industry.

(I maintain the most complete reimplementation of SoTA media generation models in Swift: https://github.com/drawthingsai/draw-things-community/tree/m.... So I tend to think that I know a thing or two about "reimplementation from scratch".)

doctorpangloss•7mo ago
I think if you read the issue carefully, you would understand that the CLIP implementation in transformers, and as published by OpenAI, is wrong and does not match their trained model code; and that the fix I suggest improves results, both empirically for me and in theory.
vergessenmir•7mo ago
Are there any notable properties of this implementation? Are some parts slower or faster, etc.?
NoelJacob•7mo ago
So, that's Stable Diffusion without license constraints, is it?
Sharlin•7mo ago
No, the inference/training algorithms, being math, are not copyrightable. OP just wrote another implementation. What's copyrighted are the models, which OP did not train from scratch (having neither the training material nor the compute to do that).
echelon•7mo ago
We should be specific when we say "models".

There's the code outlining the network vs. the resultant weights (and also vs. any training, inference, fine-tuning, and misc support code, etc.).

The theoretical diagram of how the networks and modules are connected is math. But an implementation of that in code is copyrightable.

Afaik, the weights are still a grey area, whereas code is code and is copyrightable.

Weights are not produced by humans; they are the result of an automated process, so arguably they are not afforded copyright protection. But this hasn't been tested in court.

If OpenAI's GPT-4o weights leak, I think the whole world could use them for free. You'd just have to write the code to run them yourself.

bravesoul2•7mo ago
I use model architecture for the code/math and weights for the weights, to avoid confusion!

Then there are the hyperparameters, which also need to be known to use the weights with the model architecture.

MoonGhost•7mo ago
> I use model architecture for the code/math

Code is copyrightable and math is not. What about 'architecture'?

vrighter•7mo ago
which means he is still in full violation of their license
Zambyte•7mo ago
> What's copyrighted are the models

Has this actually been tested yet? Or are we still at the stage of AI companies trying to pretend this into reality?

dheera•7mo ago
I mean, if you take a match to a blank CD-ROM, or shoot neutrinos at a USB drive, there is a very small chance that you get the SD weights stored on them
Zambyte•7mo ago
You can say that about literally any digital information. This isn't really interesting in the context of the copyright status of AI models.
bravesoul2•7mo ago
If the models are copyright protected, then presumably they obeyed the licenses on the upstream dependencies they included (i.e. the training data).
bredren•7mo ago
Is upstream dependency licensure necessary to establish copyright? For example, I Need a Haircut was still a unique work regardless of the rights to sample Alone Again.
bravesoul2•7mo ago
Oh :( wasn't what I thought it would be. Wondered why it wasn't more blown up on HN!
caycep•7mo ago
How usable is the original academic source available from the Ludwig Maximilian University CompVis group?
eapriv•7mo ago
I find it hilarious that “from scratch” now somehow means “in PyTorch”.
monsieurbanana•7mo ago
If any "from scratch" post doesn't start with linking to a Primitive Technology video, I'm closing the tab
mkoubaa•7mo ago
Unless the author was raised by chimps I'm out
0cf8612b2e1e•7mo ago
Not fusing heavier elements from hydrogen? I’m out.
chairmansteve•7mo ago
Yeah. Should have done it in assembly.
mardifoufs•7mo ago
PyTorch is a pretty basic building block once you get to some degree of model complexity. It wouldn't really be interesting, IMO, to reimplement autograd or the other things PyTorch provides when the goal is to show a reimplementation of something as "higher" level as SD. It's similar to how I don't mind it when someone doesn't reimplement an OS or a JavaScript engine when writing a web app from scratch.

And there's been a recent surge in abstractions over PyTorch, and even standalone packages for models that you are just expected to import and use as an API (which are very useful, don't get me wrong!). So it's nice to see an implementation that doesn't have 10 different dependencies that each abstract over something PyTorch does.

eapriv•7mo ago
> It wouldn't really be interesting

Andrej Karpathy did exactly that, and I think it’s quite interesting.
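
(The core of a micrograd-style scalar autograd, in the spirit of what Karpathy builds on video; a generic sketch, not his exact code:)

    class Value:
        # A scalar that remembers how it was computed, so grads can flow back.
        def __init__(self, data, children=()):
            self.data = data
            self.grad = 0.0
            self._children = children
            self._backward = lambda: None

        def __add__(self, other):
            out = Value(self.data + other.data, (self, other))
            def _backward():
                self.grad += out.grad
                other.grad += out.grad
            out._backward = _backward
            return out

        def __mul__(self, other):
            out = Value(self.data * other.data, (self, other))
            def _backward():
                self.grad += other.data * out.grad
                other.grad += self.data * out.grad
            out._backward = _backward
            return out

        def backward(self):
            # Topologically sort the graph, then apply the chain rule in reverse.
            order, seen = [], set()
            def visit(v):
                if v not in seen:
                    seen.add(v)
                    for c in v._children:
                        visit(c)
                    order.append(v)
            visit(self)
            self.grad = 1.0
            for v in reversed(order):
                v._backward()

    x = Value(2.0)
    y = x * x + x
    y.backward()
    print(x.grad)  # 5.0, i.e. d(x^2 + x)/dx at x = 2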

yorpinn•7mo ago
I agree, great series of videos, but there's a dependent clause:

> ...when the goal is to show a reimplementation of something as "higher" level as SD.

Implementing autograd is interesting, but it's not directly in service of our main subject (Stable Diffusion) and would be a major yak shave, comparable in complexity to the original project.

refulgentis•7mo ago
I'm embarrassed to ask: can someone elaborate on, say, what we have now that we didn't have before the repo existed?

I have studiously avoided making models, though I've been adjacent to their output for years now... I think the root of my confusion is that I kinda assumed there were already PyTorch-based scripts for inference / training. (I assumed _at least_ inference scripts were released with models, and kinda figured fine-tuning / training ones were too.)

So I'm not sure if I'm just looking at a clean room / dirty room rewrite of those. Or maybe everyone is using "PyTorch" but it's usually calling into CUDA/C/some proprietary thing that is much harder to grok than a pure PyTorch implementation?

Anyway, these aren't great guesses, so I'll stop myself here. :)

_tqr3•7mo ago
Stability AI, the creators of the Stable Diffusion models, release their products under their own Stability AI Community License, which is not "free" like the MIT license. You are not allowed to modify the weights in certain ways.

This package basically runs the model (inference) and maybe fine-tunes it using the existing weights. It's a great way to learn, but it could still run into the same licensing issues.

refulgentis•7mo ago
You can't finetune SD 3.5!?

I thought the community license stuff was about keeping people from using it in prod and charging for it without Stability getting at least a small taste.

This sucks.

I haven't been keeping up with the gooner squad on Civit, but I did have some understanding that SD was less popular; I thought it was just because 3.5 came far too long after Flux with too little, if any, quality increase to be worth building new scaffolding for.

fc417fc802•7mo ago
> You can't finetune SD 3.5!?

They don't want you finetuning it in specific ways that might make them look bad by association.

djhn•7mo ago
So, out of interest, what are good TLDR sources for following the gooner scene? Like some highlights newsletter, subreddit, podcast, youtube channel or something? I’m interested in keeping up with their methods, not their results and output.
refulgentis•7mo ago
Apologies for the late reply: sdnsfw and civit.ai. The thing with it is, it takes a lot of, uh, effort, because you need to find the 1 in 100 that did something new and shared it. And sadly, while it did confer a leading edge in applied research, maybe as much as 6 months (e.g. LoRAs, model merging), it's just not as much as it used to be, and the quality difference is also less than it used to be.
rockemsockem•7mo ago
I believe this is the main piece:

> with minimal dependencies

I haven't tried running SD 3.5 specifically, but it's built on Hugging Face libraries, which I personally always find to be a mess of dependencies that makes them really hard to set up without the exact configuration the original developers used (which is often not provided in enough detail to actually work). This makes it pretty hard to run certain models, especially a few months or years after the original release.

For example, this appears to be the requirements file for the Stability AI reference implementation of SD3.5: there are no versions specified, and it includes "transformers", which is an enormous library.

https://github.com/Stability-AI/sd3.5/blob/main/requirements...
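
(A hypothetical pinned alternative for contrast; these versions are made up for the example, the kind of thing `pip freeze` captures from a known-good environment:)

    # requirements.txt, pinned: exact versions make the setup reproducible
    torch==2.1.0
    transformers==4.35.0
    safetensors==0.4.0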

refulgentis•7mo ago
Ah, tyvm, that maps well onto my knowledge set: I have an ONNX inference wrapper written in Dart. However, I have never been able to leverage the transformers.js ONNX demo code, i.e. have a reference to port to Dart.

IIRC it is written in an abstraction layer that supports a transformers-like API surface. This also makes it opaque to figure out what you're actually passing to the model, adding a Python dep mess on top of that... woo boy.

hkon•7mo ago
now do it in minecraft
ineedasername•7mo ago
When I think of SD 3.5 (or any version), I think of the portion that results from training, i.e., the weights. The code seems less important? I mean as far as output quality or performance is concerned. But I'm honestly not sure, and I'm not trying to judge these efforts on that basis.
Dwedit•7mo ago
Does using pure PyTorch improve performance on non-NVIDIA cards in any way? Or is PyTorch so highly optimized for CUDA that no other GPU vendors have a chance?
VeejayRampay•7mo ago
I believe PyTorch works nicely with ROCm, but I don't know if it's nice to the point of being "on par".
3abiton•7mo ago
It seems to be the case, although PyTorch on ROCm is coming around slowly. Very slowly, if you get it working, that is.
chickenzzzzu•7mo ago
It is possible to run ML workloads on, for example, AMD devices via Vulkan. With newer extensions like cooperative matrix, and maybe in the future some scheduling magic exposed by the driver through a new extension, the remaining single-digit percent gap CUDA has will evaporate.
jwitthuhn•7mo ago
PyTorch also runs great on Apple silicon, though it is hard to compare directly because Apple's high-end GPUs can't compute anywhere near as much as Nvidia's high-end stuff.

e: I'll also add that PyTorch still has one oddity on Apple silicon, which is that it considers each tensor to be 'owned' by a particular device, either a CPU or a GPU. Macs have unified memory, but PyTorch will still do a full copy when you 'move' data between the CPU and GPU, because it just wasn't built for unified memory.
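
(A sketch of that behavior, with the device names stock PyTorch uses on a Mac:)

    import torch

    # Memory is physically unified on Apple silicon, but PyTorch still models
    # "cpu" and "mps" as separate devices, so .to() performs a full copy.
    t_cpu = torch.ones(1024, 1024)     # allocated as a CPU tensor
    t_gpu = t_cpu.to("mps")            # copied into an MPS-owned buffer
    print(t_cpu.device, t_gpu.device)  # cpu mps:0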

brcmthrowaway•7mo ago
Does PyTorch work on Apple silicon out of the box? Or do you need some Apple-specific package?
thom•7mo ago
`uv pip install torch` just works; set your default device to `mps:0` and enjoy the RAM. It depends what you're doing, though: some stuff isn't implemented, so if you're trying to fit a Gamma/Beta/Student-T distribution you're out of luck.
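
(Concretely, assuming PyTorch 2.x:)

    import torch

    torch.set_default_device("mps:0")  # route new tensors to the Apple GPU
    x = torch.randn(4, 4)              # lands on mps:0, no explicit .to() needed
    print(x.device)                    # mps:0
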
godelski•7mo ago

    self.q = nn.Linear(embed_size, embed_size, bias=False)
    self.k = nn.Linear(embed_size, embed_size, bias=False)
    self.v = nn.Linear(embed_size, embed_size, bias=False)

Try

    # one fused projection instead of three separate ones
    self.qkv = nn.Linear(embed_size, 3 * embed_size, bias=False)

    def forward(...):
        ...
        qkv = self.qkv(x)
        # split the fused output back into q, k, v
        q, k, v = qkv.chunk(3, dim=-1)
jszymborski•7mo ago
This adds connections between the parameters of q, k, and v whereas the original doesn't, unless my very tired brain is missing something.
smus•7mo ago
Nope, they all depend on x and the same is true in this scenario
godelski•7mo ago
It is actually really common practice. It is a single linear layer, so there are no connections across the q/k/v blocks. The reason to do this is that it is a bit less computationally intensive: one matmul instead of three.

tldr: three linear layers applied to the same input are equivalent to one wider linear layer
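
(A quick check of that equivalence; a generic sketch, not the repo's code:)

    import torch
    import torch.nn as nn

    embed_size = 64
    q = nn.Linear(embed_size, embed_size, bias=False)
    k = nn.Linear(embed_size, embed_size, bias=False)
    v = nn.Linear(embed_size, embed_size, bias=False)

    # Stack the three weight matrices row-wise into one fused projection.
    qkv = nn.Linear(embed_size, 3 * embed_size, bias=False)
    with torch.no_grad():
        qkv.weight.copy_(torch.cat([q.weight, k.weight, v.weight], dim=0))

    x = torch.randn(2, 10, embed_size)
    q2, k2, v2 = qkv(x).chunk(3, dim=-1)
    print(torch.allclose(q2, q(x), atol=1e-6))  # True: same math, one matmul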

nothrowaways•7mo ago
Pure pytorch?