I noticed that the URL for this submission is wrong: I tried to submit the correct URL (https://www.krea.ai/blog/flux-krea-open-source-release) but, for some reason, the submission gets flagged as a duplicate, and then I can only find this item, which points to our old blog post.
In the meantime, I'll set up a server-side redirect from the old blog post to our new one, but it would be nice to fix the link, and I don't think I can do it on my side.
<link rel="canonical" href="https://www.krea.ai/blog/new-krea">
Our software follows canonical links when it finds them. I've fixed the link above now (and rolled back the clock on the submission, to make up for lost time), but you might want to fix this for future pages.
Thank you so much! I knew that HN software was advanced, but I didn’t know you guys used Canonical URLs like Google does. Smart and thanks for helping us with this slip!!!
what does " designed to be compatible with FLUX architecture" mean and why is that important?
Regarding this part: > Since flux-dev-raw is a guidance distilled model, we devise a custom loss to finetune the model directly on a classifier-free guided distribution.
Could you go into more detail on the specific loss used for this, and any other tips for finetuning it that you might have? I remember the general open-source AI art community had a hard time finetuning the original distilled flux-dev, so I'm very curious about that.
We prepared a blog post about how we trained FLUX Krea, if you're interested in learning more: https://www.krea.ai/blog/flux-krea-open-source-release
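For readers who want a rough sense of what finetuning "directly on a classifier-free guided distribution" can look like, below is a generic guidance-distillation-style sketch in PyTorch. This illustrates the standard technique in this space, not the custom loss described in the blog post; `student`, `teacher`, and the guidance weight `w` are placeholders.

```python
import torch
import torch.nn.functional as F

def cfg_target(teacher, x_t, t, cond, uncond, w):
    # Classic classifier-free guidance combination, used here as a training target.
    eps_cond = teacher(x_t, t, cond)
    eps_uncond = teacher(x_t, t, uncond)
    return eps_uncond + w * (eps_cond - eps_uncond)

def guided_finetune_loss(student, teacher, x_t, t, cond, uncond, w):
    # A guidance-distilled student predicts the *guided* output directly,
    # so it is supervised against the CFG-combined teacher prediction.
    with torch.no_grad():
        target = cfg_target(teacher, x_t, t, cond, uncond, w)
    pred = student(x_t, t, cond)
    return F.mse_loss(pred, target)
```

The practical consequence is that naive finetuning objectives written for an undistilled model don't transfer one-to-one, which is likely part of why the community struggled with flux-dev.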
From a business point of view, there are many use-cases. Here's a list in no particular order:
- You can quickly generate assets that can be used _alongside_ more traditional tools such as Adobe Photoshop, After Effects, or Maya/Blender/3ds Max. I've seen people creating diffuse maps for 3D using a mix of diffusion models and manual tweaking with Photoshop.
- Because this model is compatible with the FLUX architecture, we've also seen people personalizing the model to keep products or characters consistent across shots. This is useful in e-commerce and the fashion industry. We allow easy training on our website — we labeled it Krea 1 — to do this, but the idea with this release is to encourage people with local rigs and more powerful GPUs to tweak it with LoRAs themselves too.
- Then I've seen fascinating use-cases such as UI/UX designers who prompt the model to create icons, illustrations, and sometimes even whole layouts that they then use as a reference (like Pinterest) to refine their designs in Figma. This reminds me of people who have a raster image and then vectorize it manually using the pen tool in Adobe Illustrator.
We've also seen big companies using it for both internal presentations and external ads, across marketing teams and big agencies like Publicis.
EDIT: Then there's a more speculative use-case that I have in mind: Generating realistic pictures of food.
While some restaurants have people who make illustrations of their menu items, and others have photographers, the long tail of restaurants doesn't have the means or expertise to do this. The idea, from the company's perspective, is to make it as easy as snapping a few pictures of your dishes and turning your whole menu into a set of professional-looking pictures that accurately represent it.
Does this have any application for generating realistic scenes for robotics training?
One interesting use case would be if you are focusing on a robotics task that would require perception of realistic scenes.
From the article it doesn’t seem as though photorealism per se was a goal in training; was that just emergent from human preferences, or did it take some specific dataset construction mojo?
- GitHub repository: https://github.com/krea-ai/flux-krea
- Model Technical Report: https://www.krea.ai/blog/flux-krea-open-source-release
- Hugging Face model card: https://huggingface.co/black-forest-labs/FLUX.1-Krea-dev
Not the real reason. The real reason is that training has moved to FP/BF16 over the years as NVIDIA made that more efficient in their hardware; it's the same reason you're starting to see some models being released in 8-bit formats (DeepSeek).
Of course people can always quantize the weights to smaller sizes, but the master versions of the weights are usually 16-bit.
Often, the training is done in FP16 then quantized down to FP8 or FP4 for distribution.
I asked ChatGPT for an explanation and it said bfloat16 has a higher range (like FP32) but less precision.
What does that mean for image generation, and why was bfloat16 chosen over FP16?
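For what it's worth, a quick way to see the range-vs-precision trade-off concretely (assuming you have PyTorch installed):

```python
import torch

for dtype in (torch.float32, torch.bfloat16, torch.float16):
    info = torch.finfo(dtype)
    # `max` shows the dynamic range; `eps` is the gap between 1.0 and the next
    # representable value, i.e. how fine the precision is around 1.0.
    print(f"{str(dtype):15s} max={info.max:.3e}  eps={info.eps:.3e}")

# bfloat16 keeps float32's 8 exponent bits (huge range, so activations and
# gradients rarely overflow) but only ~7 mantissa bits (coarser precision);
# float16 makes the opposite trade-off: much smaller range, finer precision.
```

In practice the extra range mostly matters for training stability (you generally don't need loss scaling because overflows are rare), and the lost mantissa bits are usually not visible in the final images.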
Check this out: https://github.com/krea-ai/flux-krea
Let me see if we can add more details to the blog post, and thanks for the flag!
- cost per image
- latency per image
Hope you guys can add it somewhere!
We wanted to keep this technical blog post free from marketing fluff, but maybe we overdid it.
However, sometimes it's hard to give an exact price per image, as it depends on resolution, number of steps, whether a LoRA is being used or not, etc.
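As a purely illustrative back-of-the-envelope (all numbers below are made up, not Krea's pricing), cost per image is basically GPU time times GPU price:

```python
def cost_per_image(gpu_cost_per_hour: float, seconds_per_image: float,
                   images_per_batch: int = 1) -> float:
    # Resolution, step count, and whether a LoRA is loaded all change
    # seconds_per_image, which is why a single number is hard to quote.
    gpu_seconds = seconds_per_image / images_per_batch
    return gpu_cost_per_hour * gpu_seconds / 3600.0

# e.g. a hypothetical $2.50/hour GPU taking 4 s per 1024x1024 image at ~28 steps:
print(f"${cost_per_image(2.50, 4.0):.4f} per image")  # ~$0.0028
```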
Another note about preference optimisation and RL: it has a really high quality ceiling, but it needs to be very carefully tuned. It's easy to get perfect anatomy and structure if you decide to completely "collapse" the model. For instance, ChatGPT images are collapsed to have a slight yellow color palette. FLUX images always have this glossy, plastic texture with an overly blurry background. It's similar to the reward-hacking behavior you see in LLMs, where they sound overly nice and chatty.
I had to make a few compromises to balance between a "stable, collapsed, boring" model and an "unstable, diverse, explorative" one.
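For what it's worth, the usual way that compromise gets formalized (my framing, not necessarily what was used here) is to maximize the reward while penalizing divergence from the base model:

$$\max_\theta \; \mathbb{E}_{x \sim p_\theta}\big[r(x)\big] \;-\; \beta \, D_{\mathrm{KL}}\big(p_\theta \,\|\, p_{\mathrm{ref}}\big)$$

A larger beta keeps the finetuned model close to the base (more diverse and explorative, but less of the aesthetic gain), while a smaller beta lets it chase the reward harder (a bigger aesthetic jump, but prone to collapsing onto one narrow look).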
Then optimise for max(Quality + A*R).
Arguably the amplitude of A should already do the job of R, but I think AI-ness and AI-ness-relevance are distinct concepts (an image could be highly relevant even if you can't tell what it should be).
We used two types of datasets for post-training: supervised finetuning data and preference data for the RLHF stage. You can actually use fewer than 1M samples to significantly boost the aesthetics. Quality matters A LOT. Quantity helps with the generalisation and stability of the checkpoints, though.
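To make "preference data" concrete, here is the generic shape such data usually takes, plus the standard Bradley-Terry-style reward objective that is typically trained on it. This is the common recipe, not necessarily Krea's exact pipeline; `reward_model` and the field names are placeholders.

```python
import torch.nn.functional as F

# One preference record: same prompt, a preferred image and a rejected one,
# typically chosen by human raters.
example = {
    "prompt": "portrait photo, natural window light",
    "chosen_image": "img_001_a.png",
    "rejected_image": "img_001_b.png",
}

def reward_loss(reward_model, prompt_emb, chosen, rejected):
    # Bradley-Terry objective: the chosen image should score higher than the
    # rejected one for the same prompt.
    r_chosen = reward_model(prompt_emb, chosen)
    r_rejected = reward_model(prompt_emb, rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```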
In a nutshell, it follows the same license as BFL Flux-dev model.
"Octopus DJ spinning the turntables at a rave."
The human-like hands the DJ sprouts are interesting, and no amount of prompting seems to stop them.
Opinionated, as the paper says.
Maybe you got a lucky roll :)
Imagine one of these: https://imgur.com/a/DiAOTzJ but with two spouts at the top dropping different colored balls
Its attempts: https://imgur.com/a/uecXDzI
https://genai-showdown.specr.net
On another note, there seems to be some indication that Wan 2.2 and future models might end up becoming significant players in the T2I space, though you'll probably need a metric ton of LoRAs to cover some of the lack of image diversity.
Also, FWIW, this model's focus was on aesthetics rather than strict prompt adherence. Not to excuse the bad samples, but to emphasize one of the research goals.
It’s a thorny trade-off, but an important one if one wants to get rid of what’s sometimes known as “the flux look”.
Re: Wan 2.2, I've also seen people commenting about using Wan 2.2 for base generation and Krea for the refiner pass, which I thought was interesting.
> FWIW, this model's focus was on aesthetics
Agreed - whereas these tests are really focused on various GenAI image models' ability to follow complicated prompts and are not as concerned with overall visual fidelity.
Regarding the "flux look" I'd be interested to see if Krea addresses both the waxy skin look AND the omnipresent shallow depth of field.
[1]: https://www.reddit.com/r/StableDiffusion/comments/1mec2dw/te...
dvrp•21h ago
I'm the Co-founder and CTO of Krea. We've wanted to release the weights for our model and share them with the HN community for a long time, so we're excited to finally do it.
My team and I will be online throughout the day and will try to answer any questions you may have.
jackphilson•12h ago
dvrp•12h ago
It’s simple: hackability and recruiting!
The open-source community hacking around with it and playing with it, PLUS talented engineers who may be interested in working with us, already make this release worth it. A single talented distributed-systems engineer has a lot of impact here.
Also, the company ethos is around AI hackability/controllability, high-bar for talent, and AI for creatives - so this aligns perfectly.
The fact that Krea serves both in-house and 3rd-Party models tells you that we are not that bullish on models being a moat.