FLUX.1 Kontext

412•minimaxir•19h ago

Comments

SV_BubbleTime•18h ago

Single shot LORA effect if it works as they cherry pick will be a game changer for editing.

As with almost any AI release though, unless it’s open weights, I don’t care. The strengths and weaknesses of these models are apparent when you run them locally.

ttoinou•18h ago

They’re not apparent when you run them online ?

SV_BubbleTime•6h ago

Not for what these models are actually being used for.

nullbyte•18h ago

Hopefully they list this on HuggingFace for the opensource community. It looks like a great model!

minimaxir•17h ago

The original open-source Flux releases were also on Hugging Face.

vunderba•17h ago

From their site they will be releasing the DEV version - which is a distilled variant - so quality and adherence will suffer unfortunately.

fortran77•18h ago

It still has no idea what a Model F keyboard looks like. I tried prompts and editing, and got things that weren't even close.

yorwba•18h ago

You mean when you edit a picture of a Model F keyboard and tell it to use it in a scene, it still produces a different keyboard?

refulgentis•18h ago

Interesting, would you mind sharing? (imgur allows free image uploads, quick drag and drop)

I do have a "works on my machine"* :) -- prompt "Model F keyboard", all settings disabled, on the smaller model, seems to have substantially more than no idea: https://imgur.com/a/32pV6Sp

(Google Images comparison included to show in-the-wild "Model F keyboard", which may differ from my/your expected distribution)

* my machine, being, https://playground.bfl.ai/ (I have no affiliation with BFL)

jsheard•17h ago

Your generated examples just look like generic modern-day mechanical keyboards, they don't have any of the Model Fs defining features.

AStonesThrow•17h ago

Your Google Images search indicates the original problem of models training on junk misinformation online. If AI scrapers are downloading every photo that's associated with "Model F Keyboard" like that, the models have no idea what is an IBM Model F, or its distinguishing characteristics, and what is some other company's, and what is misidentified.

https://commons.wikimedia.org/wiki/Category:IBM_Model_F_Keyb...

Specifying "IBM Model F keyboard" and placing it in quotation marks improves the search. But the front page of the search is tip-of-the-iceberg compared to whatever the model's scrapers ingested.

Eventually you may hit trademark protections. Reproducing a brand-name keyboard may be as difficult as simulating a celebrity's likeness.

I'm not even sure what my friends look like on Facebook, so it's not clear how an AI model would reproduce a brand-name keyboard design on request.

refulgentis•16h ago

I agree with you vehemently.

Another way of looking at it is, insistence on complete verisimilitude in an image generator is fundamentally in error.

I would argue, even undesirable. I don't want to live in a world where a 45 year old keyboard that was only out for 4 years is readily imitated in every microscopic detail.

I also find myself frustrated, and asking myself why.

First thought that jumps in: it's very clear that it is in error to say the model has no idea, modulo there's some independent run that's dramatically different from the only one offered in this thread.

Second thought: if we're doing "the image generators don't get details right", it would seem to be there a lot simpler examples than OPs, and it is better expressed that way - I assume it wasn't expressed that way because it sounds like dull conversation, but it doesn't have to be!

Third thought as to why I feel frustrated: I feel like I wasted time here - no other demos showing it's anywhere close to "no idea", its completely unclear to me whats distinctive about a IBM Model F Keyboard, and the the wikipedia images are worse than Google's AFAICT.

fc417fc802•11h ago

> if we're doing "the image generators don't get details right", it would seem to be there a lot simpler examples than OPs

There are different sorts of details though, and the distinctions are both useful and interesting to understanding the state of the art. If "man drinking coke" produces someone with 6 fingers holding a glass of water that's completely different from producing someone with 5 fingers holding a can of pepsi.

Notice that none of the images in your example got the function key placement correct. Clearly the model knows what a relatively modern keyboard is, and it even has some concept of a vaguely retro looking mechanical keyboard. However indeed I'm inclined to agree with OP that it has approximately zero idea what an "IBM model F" keyboard is. I'm not sure that's a failure of the model though - as you point out, it's an ancient and fairly obscure product.

fc417fc802•11h ago

> Eventually you may hit trademark protections. Reproducing a brand-name keyboard may be as difficult as simulating a celebrity's likeness.

Then the law is broken. Monetizing someone's likeness is an issue. Utilizing trademarked characteristics to promote your own product without permission is an issue. It's the downstream actions of the user that are the issue, not the ML model itself.

Models regurgitating copyrighted material verbatim is of course an entirely separate issue.

refulgentis•11h ago

>> Eventually you may hit trademark protections.

> Then the law is broken.

> Utilizing trademarked characteristics to promote your own product without permission is an issue.

It sounds like you agree with parent that if your product reproduces trademark characteristics, it is utilizing trademarked characteristics. Just about at what layer you don't have responsibility. And the layer that has responsibility is the one that profits unjustly from the AI.

I'm interested if there's an argument for saying only the 2nd party user of the 1st party AI model, selling AI model output, to a 3rd party, is intuitively unfair.

I can't think of one. e.g. Disney launches some new cartoon or whatever. 1st party Openmetagoog, trains on it to make my "Video Episode Generator" product. Now, Openmetagoogs Community Pages are full of 30m video episodes made by their image generator. They didn't make them, nor do they promote them. Inuitively, Openmetagoog a competitor for manufacturing my IP, and that is also intuitively wrong. Your analysis would have us charge the users for sharing the output.

fc417fc802•10h ago

> if your product reproduces trademark characteristics, it is utilizing trademarked characteristics.

I wouldn't agree with that, no. To my mind "utilizing" generally requires intent at least in the context we're discussing here (ie moral or legal obligations). I'd remind you that the entire point of trademark is (approximately) to prevent brand confusion within the market.

> Your analysis would have us charge the users for sharing the output.

Precisely. I see it as both a matter of intent and concrete damages. Creating something (pencil, diffusion model, camera, etc) that could possibly be used in a manner that violates the law is not a problem. It is the end user violating the law that is at fault.

Imagine an online community that uses blender to create disney knockoffs and shares them publicly. Blender is not at fault and the creation of the knockoffs themselves (ie in private) is not the issue either. It's the part where the users proceed to publicly share them that poses the problem.

> They didn't make them, nor do they promote them.

By the same logic youtube neither creates nor promotes pirated content that gets uploaded. We have DMCA takedown notices for dealing with precisely this issue.

> Inuitively, Openmetagoog a competitor for manufacturing my IP, and that is also intuitively wrong.

Let's be clear about the distinction between trademark and copyright here. Outputting a verbatim copy is indeed a problem. Outputting a likeness is not, but an end user could certainly proceed to (mis)use that output in a manner that is.

Intent matters here. A product whose primary purpose is IP infringement is entirely different from one whose purpose is general but could potentially be used to infringe.

stephen37•16h ago

I got it working when I provided an image of a Model F keyboard. This is the strength of the model, provide it an input image and it will do some magic

Disclaimer: I work for BFL

anjneymidha•18h ago

Technical report here for those curious: https://cdn.sanity.io/files/gsvmb6gz/production/880b07220899...

rvz•18h ago

Unfortunately, nobody wants to read the report, but what they are really after is to download the open-weight model.

So they can take it and run with it. (No contributing back either).

anjneymidha•18h ago

"FLUX.1 Kontext [dev]

Open-weights, distilled variant of Kontext, our most advanced generative image editing model. Coming soon" is what they say on https://bfl.ai/models/flux-kontext

sigmoid10•18h ago

Distilled is a real downer, but I guess those AI startup CEOs still gotta eat.

dragonwriter•16h ago

The open community has a done a lot with the open-weights distilled models from Black Forest Labs already, one of the more radical being Chroma: https://huggingface.co/lodestones/Chroma

refulgentis•18h ago

I agree that gooning crew drives a lot of open model downloads.

On HN, generally, people are more into technical discussion and/or productizing this stuff. Here, it seems declasse to mention the gooner angle, it's usually euphemized as intense reactions about refusing to download it involving the words "censor"

liuliu•17h ago

Seems implementation is straightforward (very similar to everyone else, HiDream-E1, ICEdit, DreamO etc.), the magic is on data curation (which details are lightly shared).

krackers•16h ago

I haven't been following image generation models closely, at a high level is this new Flux model still diffusion based, or have they moved to block autoregressive (possibly with diffusion for upscaling) similar to 4o?

liuliu•15h ago

Diffusion based. There is no point to move to auto-regressive if you are not also training a multimodality LLM, which these companies are not doing that.

amazingamazing•18h ago

Don’t understand the remove from face example. Without other pictures showing the persons face, it’s just using some stereotypical image, no?

vessenes•18h ago

Mm, depends on the underlying model and where it is in the pipeline; identity models are pretty sophisticated at interpolating faces from partial geometry.

Scaevolus•18h ago

The slideshow appears to be glitched on that first example. The input image has a snowflake covering most of her face.

sharkjacobs•17h ago

There's no "truth" it's uncovering, no real face, these are all just generated images, yes.

amazingamazing•17h ago

I get that but usually you would have two inputs, the reference “true”, and the target that it to be manipulated.

nine_k•17h ago

Not necessarily. "As you may see, this is a Chinese lady. You have seen a number of Chinese ladies in your training set. Imagine the face of this lady so that it won't contradict the fragment visible on the image with the snowflake". (Damn, it's a pseudocode prompt.)

amazingamazing•17h ago

yes, so a stereotypical image. my point is best illustrated if you look at all of the photos of the woman.

throwaway314155•15h ago

Even if you provide another image (which you totally can btw) the model is still generalizing predictions enough that you can say it's just making a strong guess about what is concealed.

I guess my main point is "this is where you draw the line? at a mostly accurate reconstruction of a partial of someone's face?" this was science fiction a few years ago. Training the model to accept two images (which it can, just not for explicit purposes of reconstructing (although it learns that too )) seems like a very task-specific, downstream way to handle this issue. This field is now about robust, general ways to emerge intelligent behavior not task specific models.

amazingamazing•13h ago

is it mostly accurate though? how would you know? suppose you had an asian woman whose face is entirely covered with snow.

sure you could tell AI to remove the snow and some face will be revealed, but who is to say it's accurate? that's why traditionally you have a reference input.

Gracana•40m ago

What's the traditional workflow? I haven't seen that done before, but it's something I'd like to try. Could supply the "wrong" reference too, to get something specific.

jorgemf•17h ago

I think they are doing that because using real images the model changes the face. So that problem is removed if the initial image doesn't show the face

ilaksh•16h ago

Look more closely at the example. Clearly there is an opportunity for inference with objects that only partially obscure.

pkkkzip•16h ago

They chosen Asian traits that Western beauty standards fetishize that in Asia wouldn't be taken serious at all.

I notice American text2image models tend to generate less attractive and more darker skinned humans where as Chinese text2image generate attractive and more light skinned humans.

I think this is another area where Chinese AI models shine.

throwaway314155•15h ago

> notice American text2image models tend to generate less attractive and more darker skinned humans where as Chinese text2image generate attractive and more light skinned humans

This seems entirely subjective to me.

viraptor•14h ago

> They chosen Asian traits that Western beauty standards fetishize that in Asia wouldn't be taken serious at all.

> where as Chinese text2image generate attractive and more light skinned humans.

Are you saying they have chosen Asian traits that Asian beauty standards fetishize that in the West wouldn't be taken seriously at all? ;) There is no ground truth here that would be more correct one way or the other.

turnsout•14h ago

Wow, that is some straight-up overt racism. You should be ashamed.

fc417fc802•12h ago

It reads as racist if you parse it as (skin tone and attractiveness) but if you instead parse it as (skin tone) and (attractiveness), ie as two entirely unrelated characteristics of the output, then it reads as nothing more than a claim about relative differences in behavior between models.

Of course, given the sensitivity of the topic it is arguably somewhat inappropriate to make such observations without sufficient effort to clarify the precise meaning.

vessenes•18h ago

Pretty good!

I like that they are testing face and scene coherence with iterated edits -- major pain point for 4o and other models.

ttoinou•18h ago

How knowledgable do you need to be to tweak and train this locally ?

I spent two days trying to train a LoRa customization on top of Flux 1 dev on Windows with my RTX 4090 but can’t make it work and I don’t know how deep into this topic and python library I need to study. Are there scripts kiddies in this game or only experts ?

minimaxir•18h ago

The open-source model is not released yet, but it definitely won't be any easier than training a LoRA on Flux 1 Dev.

ttoinou•17h ago

Damn, I’m just too lazy to learn skills that will be outdated in 6 months

Flemlo•17h ago

It's normally easy to find it ore configured through comfyui.

Sometimes behind patreon if some YouTuber

throwaway675117•17h ago

Just use https://github.com/bghira/SimpleTuner

I was able to run this script to train a Lora myself without spending any time learning the underlying python libraries.

ttoinou•17h ago

Well thank you I will test that

dagaci•14h ago

SimpleTuner is dependant on Microsoft's DeepSpeed which doesnt work on Windows :)

So you probably better off using Ai-ToolKit https://github.com/ostris/ai-toolkit

3abiton•17h ago

> I spent two days trying to train a LoRa customization on top of Flux 1 dev on Windows with my RTX 4090 but can’t make

Windows is mostly the issue, to really take advantage, you will need linux.

ttoinou•17h ago

Even using WSL2 with Ubuntu isn't good enough ?

minimaxir•17h ago

Currently am testing this out (using the Replicate endpoint: https://replicate.com/black-forest-labs/flux-kontext-pro). Replicate also hosts "apps" with examples using FLUX Kontext for some common use cases of image editing: https://replicate.com/flux-kontext-apps

It's pretty good: quality of the generated images is similar to that of GPT-4o image generation if you were using it for simple image-to-image generations. Generation is speedy at about ~4 seconds per generation.

Prompt engineering outside of the examples used on this page is a little fussy and I suspect will evolve over time. Changing styles or specific aspects does indeed work, but the more specific you get, the more it tends to ignore the specifics.

skipants•17h ago

> Generation is speedy at about ~4 seconds per generation

May I ask on which GPU & VRAM?

edit: oh unless you just meant through huggingface's UI

minimaxir•17h ago

It is through Replicate's UI listed, which goes through Black Forest Labs's infra so would likely get the same results from their API.

zamadatix•15h ago

The open weights variant is "coming soon" so the only option is hosted right now.

cuuupid•17h ago

Honestly love Replicate for always being up to date. It’s amazing that not only do we live in a time of rapid AI advancement, but that every new research grade model is immediately available via API and can be used in prod, at scale, no questions asked.

Something to be said about distributors like Replicate etc that are adding an exponent to the impact of these model releases

minimaxir•17h ago

That's less on the downstream distributors, more on the model developers themselves realizing that ease-of-accessibility of the models themselves on Day 1 is important for getting community traction. Locking the model exclusively behind their own API won't work anymore.

Llama 4 was another recent case where they explicitly worked with downstream distributors to get it working Day 1.

meowface•15h ago

I have no affiliation with either company but from using both a bunch as a customer: Replicate has a competitor at https://fal.ai/models and FAL's generation speed is consistently faster across every model I've tried. They have some sub-100 ms image gen models, too.

Replicate has a much bigger model selection. But for every model that's on both, FAL is pretty much "Replicate but faster". I believe pricing is pretty similar.

echelon•14h ago

A16Z invested in both. It's wild. They've been absolutely flooding the GenAI market for images and videos with investments.

They'll have one of the victors, whoever it is. Maybe multiple.

bfirsh•13h ago

Founder of Replicate here. We should be on par or faster for all the top models. e.g. we have the fastest FLUX[dev]: https://artificialanalysis.ai/text-to-image/model-family/flu...

If something's not as fast let me know and we can fix it. ben@replicate.com

echelon•13h ago

Hey Ben, thanks for participating in this thread. And certainly also for all you and your team have built.

Totally frank and possibly awkward question, you don't have to answer: how do you feel about a16z investing in everyone in this space?

They invested in you.

They're investing in your direct competitors (Fal, et al.)

They're picking your downmarket and upmarket (Krea, et al.)

They're picking consumer (Viggle, et al.), which could lift away the value.

They're picking the foundation models you consume. (Black Forest Labs, Hedra, et al.)

They're even picking the actual consumers themselves. (Promise, et al.)

They're doing this at Series A and beyond.

Do you think they'll try to encourage dog-fooding or consolidation?

The reason I ask is because I'm building adjacent or at a tangent to some of this, and I wonder if a16z is "all full up" or competitive within the portfolio. (If you can answer in private, my email is [my username] at gmail, and I'd be incredibly grateful to hear your thoughts.)

Beyond that, how are you feeling? This is a whirlwind of a sector to be in. There's a new model every week it seems.

Kudos on keeping up the pace! Keep at it!

a2128•17h ago

It seems more accurate than 4o image generation in terms of preserving original details. If I give it my 3D animal character and ask it for a minor change like changing the lighting, 4o will completely mangle the face of my character, it will change the body and other details slightly. This Flux model keeps the visible geometry almost perfectly the same even when asked to significantly change the pose or lighting

echelon•14h ago

gpt-image-1 (aka "4o") is still the most useful general purpose image model, but damn does this come close.

I'm deep in this space and feel really good about FLUX.1 Kontext. It fills a much-needed gap, and it makes sure that OpenAI / Google aren't the runaway victors of images and video.

Prior to gpt-image-1, the biggest problems in images were:

  - prompt adherence
  - generation quality
  - instructiveness (eg. "put the sign above the second door")
  - consistency of styles, characters, settings, etc. 
  - deliberate and exact intentional posing of characters and set pieces
  - compositing different images or layers together
  - relighting

Fine tunes, LoRAs, and IPAdapters fixed a lot of this, but they were a real pain in the ass. ControlNets solved for pose, but it was still awkward and ugly. ComfyUI was an orchestrator of this layer of hacks that kind of got the job done, but it was hacky and unmaintainable glue. It always felt like a fly-by-night solution.

OpenAI's gpt-image-1 solved all of these things with a single multimodal model. You could throw out ComfyUI and all the other pre-AI garbage and work directly with the model itself. It was magic.

Unfortunately, gpt-image-1 is ridiculously slow, insanely expensive, highly censored (you can't use a lot of copyrighted characters or celebrities, and a lot of totally SFW prompts are blocked). It can't be fine tuned, so you're suck with the "ChatGPT style" and (as is called by the community) the "piss filter" (perpetually yellowish images).

And the biggest problem with gpt-image-1 is because it puts image and text tokens in the same space to manipulate, it can't retain the exact precise pixel-precise structure of reference images. Because of that, it cannot function as an inpainting/outpainting model whatsoever. You can't use it to edit existing images if the original image mattered.

Even with those flaws, gpt-image-1 was a million times better than Flux, ComfyUI, and all the other ball of wax hacks we've built up. Given the expense of training gpt-image-1, I was worried that nobody else would be able to afford to train the competition and that OpenAI would win the space forever. We'd be left with only hyperscalers of AI building these models. And it would suck if Google and OpenAI were the only providers of tools for artists.

Black Forest Labs just proved that wrong in a big way! While this model doesn't do everything as well as gpt-image-1, it's within the same order of magnitude. And it's ridiculously fast (10x faster) and cheap (10x cheaper).

Kontext isn't as instructive as gpt-image-1. You can't give it multiple pictures and ask it to copy characters from one image into the pose of another image. You can't have it follow complex compositing requests. But it's close, and that makes it immediately useful. It fills a much-needed gap in the space.

Black Forest Labs did the right thing by developing this instead of a video model. We need much more innovation in the image model space, and we need more gaps to be filled:

  - Fast
  - Truly multimodal like gpt-image-1
  - Instructive 
  - Posing built into the model. No ControlNet hacks. 
  - References built into the model. No IPAdapter, no required character/style LoRAs, etc. 
  - Ability to address objects, characters, mannequins, etc. for deletion / insertion. 
  - Ability to pull sources from across multiple images with or without "innovation" / change to their pixels.
  - Fine-tunable (so we can get higher quality and precision)

Something like this that works in real time would literally change the game forever.

Please build it, Black Forest Labs.

All of those feature requests stated, Kontext is a great model. I'm going to be learning it over the next weeks.

Keep at it, BFL. Don't let OpenAI win. This model rocks.

Now let's hope Kling or Runway (or, better, someone who does open weights -- BFL!) develops a Veo 3 competitor.

I need my AI actors to "Meisner", and so far only Veo 3 comes close.

ttoinou•10h ago

Your comment is def why we come to HN :)

Thanks for the detailed info

tristanMatthias•9h ago

Thought the SAME thing

meta87•8h ago

this breakdown made my day thank you!

Im building a web based paint/image editor with ai inpainting etc

and this is going to be a great model to use price wise and capability wise

completely agree so happy its not any one of these big co’s controlling the whole space!

perk•7h ago

What are you building? Ping me if you want a tester of half-finished breaking stuff

whywhywhywhy•2h ago

>Given the expense of training gpt-image-1, I was worried that nobody else would be able to afford to train the competition

OpenAI models are expensive to train because it’s beneficial for OpenAI models to be expensive and there is no incentive to optimize when they’re gonna run in a server farm anyway.

Probably a bunch of teams never bothered trying to replicate Dall-E 1+2 because the training run cost millions, yet SD1.5 showed us comparable tech can run on a home computer and be trained from scratch for thousands or fine tuned for cents.

reissbaker•8h ago

In my quick experimentation for image-to-image this feels even better than GPT-4o: 4o tends to heavily weight the colors towards sepia, to the point where it's a bit of an obvious tell that the image was 4o-generated (especially with repeated edits); FLUX.1 Kontext seems to use a much wider, more colorful palette. And FLUX, at least the Max version I'm playing around with on Replicate, nails small details that 4o can miss.

I haven't played around with from-scratch generation, so I'm not sure which is best if you're trying to generate an image just from a prompt. But in terms of image-to-image via a prompt, it feels like FLUX is noticeably better.

andybak•17h ago

Nobody tested that page on mobile.

jnettome•17h ago

I'm trying to login to evaluate this but the google auth redirects me back to localhost:3000

vunderba•17h ago

I'm debating whether to add the FLUX Kontext model to my GenAI image comparison site. The Max variant of the model definitely scores higher in prompt adherence nearly doubling Flux 1.dev score but still falling short of OpenAI's gpt-image-1 which (visual fidelity aside) is sitting at the top of the leaderboard.

I liked keeping Flux 1.D around just to have a nice baseline for local GenAI capabilities.

https://genai-showdown.specr.net

Incidentally, we did add the newest release of Hunyuan's Image 2.0 model but as expected of a real-time model it scores rather poorly.

EDIT: In fairness to Black Forest Labs this model definitely seems to be more focused on editing capabilities to refine and iterate on existing images rather than on strict text-to-image creation.

Klaus23•16h ago

Nice site! I have a suggestion for a prompt that I could never get to work properly. It's been a while since I tried it, and the models have probably improved enough that it should be possible now.

  A knight with a sword in hand stands with his back to us, facing down an army. He holds his shield above his head to protect himself from the rain of arrows shot by archers visible in the rear.

I was surprised at how badly the models performed. It's a fairly iconic scene, and there's more than enough training data.

lawik•7h ago

Making an accurate flail (stick - chain - ball) is a fun sport.. weird things tend to happen.

meta87•8h ago

please add! cool site thanks :)

nopinsight•5h ago

Wondering if you could add “Flux 1.1 Pro Ultra” to the site? It’s supposed to be the best among the Flux family of models, and far better than Flux Dev (3rd among your current candidates) at prompt adherence.

Adding it would also provide a fair assessment for a leading open source model.

The site is a great idea and features very interesting prompts. :)

theyinwhy•2h ago

Looks good! Would be great to see Adobe Firefly in your evaluation as well.

vunderba•17h ago

Some of these samples are rather cherry picked. Has anyone actually tried the professional headshot app of the "Kontext Apps"?

https://replicate.com/flux-kontext-apps

I've thrown half a dozen pictures of myself at it and it just completely replaced me with somebody else. To be fair, the final headshot does look very professional.

minimaxir•17h ago

Is the input image aspect ratio the same as the output aspect ratio? In some testing I've noticed that there is weirdness that happens if there is a forced shift.

doctorpangloss•14h ago

Nobody has solved the scientific problem of identity preservation for faces in one shot. Nobody has even solved hands.

emmelaich•13h ago

I tried making a realistic image from a cartoon character but aged. It did very well, definitely recognisable as the same 'person'.

danielbln•6h ago

Best bet right now is still face swapping with something like insightface.

mac-mc•13h ago

I tried a professional headshot prompt on the flux playground with a tired gym selfie and it kept it as myself, same expression, sweat, skin tone and all. It was like a background swap, then I expanded it to "make a professional headshot version of this image that would be good for social media, make the person smile, have a good pose and clothing, clean non-sweaty skin, etc" and it stayed pretty similar, except it swapped the clothing and gave me an awkward smile, which may be accurate for those kinds of things if you think about it.

layer8•16h ago

> show me a closeup of…

Investigators will love this for “enhance”. ;)

mdp2021•4h ago

At some point, "Do not let the tool invent details!" will become a shout more frequent than most expressions.

ilaksh•16h ago

Anyone have a guess as to when the open dev version gets released? More like a week or a month or two I wonder.

mdp2021•15h ago

Is input restricted to a single image? If you could use more images as input, you could do prompts like "Place the item in image A inside image B" (e.g. "put the character of image A in the scenery of image B"), etc.

carlosdp•15h ago

There's an experimental "multi" mode you can input multiple images to

echelon•14h ago

Fal has the multi image interface to test against. (Replicate might as well, I haven't checked yet.)

THIS MODEL ROCKS!

It's no gpt-image-1, but it's ridiculously close.

There isn't going to be a moat in images or video. I was so worried Google and OpenAI would win creative forever. Not so. Anyone can build these.

bossyTeacher•14h ago

I wonder if this is using a foundation model or a fine tuned one

fagerhult•14h ago

I vibed up a little chat interface https://kontext-chat.vercel.app/

Hard_Space•5h ago

Does not seem to want to deal with faces at all, flags everything with humans as 'sensitive' and declines.

eamag•14h ago

Can it generate chess? https://manifold.markets/Hazel/an-ai-model-will-successfully...

zamadatix•3h ago

The focus of this model is to be able to do iterative editing and/or use other images as a source while the focus of that bet is to consistently one shot a specific image 9/10 times with the same prompt. Given the canyon between those two focuses I don't think so, but maybe if you had an inventive enough prompt?

gravitywp•12h ago

You can try it now on https://fluxk.art.

xnorswap•5h ago

I tried this out and a hilarious "context-slip" happened:

https://imgur.com/a/gT6iuV1

It generated (via a prompt) an image of a space ship landing on a remote planet.

I asked an edit, "The ship itself should be more colourful and a larger part of the image".

And it replaced the space-ship with a container vessel.

It had the chat history, it should have understood I still wanted a space-ship, but it dropped the relevant context for what I was trying to achieve.

gunalx•1h ago

I mean ro its credit, one of the cobtainer ships seems to be flying. /s

Online Historical Encyclopaedia of Programming Languages

TIO is a family of online interpreters

Quimera: LLMs based tool to discover smart contract exploits

OpenAI Can Stop Pretending

Data watchdog put cops on naughty step for lost CCTV footage

On The Campaign Trail, Elon Musk Juggled Drugs and Family Drama

Concatenative Language

Surface-Stable Fractal Dithering

Types and other techniques as an accessibility tool for the ADHD brain [video]

Coexisting with digital Entities: a new Perspective

For Some Recent Graduates, the A.I. Job Apocalypse May Be Here

Show HN: Git-Add–Interactive with Enhancements

Pmccabe – McCabe-style complexity and line counting for C and C++

Thinking of Leaving the Industry (2008)

Using Rust Back End to Serve an SPA

Simons Collaboration on the Global Brain

Radeon Software for Linux Dropping AMD's Proprietary OpenGL/Vulkan Drivers

Researchers Are Slowly Finding Ways to Stem the Tide of PFAS Contamination

Functional Programming in Bash: Harnessing the Power of Simplicity

I Want to Love Linux. It Doesn't Love Me Back: Post 3 - Console Access

Behavioral responses of domestic cats to human odor

Four Things I've Changed My Mind About in Engineering

Systems Correctness Practices at Amazon Web Services

Pirates of the Ayahuasca

'Golden Dome' plan has a major obstacle: Physics

How America became meat stick nation

Behind the Curtain: The Great Fusing

What will it feel like living a world where AI is smarter than all of us?

Komodo: A tool to build and deploy software across many servers

Ugly Gerry

Online Historical Encyclopaedia of Programming Languages

TIO is a family of online interpreters

Quimera: LLMs based tool to discover smart contract exploits

OpenAI Can Stop Pretending

Data watchdog put cops on naughty step for lost CCTV footage

On The Campaign Trail, Elon Musk Juggled Drugs and Family Drama

Concatenative Language

Surface-Stable Fractal Dithering

Types and other techniques as an accessibility tool for the ADHD brain [video]

Coexisting with digital Entities: a new Perspective

For Some Recent Graduates, the A.I. Job Apocalypse May Be Here

Show HN: Git-Add–Interactive with Enhancements

Pmccabe – McCabe-style complexity and line counting for C and C++

Thinking of Leaving the Industry (2008)

Using Rust Back End to Serve an SPA

Simons Collaboration on the Global Brain

Radeon Software for Linux Dropping AMD's Proprietary OpenGL/Vulkan Drivers

Researchers Are Slowly Finding Ways to Stem the Tide of PFAS Contamination

Functional Programming in Bash: Harnessing the Power of Simplicity

I Want to Love Linux. It Doesn't Love Me Back: Post 3 - Console Access

Behavioral responses of domestic cats to human odor

Four Things I've Changed My Mind About in Engineering

Systems Correctness Practices at Amazon Web Services

Pirates of the Ayahuasca

'Golden Dome' plan has a major obstacle: Physics

How America became meat stick nation

Behind the Curtain: The Great Fusing

What will it feel like living a world where AI is smarter than all of us?

Komodo: A tool to build and deploy software across many servers

Ugly Gerry

FLUX.1 Kontext

Comments