I'm also curious how usable simpler vision models such as Florence are, in case you explored this direction.
However, I think the temptation to lean on AI for every task is perhaps a little naive, if not lazy.
For mask generation, there is really not much reason to use AI. In this example, simple stochastic blob detection (a trivial function you could get from OpenCV or ask a college sophomore to write) would generate much better quality masks.
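To make the "a sophomore could write it" point concrete, here is a minimal sketch of a non-AI mask generator. It is not the stochastic variant mentioned above; it swaps in a plain deterministic threshold plus 4-connected flood fill, and the name `blob_mask` and the `threshold` / `min_area` defaults are my own assumptions. The image is assumed to be a grayscale grid given as a list of rows.

```python
from collections import deque


def blob_mask(gray, threshold=128, min_area=4):
    """Return a 0/1 mask keeping bright 4-connected blobs of at least min_area pixels.

    `gray` is assumed to be a non-empty list of equal-length rows of 0..255 ints.
    """
    h, w = len(gray), len(gray[0])
    mask = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]

    for y in range(h):
        for x in range(w):
            if seen[y][x] or gray[y][x] < threshold:
                continue
            # Flood-fill one blob starting from this bright, unvisited pixel.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and gray[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop specks smaller than min_area; keep everything else in the mask.
            if len(blob) >= min_area:
                for by, bx in blob:
                    mask[by][bx] = 1
    return mask
```

With OpenCV you would likely reach for something like `cv2.threshold` followed by `cv2.connectedComponentsWithStats` instead of hand-rolling the flood fill, but the idea is the same: threshold, group, filter by size.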
we're currently working on pipelines that limit the involvement of AI to specific tasks. for example, when generating an ad there's usually a logo, some banner text, and a background image.
we can use gpt-image-1 to generate the background image, another LLM to identify the coordinates of where we place the logo, and just add the logo onto the image. this is just one example!
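The last step there (placing the logo) needs no model at all. A rough sketch of it, assuming the background is a grid of RGB tuples, the logo is a grid of RGBA tuples, and the `(x, y)` top-left coordinates arrive from whatever picked them (the LLM in this pipeline). `paste_logo` is a hypothetical helper name, not part of any library or of the pipeline described above.

```python
def paste_logo(background, logo, x, y):
    """Alpha-blend an RGBA logo onto an RGB background, top-left corner at (x, y).

    Returns a new image; the input background is left untouched.
    Logo pixels that fall outside the background are clipped.
    """
    out = [row[:] for row in background]  # shallow row copies; pixels are tuples
    height, width = len(out), len(out[0])

    for ly, logo_row in enumerate(logo):
        for lx, (r, g, b, a) in enumerate(logo_row):
            by, bx = y + ly, x + lx
            if not (0 <= by < height and 0 <= bx < width):
                continue  # clip anything hanging over the edge
            alpha = a / 255
            br, bg, bb = out[by][bx]
            out[by][bx] = (
                round(r * alpha + br * (1 - alpha)),
                round(g * alpha + bg * (1 - alpha)),
                round(b * alpha + bb * (1 - alpha)),
            )
    return out
```

Because the logo pixels are copied rather than regenerated, the logo stays identical across every ad. In practice Pillow's `Image.paste(logo, (x, y), mask=logo)` does the same job on real image files.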
Feel like ControlNets and some minimal Photoshop work would've been better.
we're actually working on some form of what you described, where we take images generated by LLMs and add consistent logos as a discrete compositing step rather than generatively.
But then, I guess it's not much different as an idea from the earlier use of GANs, or from telling LLMs to "stop hallucinating", etc.
average_r_user•9h ago
palashshah•7h ago