[Sorry, I messed up that title! The perils of not reading articles closely.]
If you show something photorealistic and AI-generated, what is shown is simply an illusion.
A cartoon style might work, though, because the user immediately understands that what is shown is not a photograph.
It's extremely hard to block out a scene with just words, e.g. "rotate hand 45 degrees, stand perpendicular to the column, shadows from a light source 60 degrees above the horizon, large box in front of chest, approximately 2 feet wide," etc.
Image-to-image, ControlNets, previz-to-final, etc. are the way to go, and I'm convinced this will be the core interface for image and video creation. Text prompts get you a coarse-grained first approximation, which you then visually adjust to your exact needs with UI/UX-first models.
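For the curious, here's a minimal sketch of that loop, assuming Hugging Face diffusers with a Canny-edge ControlNet; the model IDs, thresholds, and file names are illustrative stand-ins, not anything from a specific project:

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Start from a rough layout (a previz render, a sketch, a blocked-out photo)
# instead of trying to describe the composition in words.
layout = load_image("previz_frame.png")  # hypothetical input file

# Extract Canny edges so the model is conditioned on the structure of the
# layout rather than its colors or textures.
edges = cv2.Canny(np.array(layout), 100, 200)
edges = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The prompt supplies the coarse look; the edge map pins down pose,
# camera angle, and object placement exactly.
frame = pipe(
    "a man holding a large box in front of his chest, hard side light",
    image=edges,
    num_inference_steps=30,
).images[0]
frame.save("frame_v1.png")
```

The key property is that the structure lives in the conditioning image, not the prompt: you can iterate on the words endlessly and the blocking stays fixed.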
I built an intentional "crafting" engine so people can mold images like clay, shaping everything deliberately:
https://github.com/storytold/artcraft
These are really early days, though. I expect more tools and models that let you manipulate everything first-class, in 2D/3D, as if everything in an image were mutable.
As a film director, I find this really exciting.
properbrew•1h ago
> Take a freeform query (like ‘sfo->jfk’) and turn it into a ‘place’
> Build a database of ‘places’ -> pictures
> Build a software system that can take a ‘place’, look it up in a database and spit out the right picture – even if that ‘place’ isn’t in the database
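One way that pipeline could look in code, as a minimal sketch (the Place type, the take-the-destination rule, and the nearest-neighbor fallback are my assumptions, not the article's actual implementation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Place:
    code: str  # canonical identifier, e.g. an IATA airport code

def parse_query(query: str) -> Place:
    """Turn a freeform query like 'sfo->jfk' into a canonical Place.
    For a route, take the destination: that's what the traveller
    wants a picture of."""
    parts = [p.strip().upper() for p in query.split("->")]
    return Place(code=parts[-1])

# Toy 'place' -> picture database.
PICTURES = {
    Place("JFK"): "new_york_skyline.jpg",
    Place("SFO"): "golden_gate.jpg",
}

# Hypothetical fallback map for places with no picture of their own.
NEAREST = {"LGA": "JFK", "OAK": "SFO"}

def picture_for(place: Place) -> str:
    """Look the place up; fall back to a nearby place, then to a
    generic default, so every query gets some reasonable picture."""
    if place in PICTURES:
        return PICTURES[place]
    neighbor = NEAREST.get(place.code)
    if neighbor:
        return PICTURES[Place(neighbor)]
    return "generic_travel.jpg"

print(picture_for(parse_query("sfo->jfk")))  # new_york_skyline.jpg
print(picture_for(parse_query("sfo->lga")))  # falls back to JFK's picture
```

The last quoted step is the hard part: "even if that 'place' isn't in the database" means the lookup needs a fallback chain (nearest neighbor, parent region, generic default) rather than an exact-match dictionary.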