The state-of-the-art has been fine-tuning on OR-tool datasets. The problem is that very, very few datasets cover scheduling problems, and the sliver that do don't realistically represent actual scheduling constraints.
Another method is multi-stage prompting, like OptiMUS, which is the one I've been prototyping with. It's inconsistent even with GPT-5, and any lesser model fails to model the problem correctly at all, producing solvers that report the instance as infeasible (when in fact it's feasible) or just error out.
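To make the failure mode concrete: on toy instances you can brute-force ground truth, so a generated model that claims "infeasible" can be caught red-handed. This is just an illustrative sketch of mine (a hypothetical single-machine, deadline-only instance; real scheduling constraints are far richer), not part of any of the pipelines above:

```python
from itertools import permutations

def feasible_schedule(jobs):
    """Brute-force ground truth: does any ordering of (duration, deadline)
    jobs on one machine meet every deadline? Only viable for tiny instances,
    but enough to expose a generated model that wrongly reports infeasibility."""
    for order in permutations(jobs):
        t = 0  # reset the clock for each candidate ordering
        if all((t := t + d) <= dl for d, dl in order):
            return list(order)
    return None  # genuinely infeasible

# Toy instance: three jobs as (duration, deadline) pairs
jobs = [(2, 7), (3, 4), (1, 5)]
print(feasible_schedule(jobs))  # a witness ordering exists here
```

If the LLM-built solver says infeasible while this finds a witness ordering, the generated model is wrong, which happens more often than you'd hope.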
I'd like to think this is actually a solvable problem, since it comes down to teaching LLMs how to code; we're not asking an LLM to output schedules generatively. However, the literature is sparse on this topic, and most of the attention right now goes to LLMs doing the actual maths, which I think is still only experimental.
Operations research is one of the unclaimed corners of AI, and I'd love input from anyone with expertise in OR and/or AI. We believe this will be an important part of how the world's industrial output gets planned, and we ought not to leave it on the back burner while many (and I mean many) people are still doing scheduling manually in Excel or even on paper and post-its.
Any input appreciated.
birudeghi