This is interesting take, because it also solves a common issue whereas all agents try to optimize for the same thing: their rewards will shift with changes to the environment, so in essence they will take different actions according to how each one perceives the environment and calculates rewards, and will thus take different actions.
zekrioca•23m ago