It was really specific to Princeton’s clusters, though, so I decided to make a generalized version for everyone to use: slurmq.
Slurm's built-in fairshare only deprioritizes heavy users. Sometimes you need a hard cap. slurmq tracks GPU-hours per user over a rolling window and kills a user's jobs when they go over a quota set by the sysadmin.
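Under the hood this is the same accounting you could do by hand against Slurm's job records. Here's a rough sketch of that tally using sacct and awk (the 7-day window is just an example, and it glosses over edge cases like jobs that straddle the window boundary or are still running, which a real enforcer has to handle):

$ sacct -a -X -n -P -S now-7days --format=User,ElapsedRaw,AllocTRES \
    | awk -F'|' '{
        # pull the GPU count out of AllocTRES (e.g. "cpu=8,gres/gpu=2,mem=32G")
        gpus = 0; n = split($3, tres, ",")
        for (i = 1; i <= n; i++)
          if (tres[i] ~ /^gres\/gpu=/) { split(tres[i], kv, "="); gpus = kv[2] }
        # ElapsedRaw is seconds, so gpus * seconds / 3600 = GPU-hours
        if (gpus > 0) hours[$1] += gpus * $2 / 3600
      }
      END { for (u in hours) printf "%-12s %8.1f GPU-hours\n", u, hours[u] }'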
Some quick commands:
$ pip install slurmq
$ slurmq check # check your quota
$ slurmq report # admin: see who's over limit
$ slurmq monitor --once --enforce # cron: warn, then cancel
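For the cron piece, an entry along these lines in the admin crontab should do it (the 15-minute interval and log path are just examples, not anything the tool requires):

*/15 * * * * slurmq monitor --once --enforce >> /var/log/slurmq.log 2>&1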
Docs: https://dedalus-labs.github.io/slurmq
Source: https://github.com/dedalus-labs/slurmq
Hoping this helps other HPC sysadmins. We're using it internally and would love to hear how others handle GPU quota enforcement.