The core idea: instead of letting tasks compete freely for threads, tokens act as admission control — tasks are classified by cost (CPU weight + IO pressure) and routed accordingly before they're ever queued.
It uses a decorator-based API so the coordination layer stays out of the caller's logic. The system sits on top of Python's threading module and asyncio, and targets the GIL-constrained environments directly.
This is experimental and I'm not claiming it's production-ready. I'm sharing it because the model feels like it could be useful.
Repo: https://github.com/TavariAgent/Py-TokenGate Proof of concept doc: https://github.com/TavariAgent/Py-TokenGate/blob/trunk/proof-of-concept.md